Open Access. Powered by Scholars. Published by Universities.®
Databases and Information Systems Commons™
Full-Text Articles in Databases and Information Systems
An Annotated Corpus With Nanomedicine And Pharmacokinetic Parameters, Nastassja Lewinski, Ivan Jimenez, Bridget McInnes
Chemical and Life Science Engineering Publications
A vast amount of data on nanomedicines is being generated and published, and natural language processing (NLP) approaches can automate the extraction of unstructured text-based data. Annotated corpora are a key resource for NLP and information extraction methods which employ machine learning. Although corpora are available for pharmaceuticals, resources for nanomedicines and nanotechnology are still limited. To foster nanotechnology text mining (NanoNLP) efforts, we have constructed a corpus of annotated drug product inserts taken from the US Food and Drug Administration’s Drugs@FDA online database. In this work, we present the development of the Engineered Nanomedicine Database corpus to support the …
Parsing MetaMap Files In Hadoop, Amy Olex, Alberto Cano, Bridget T. McInnes
Computer Science Publications
The UMLS::Association CUICollector module identifies UMLS Concept Unique Identifier (CUI) bigrams and their frequencies in a biomedical text corpus. CUICollector was re-implemented in Hadoop MapReduce to improve algorithm speed, flexibility, and scalability. In evaluation, the Hadoop implementation produced results equivalent to those of the serial module and achieved a 28x speedup on a single-node Hadoop system.
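As a rough illustration of the bigram-counting idea described in this abstract, the following is a minimal in-memory sketch of a map step and a reduce step in Python. It is not the CUICollector code or its Hadoop implementation: it assumes the CUIs have already been extracted into one ordered list per sentence, it does not parse MetaMap output, and the function names and the CUI strings in the usage note are hypothetical placeholders.

```python
from collections import Counter
from itertools import chain


def map_bigrams(cuis):
    """Map step: emit ((first, second), 1) for each adjacent pair of CUIs."""
    return [((a, b), 1) for a, b in zip(cuis, cuis[1:])]


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each bigram key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


def count_cui_bigrams(sentences):
    """Run the map step on every sentence, then reduce all emitted pairs."""
    emitted = chain.from_iterable(map_bigrams(s) for s in sentences)
    return reduce_counts(emitted)
```

For example, calling `count_cui_bigrams` on the two placeholder sentences `["C0000001", "C0000002", "C0000003"]` and `["C0000001", "C0000002"]` counts the pair `("C0000001", "C0000002")` twice and the pair `("C0000002", "C0000003")` once. In a MapReduce setting the map step can run independently on each input split, which is the property that makes this kind of counting straightforward to parallelize.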