Open Access. Powered by Scholars. Published by Universities.®

Library and Information Science Commons

Articles 1 - 8 of 8

Full-Text Articles in Library and Information Science

Building Datasets To Support Information Extraction And Structure Parsing From Electronic Theses And Dissertations, William A. Ingram, Jian Wu, Sampanna Yashwant Kahu, Javaid Akbar Manzoor, Bipasha Banerjee, Aman Ahuja, Muntabir Hasan Choudhury, Lamia Salsabil, Winston Shields, Edward A. Fox Jan 2024

Computer Science Faculty Publications

Despite the millions of electronic theses and dissertations (ETDs) publicly available online, digital library services for ETDs have not evolved past simple search and browse at the metadata level. We need better digital library services that allow users to discover and explore the content buried in these long documents. Recent advances in machine learning have shown promising results for decomposing documents into their constituent parts, but these models and techniques require data for training and evaluation. In this article, we present high-quality datasets to train, evaluate, and compare machine learning methods in tasks that are specifically suited to identify and …


Theory Entity Extraction For Social And Behavioral Sciences Papers Using Distant Supervision, Xin Wei, Lamia Salsabil, Jian Wu Jan 2022

Computer Science Faculty Publications

Theories and models, which are common in scientific papers in almost all domains, usually provide the foundations of theoretical analysis and experiments. Understanding the use of theories and models can shed light on the credibility and reproducibility of research works. Compared with metadata, such as title, author, keywords, etc., theory extraction in scientific literature is rarely explored, especially for social and behavioral science (SBS) domains. One challenge of applying supervised learning methods is the lack of a large number of labeled samples for training. In this paper, we propose an automated framework based on distant supervision that leverages entity mentions …


Bibliometric Analysis Of Named Entity Recognition For Chemoinformatics And Biomedical Information Extraction Of Ovarian Cancer, Vijayshri Khedkar, Charlotte Fernandes, Devshi Desai, Mansi R, Gurunath Chavan Dr., Sonali Tidke Dr., M. Karthikeyan Dr. Apr 2021

Library Philosophy and Practice (e-journal)

With the massive amount of data that has been generated in the form of unstructured text documents, Biomedical Named Entity Recognition (BioNER) is becoming increasingly important in the field of biomedical research. Because the obtained results are not currently archived automatically, much of this information remains hidden in the textual details and is not easily accessible for further analysis. Hence, text mining methods and natural language processing techniques are used to extract information from such publications. Named entity recognition is a subtask of information extraction that focuses on finding and categorizing specific …


Clinical Information Extraction From Unstructured Free-Texts, Mingzhe Tao Jan 2018

Legacy Theses & Dissertations (2009 - 2024)

Information extraction (IE) is a fundamental component of natural language processing (NLP) that provides a deeper understanding of the texts. In the clinical domain, documents prepared by medical experts (e.g., discharge summaries, drug labels, medical history records) contain a significant amount of clinically-relevant information that is crucial to the overall well-being of patients. Unfortunately, in many cases, clinically-relevant information is presented in an unstructured format, predominantly consisting of free-texts, making it inaccessible to computerized methods. Automatic extraction of this information can improve accessibility. However, the presence of synonymous expressions, medical acronyms, misspellings, negated phrases, and ambiguous terminologies makes automatic extraction …


Certainty Identification In Texts: Categorization Model And Manual Tagging Results, Elizabeth Liddy, Victoria Rubin, Noriko Kando Oct 2015

Victoria Rubin

This chapter presents a theoretical framework and preliminary results for manual categorization of explicit certainty information in 32 English newspaper articles. Our contribution is in a proposed categorization model and analytical framework for certainty identification. Certainty is presented as a type of subjective information available in texts. Statements with explicit certainty markers were identified and categorized according to four hypothesized dimensions – level, perspective, focus, and time of certainty. The preliminary results reveal an overall promising picture of the presence of certainty information in texts, and establish its susceptibility to manual identification within the proposed four-dimensional certainty categorization analytical framework. …


Usefulness And Applications Of Data Mining In Extracting Information From Different Perspectives, Jiban K. Pal Mar 2011

Journal Articles

Discusses the concept of data mining, its applications, benefits, and the standard tasks involved in the process. Such pattern-seeking techniques, usually performed with a wide range of related areas (viz. statistics, neural networks, genetic algorithms, machine learning, pattern recognition, knowledge-based systems, etc.), are described. Also focuses on bibliomining opportunities useful to information retrieval, semantic analysis of unstructured texts, web-usage mining, and making proactive as well as knowledge-driven decisions across library services. Suggests the use of data mining in combination with other techniques of evaluation, exploiting large data warehouses by skilled specialists, and advises for ethical uses without privacy …


Certainty Identification In Texts: Categorization Model And Manual Tagging Results, Elizabeth D. Liddy, Victoria L. Rubin, Noriko Kando Jan 2006

School of Information Studies - Faculty Scholarship

This chapter presents a theoretical framework and preliminary results for manual categorization of explicit certainty information in 32 English newspaper articles. Our contribution is in a proposed categorization model and analytical framework for certainty identification. Certainty is presented as a type of subjective information available in texts. Statements with explicit certainty markers were identified and categorized according to four hypothesized dimensions – level, perspective, focus, and time of certainty.

The preliminary results reveal an overall promising picture of the presence of certainty information in texts, and establish its susceptibility to manual identification within the proposed four-dimensional certainty categorization analytical framework. …


A Breadth Of NLP Applications, Elizabeth D. Liddy Jan 2002

School of Information Studies - Faculty Scholarship

The Center for Natural Language Processing (CNLP) was founded in September 1999 in the School of Information Studies, the “Original Information School”, at Syracuse University. CNLP’s mission is to advance the development of human-like, language understanding software capabilities for government, commercial, and consumer applications. The Center conducts both basic and applied research, building on its recognized capabilities in Natural Language Processing. The Center’s seventeen employees are a mix of doctoral students in information science or computer engineering, software engineers, linguistic analysts, and research engineers.