Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics

Old Dominion University

Computer Science Faculty Publications

Series

2022

NER

Articles 1 - 2 of 2

Full-Text Articles in Entire DC Network

Theory Entity Extraction For Social And Behavioral Sciences Papers Using Distant Supervision, Xin Wei, Lamia Salsabil, Jian Wu Jan 2022

Theory Entity Extraction For Social And Behavioral Sciences Papers Using Distant Supervision, Xin Wei, Lamia Salsabil, Jian Wu

Computer Science Faculty Publications

Theories and models, which are common in scientific papers in almost all domains, usually provide the foundations of theoretical analysis and experiments. Understanding the use of theories and models can shed light on the credibility and reproducibility of research works. Compared with metadata, such as title, author, keywords, etc., theory extraction in scientific literature is rarely explored, especially for social and behavioral science (SBS) domains. One challenge of applying supervised learning methods is the lack of a large number of labeled samples for training. In this paper, we propose an automated framework based on distant supervision that leverages entity mentions …


Visual Descriptor Extraction From Patent Figure Captions: A Case Study Of Data Efficiency Between Bilstm And Transformer, Xin Wei, Jian Wu, Kehinde Ajayi, Diane Oyen Jan 2022

Visual Descriptor Extraction From Patent Figure Captions: A Case Study Of Data Efficiency Between Bilstm And Transformer, Xin Wei, Jian Wu, Kehinde Ajayi, Diane Oyen

Computer Science Faculty Publications

Technical drawings used for illustrating designs are ubiquitous in patent documents, especially design patents. Different from natural images, these drawings are usually made using black strokes with little color information, making it challenging for models trained on natural images to recognize objects. To facilitate indexing and searching, we propose an effective and efficient visual descriptor model that extracts object names and aspects from patent captions to annotate benchmark patent figure datasets. We compared two state-of-the-art named entity recognition (NER) models and found that with a limited number of annotated samples, the BiLSTM-CRF model outperforms the Transformer model by a significant …