Open Access. Powered by Scholars. Published by Universities.®
Social and Behavioral Sciences Commons™
Open Access. Powered by Scholars. Published by Universities.®
- Keyword
-
- API (1)
- AckExtract (1)
- Acknowledgement extraction framework (1)
- Acknowledgements (1)
- Adding value (1)
-
- Batch processing (1)
- Book length documents (1)
- CORD-19 dataset (1)
- COVID-19 (1)
- Citation context (1)
- Computational access (1)
- Digital libraries (1)
- Document analysis (1)
- Electronic theses and dissertations (1)
- Heuristic method (1)
- JSON (1)
- Java (1)
- Metadata (1)
- Metadata extraction (1)
- NIH (1)
- NSFC (1)
- OCR (1)
- Optical character recognition (1)
- PDF (1)
- Research articles (1)
- SARS outbreak (1)
- Scholarly papers (1)
- Support vector machine model (1)
- Text mining (1)
- User services (1)
Articles 1 - 4 of 4
Full-Text Articles in Social and Behavioral Sciences
A Heuristic Baseline Method For Metadata Extraction From Scanned Electronic Theses And Dissertations, Muntabir H. Choudhury, Jian Wu, William A. Ingam, Edward A. Fox
A Heuristic Baseline Method For Metadata Extraction From Scanned Electronic Theses And Dissertations, Muntabir H. Choudhury, Jian Wu, William A. Ingam, Edward A. Fox
Computer Science Faculty Publications
Extracting metadata from scholarly papers is an important text mining problem. Widely used open-source tools such as GROBID are designed for born-digital scholarly papers but often fail for scanned documents, such as Electronic Theses and Dissertations (ETDs). Here we present a preliminary baseline work with a heuristic model to extract metadata from the cover pages of scanned ETDs. The process started with converting scanned pages into images and then text files by applying OCR tools. Then a series of carefully designed regular expressions for each field is applied, capturing patterns for seven metadata fields: titles, authors, years, degrees, academic programs, …
Acknowledgement Entity Recognition In Cord-19 Papers, Jian Wu, Pei Wang, Xin Wei, Sarah Rajtmajer, C. Lee Giles, Christopher Griffin
Acknowledgement Entity Recognition In Cord-19 Papers, Jian Wu, Pei Wang, Xin Wei, Sarah Rajtmajer, C. Lee Giles, Christopher Griffin
Computer Science Faculty Publications
Acknowledgements are ubiquitous in scholarly papers. Existing acknowledgement entity recognition methods assume all named entities are acknowledged. Here, we examine the nuances between acknowledged and named entities by analyzing sentence structure. We develop an acknowledgement extraction system, AckExtract based on open-source text mining software and evaluate our method using manually labeled data. AckExtract uses the PDF of a scholarly paper as input and outputs acknowledgement entities. Results show an overall performance of F1=0.92. We built a supplementary database by linking CORD-19 papers with acknowledgement entities extracted by AckExtract including persons and organizations and find that only up to …
Smartcitecon: Implicit Citation Context Extraction From Academic Literature Using Unsupervised Learning, Chenrui Gao, Haoran Cui, Li Zhang, Jiamin Wang, Wei Lu, Jian Wu
Smartcitecon: Implicit Citation Context Extraction From Academic Literature Using Unsupervised Learning, Chenrui Gao, Haoran Cui, Li Zhang, Jiamin Wang, Wei Lu, Jian Wu
Computer Science Faculty Publications
We introduce SmartCiteCon (SCC), a Java API for extracting both explicit and implicit citation context from academic literature in English. The tool is built on a Support Vector Machine (SVM) model trained on a set of 7,058 manually annotated citation context sentences, curated from 34,000 papers in the ACL Anthology. The model with 19 features achieves F1=85.6%. SCC supports PDF, XML, and JSON files out-of-box, provided that they are conformed to certain schemas. The API supports single document processing and batch processing in parallel. It takes about 12–45 seconds on average depending on the format to process a …
Opening Books And The National Corpus Of Graduate Research, William A. Ingram, Edward A. Fox, Jian Wu
Opening Books And The National Corpus Of Graduate Research, William A. Ingram, Edward A. Fox, Jian Wu
Computer Science Faculty Publications
Virginia Tech University Libraries, in collaboration with Virginia Tech Department of Computer Science and Old Dominion University Department of Computer Science, request $505,214 in grant funding for a 3-year project, the goal of which is to bring computational access to book-length documents, demonstrating that with Electronic Theses and Dissertations (ETDs). The project is motivated by the following library and community needs. (1) Despite huge volumes of book-length documents in digital libraries, there is a lack of models offering effective and efficient computational access to these long documents. (2) Nationwide open access services for ETDs generally function at the metadata level. …