Full-Text Articles in Cataloging and Metadata

SmartCiteCon: Implicit Citation Context Extraction From Academic Literature Using Unsupervised Learning, Chenrui Gao, Haoran Cui, Li Zhang, Jiamin Wang, Wei Lu, Jian Wu, Jan 2020

Computer Science Faculty Publications

We introduce SmartCiteCon (SCC), a Java API for extracting both explicit and implicit citation context from academic literature in English. The tool is built on a Support Vector Machine (SVM) model trained on a set of 7,058 manually annotated citation context sentences, curated from 34,000 papers in the ACL Anthology. The model with 19 features achieves an F1 score of 85.6%. SCC supports PDF, XML, and JSON files out of the box, provided that they conform to certain schemas. The API supports single-document processing and parallel batch processing. It takes about 12–45 seconds on average, depending on the format, to process a …


Business In The Front, Party In The Back: Revising Metadata Processes Up-Front To Benefit Back-End Workflows, Scott Bacon, May 2017

Library Faculty Presentations

When faced with the prospect of manually uploading thousands of collection objects into our digital repository, I knew I needed a workflow to automate the batch upload process. The resulting workflow takes a metadata spreadsheet containing thousands of rows and, using OpenRefine's templating tool, transforms it into a series of MODS XML records contained in one master file. The csplit command can then be used to split the master file into thousands of fully formed MODS XML files. Using a Perl script, the files can be batch renamed to match their corresponding digital object …
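
The splitting and renaming steps described in this abstract can be sketched in Python as a rough illustration. This is not the author's csplit and Perl workflow; it is a minimal stand-in under stated assumptions: that every record in the master file begins with its own XML declaration, and that a list of target file names is available from somewhere (the `names` parameter is hypothetical, as are the function names and the `record_00000` fallback naming).

```python
import re
from pathlib import Path

# Assumption: each MODS record in the master file starts with its own
# XML declaration at the beginning of a line. Adjust the pattern if the
# records are delimited differently.
RECORD_START = re.compile(r"^<\?xml\b", re.MULTILINE)


def split_records(master_text):
    """Split the master file's text into one string per record.

    Returns an empty list if no XML declaration is found.
    """
    starts = [m.start() for m in RECORD_START.finditer(master_text)]
    ends = starts[1:] + [len(master_text)]
    return [master_text[s:e].strip() + "\n" for s, e in zip(starts, ends)]


def write_records(records, out_dir, names=None):
    """Write each record to its own .xml file and return the paths.

    `names` is a hypothetical list of file stems (for example, object
    identifiers taken from the spreadsheet). Without it, files get
    sequential placeholder names.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, record in enumerate(records):
        stem = names[i] if names else f"record_{i:05d}"
        path = out / f"{stem}.xml"
        path.write_text(record, encoding="utf-8")
        paths.append(path)
    return paths
```

Writing the files under their final names in one pass avoids a separate rename step, though keeping the two steps apart, as the abstract describes, makes it easier to inspect the split output before renaming.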