Library and Information Science | Open Access Articles

Bridging The Simulation-To-Reality Gap: Adapting Simulation Environment For Object Recognition, Hardik Yogesh Sonetta

Electronic Theses and Dissertations

Rapid advancements in object recognition have created a huge demand for labeled datasets for training, testing, and validating different techniques. Due to the wide range of applications, object models in the datasets need to cover both variations in geometric features and the diverse conditions in which sensory inputs are obtained. Manually labeling the object models is also cumbersome. As a result, it becomes difficult for researchers to gain access to adequate datasets for the development of new methods or algorithms. In comparison, computer simulation has been considered a cost-effective solution to generate simulated data …

Extractive Research Slide Generation Using Windowed Labeling Ranking, Athar Sefid, Prasenjit Mitra, Jian Wu, C. Lee Giles

Computer Science Faculty Publications

Presentation slides generated from original research papers provide an efficient form to present research innovations. Manually generating presentation slides is labor-intensive. We propose a method to automatically generate slides for scientific articles based on a corpus of 5000 paper-slide pairs compiled from conference proceedings websites. The sentence labeling module of our method is based on SummaRuNNer, a neural sequence model for extractive summarization. Instead of ranking sentences based on semantic similarities in the whole document, our algorithm measures the importance and novelty of sentences by combining semantic and lexical features within a sentence window. Our method outperforms several baseline methods …

A Synthetic Document Image Dataset For Developing And Evaluating Historical Document Processing Methods, Daniel Walker, William Lund, Eric Ringger

Faculty Publications

Document images accompanied by OCR output text and ground truth transcriptions are useful for developing and evaluating document recognition and processing methods, especially for historical document images. Additionally, research into improving the performance of such methods often requires further annotation of training and test data (e.g., topical document labels). However, transcribing and labeling historical documents is expensive. As a result, existing real-world document image datasets with such accompanying resources are rare and often relatively small. We introduce synthetic document image datasets of varying levels of noise that have been created from standard (English) text corpora using an existing document degradation …

Library and Information Science Commons™
