Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Full-Text Articles in Physical Sciences and Mathematics

Intelligent Indexing: A Semi-Automated, Trainable System For Field Labeling, Robert T. Clawson Sep 2014

Theses and Dissertations

We present Intelligent Indexing: a general, scalable, collaborative approach to indexing and transcription of non-machine-readable documents that exploits visual consensus and group labeling while harnessing human recognition and domain expertise. In our system, indexers work directly on the page, and with minimal context switching can navigate the page, enter labels, and interact with the recognition engine. Interaction with the recognition engine occurs through preview windows that allow the indexer to quickly verify and correct recommendations. This interaction is far superior to conventional, tedious, inefficient post-correction and editing. Intelligent Indexing is a trainable system that improves over time and can provide …


Bioinformatic Solutions To Complex Problems In Mass Spectrometry Based Analysis Of Biomolecules, Ryan M. Taylor Jul 2014

Theses and Dissertations

Biological research has benefitted greatly from the advent of omic methods. For many biomolecules, mass spectrometry (MS) methods are the most widely employed because their sensitivity permits small sample quantities and their speed permits analysis of complex samples. Improvements in instrument and sample preparation techniques create opportunities for large-scale experimentation. The complexity and volume of data produced by modern MS-omic instrumentation challenge biological interpretation, while the complexity of the instrumentation, sample noise, and complexity of data analysis present difficulties in maintaining and ensuring data quality, validity, and relevance. We present a corpus of tools which improves …


Musical Motif Discovery In Non-Musical Media, Daniel S. Johnson Jun 2014

Theses and Dissertations

Many music composition algorithms attempt to compose music in a particular style. The resulting music is often impressive and indistinguishable from the style of the training data, but it tends to lack significant innovation. In an effort to increase innovation in the selection of pitches and rhythms, we present a system that discovers musical motifs by coupling machine learning techniques with an inspirational component. The inspirational component allows for the discovery of musical motifs that are unlikely to be produced by a generative model, while the machine learning component harnesses innovation. Candidate motifs are extracted from non-musical media such as …


Ensemble Methods For Historical Machine-Printed Document Recognition, William B. Lund Apr 2014

Theses and Dissertations

The usefulness of digitized documents is directly related to the quality of the extracted text. Optical Character Recognition (OCR) has reached a point where well-formatted and clean machine-printed documents are easily recognizable by current commercial OCR products; however, older or degraded machine-printed documents present problems to OCR engines resulting in word error rates (WER) that severely limit either automated or manual use of the extracted text. Major archives of historical machine-printed documents are being assembled around the globe, requiring an accurate transcription of the text for the automated creation of descriptive metadata, full-text searching, and information extraction. Given document …