Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Machine learning

2013

Brigham Young University

Articles 1 - 2 of 2

Full-Text Articles in Physical Sciences and Mathematics

Practical Cost-Conscious Active Learning For Data Annotation In Annotator-Initiated Environments, Robbie A. Haertel Aug 2013

Theses and Dissertations

Many projects exist whose purpose is to augment raw data with annotations that increase the usefulness of the data. The number of these projects is growing rapidly, and in the age of “big data” the amount of data to be annotated within each project is growing as well. One common use of such data is in supervised machine learning, which requires labeled data to train a predictive model. Annotation is often very expensive, particularly for structured data. The purpose of this dissertation is to explore methods of reducing the cost of creating such data sets, including annotated text corpora. We …


Probabilistic Explicit Topic Modeling, Joshua Aaron Hansen Apr 2013

Theses and Dissertations

Latent Dirichlet Allocation (LDA) is widely used for automatic discovery of latent topics in document corpora. However, output from analysis using an LDA topic model suffers from a lack of identifiability between topics, not only across corpora but also across runs of the algorithm. The output is also isolated from enriching information in knowledge sources such as Wikipedia, and it is difficult for humans to interpret because it lacks meaningful topic labels. This thesis introduces two methods for probabilistic explicit topic modeling that address these issues: Latent Dirichlet Allocation with Static Topic-Word Distributions (LDA-STWD), and Explicit Dirichlet Allocation (EDA). LDA-STWD …