Open Access. Powered by Scholars. Published by Universities.®

Computational Linguistics Commons

Faculty Publications

2007

Full-Text Articles in Computational Linguistics

Active Learning For Part-Of-Speech Tagging: Accelerating Corpus Annotation, Deryle W. Lonsdale, Eric K. Ringger, Peter J. McClanahan, Robbie A. Haertel, George Busby, Marc A. Carmen, James Carroll, Kevin Seppi, Jan 2007

In the construction of a part-of-speech annotated corpus, we are constrained by a fixed budget. A fully annotated corpus is required, but we can afford to label only a subset. We train a Maximum Entropy Markov Model tagger from a labeled subset and automatically tag the remainder. This paper addresses the question of where to focus our manual tagging efforts in order to deliver an annotation of highest quality. In this context, we find that active learning is always helpful. We focus on Query by Uncertainty (QBU) and Query by Committee (QBC) and report on experiments with several baselines and …