Open Access. Powered by Scholars. Published by Universities.®

Social and Behavioral Sciences Commons

Open Access. Powered by Scholars. Published by Universities.®

Library and Information Science

Brigham Young University

Faculty Publications

Series

2014

Ensemble Methods

Articles 1 - 1 of 1

Full-Text Articles in Social and Behavioral Sciences

How Well Does Multiple Ocr Error Correction Generalize?, William B. Lund, Eric K. Ringger, Daniel D. Walker Jan 2014

How Well Does Multiple Ocr Error Correction Generalize?, William B. Lund, Eric K. Ringger, Daniel D. Walker

Faculty Publications

As the digitization of historical documents, such as newspapers, becomes more common, the need of the archive patron for accurate digital text from those documents increases. Building on our earlier work, the contributions of this paper are: 1. in demonstrating the applicability of novel methods for correcting optical character recognition (OCR) on disparate data sets, including a new synthetic training set, 2. enhancing the correction algorithm with novel features, and 3. assessing the data requirements of the correction learning method. First, we correct errors using conditional random fields (CRF) trained on synthetic training data sets in order to demonstrate the …