Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Open Access. Powered by Scholars. Published by Universities.®

2014

Faculty Publications

Social and Behavioral Sciences

Optical Character Recognition

Articles 1 - 1 of 1

Full-Text Articles in Physical Sciences and Mathematics

How Well Does Multiple Ocr Error Correction Generalize?, William B. Lund, Eric K. Ringger, Daniel D. Walker Jan 2014

How Well Does Multiple Ocr Error Correction Generalize?, William B. Lund, Eric K. Ringger, Daniel D. Walker

Faculty Publications

As the digitization of historical documents, such as newspapers, becomes more common, the need of the archive patron for accurate digital text from those documents increases. Building on our earlier work, the contributions of this paper are: 1. in demonstrating the applicability of novel methods for correcting optical character recognition (OCR) on disparate data sets, including a new synthetic training set, 2. enhancing the correction algorithm with novel features, and 3. assessing the data requirements of the correction learning method. First, we correct errors using conditional random fields (CRF) trained on synthetic training data sets in order to demonstrate the …