Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons


Machine learning

Selected Works

William Lund


Full-Text Articles in Physical Sciences and Mathematics

Ensemble Methods For Historical Machine-Printed Document Recognition, William Lund Sep 2014



The usefulness of digitized documents is directly related to the quality of the extracted text. Optical Character Recognition (OCR) has reached a point where well-formatted, clean machine-printed documents are easily recognized by current commercial OCR products; however, older or degraded machine-printed documents present problems for OCR engines, resulting in word error rates (WER) that severely limit either automated or manual use of the extracted text. Major archives of historical machine-printed documents are being assembled around the globe, and they require accurate transcriptions of the text for the automated creation of descriptive metadata, full-text searching, and information extraction. Given document …
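Word error rate is commonly defined as the word-level edit distance (substitutions, insertions, and deletions) between an OCR output and a reference transcription, divided by the number of words in the reference. The sketch below illustrates that common definition only; it is an assumption for illustration, not the evaluation code used in this work, and the function name `word_error_rate` and whitespace tokenization are hypothetical choices.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: word-level Levenshtein distance / reference length.

    Assumes simple whitespace tokenization; real OCR evaluations may
    normalize case, punctuation, or hyphenation differently.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the first i-1 reference words
    # and the first j hypothesis words (initially i-1 = 0, so it is j).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra OCR word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One substitution ("quick" -> "quiek") and one deletion ("fox") over
# four reference words gives a WER of 2 / 4.
print(word_error_rate("the quick brown fox", "the quiek brown"))  # 0.5
```

Because insertions also count as errors, WER can exceed 1.0 when the OCR output contains many spurious words.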