Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

2006

Computer Sciences

Selected Works

R. Manmatha

Articles 1 - 1 of 1

Full-Text Articles in Physical Sciences and Mathematics

A Hierarchical, HMM-Based Automatic Evaluation Of OCR Accuracy For A Digital Library Of Books, Shaolei Feng, R. Manmatha Dec 2005

A number of projects are creating searchable digital libraries of printed books. These include the Million Book Project, the Google Book project, and similar efforts from Yahoo and Microsoft. Content-based online book retrieval usually requires first converting printed text into machine-readable (e.g., ASCII) text using an optical character recognition (OCR) engine and then doing full-text search on the results. Many of these books are old, and a variety of processing steps are required to create an end-to-end system. Changing any step (including the scanning process) can affect OCR performance and hence a …