Open Access. Powered by Scholars. Published by Universities.®

Library and Information Science Commons

Full-Text Articles in Library and Information Science

Xpath-Based Template Language For Describing The Placement Of Metadata Within A Document, Vijay Kumar Musham Dec 2010

Computer Science Theses & Dissertations

In recent years, there has been tremendous growth in the online availability of resources that had previously been restricted to paper archives. OCR (Optical Character Recognition) tools can be used to digitize an existing corpus and make it available online. A number of federal agencies, universities, laboratories, and companies are placing their collections online and making them searchable via metadata fields such as author, title, and publishing organization. Manually creating metadata for a large collection is extremely time-consuming, and the task is difficult to automate, particularly for collections consisting of documents with diverse layouts and structures. The Extract project …