Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network


Computer Sciences

University of Massachusetts Amherst

2006

Digital Libraries


Full-Text Articles in Entire DC Network

A Hierarchical, HMM-Based Accuracy For A Digital Library Of Books, Shaolei Feng Jun 2006


Computer Science Department Faculty Publication Series

A number of projects are creating searchable digital libraries of printed books. These include the Million Book Project, the Google Book project and similar efforts from Yahoo and Microsoft. Content-based online book retrieval usually requires first converting printed text into machine-readable (e.g. ASCII) text using an optical character recognition (OCR) engine and then running full-text search on the results. Many of these books are old, and a variety of processing steps are required to create an end-to-end system. Changing any step (including the scanning process) can affect OCR performance and hence a …
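The abstract is cut off before it describes how OCR accuracy is evaluated, so the paper's hierarchical HMM-based method is not shown here. As a hedged illustration of what "OCR performance" usually means in practice, the sketch below computes word accuracy (1 minus the word error rate) by aligning OCR output against a reference transcription with Levenshtein edit distance. This is a generic, swapped-in measure rather than the authors' method, and the function names `word_edit_distance` and `word_accuracy` are hypothetical.

```python
# Generic word-error-rate sketch; NOT the HMM-based method from the paper.

def word_edit_distance(reference: list[str], hypothesis: list[str]) -> int:
    """Minimum number of word substitutions, insertions and deletions
    needed to turn `hypothesis` into `reference` (Levenshtein distance)."""
    m, n = len(reference), len(hypothesis)
    # prev[j] = distance between the previous reference prefix and hypothesis[:j]
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # reference word missing from OCR output
                          curr[j - 1] + 1,     # extra word in OCR output
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[n]


def word_accuracy(reference_text: str, ocr_text: str) -> float:
    """Word accuracy = 1 - WER, where WER = edit distance / reference length.
    Can drop below 0 if the OCR output contains many spurious words."""
    ref = reference_text.split()
    hyp = ocr_text.split()
    if not ref:
        raise ValueError("reference text must contain at least one word")
    return 1.0 - word_edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One misrecognized word ("qulck") out of four reference words.
    print(word_accuracy("the quick brown fox", "the qulck brown fox"))
```

Here `word_accuracy("the quick brown fox", "the qulck brown fox")` returns 0.75: one substituted word out of four reference words. Real evaluations of scanned books also have to cope with the lack of a page-aligned ground truth, which this sketch simply assumes is available.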


Bibliometric Impact Measures Leveraging Topic Analysis, Gideon S. Mann, David Mimno, Andrew McCallum Jan 2006


Andrew McCallum

Measurements of the impact and history of research literature provide a useful complement to scientific digital library collections. Bibliometric indicators have been extensively studied, mostly in the context of journals. However, journal-based metrics poorly capture topical distinctions in fast-moving fields, and are increasingly problematic in the context of open-access publishing. Recent developments in latent topic models have produced promising results for automatic sub-field discovery. The fine-grained, faceted topics produced by such models provide a clearer view of the topical divisions of a body of research literature and the interactions between those divisions. We demonstrate the usefulness of topic models …