Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Open Access. Powered by Scholars. Published by Universities.®

Selected Works

2007

Classification

Articles 1 - 1 of 1

Full-Text Articles in Computer Sciences

Organizing The Oca: Learning Faceted Subjects From A Library Of Digital Books, David Mimno, Andrew Mccallum Jan 2007

Organizing The Oca: Learning Faceted Subjects From A Library Of Digital Books, David Mimno, Andrew Mccallum

Andrew McCallum

Large scale library digitization projects such as the Open Content Alliance are producing vast quantities of text, but little has been done to organize this data. Subject headings inherited from card catalogs are useful but limited, while full-text indexing is most appropriate for readers who already know exactly what they want. Statistical topic models provide a complementary function. These models can identify semantically coherent ``topics'' that are easily recognizable and meaningful to humans, but they have been too computationally intensive to run on library-scale corpora. This paper presents DCM-LDA, a topic model based on Dirichlet Compound Multinomial distributions. This model …