Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Physical Sciences and Mathematics

University of Massachusetts Amherst

2007

Digital Libraries

Full-Text Articles in Entire DC Network

Mining A Digital Library For Influential Authors, David Mimno, Andrew McCallum, Jan 2007

Andrew McCallum

When browsing a digital library of research papers, it is natural to ask which authors are most influential in a particular topic. We present a probabilistic model that ranks authors based on their influence in particular areas of scientific research. This model combines several sources of information: citation information between documents as represented by PageRank scores, authorship data gathered through automatic information extraction, and the words in paper abstracts. We propose a topic model on the words, and compare performance versus a smoothed language model by assessing the number of major award winners in the resulting ranked list of researchers.
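The abstract names PageRank scores over the citation graph as one input to the ranking. As a rough illustration only, and not the authors' probabilistic model, the sketch below computes PageRank by power iteration on a toy citation graph and then sums each paper's score, scaled by a topic weight, over its authors. The paper ids, author names, topic weights, and the function names `pagerank` and `author_scores` are all hypothetical, and the multiply-and-sum aggregation is an assumption made for this example.

```python
from collections import defaultdict


def pagerank(citations, damping=0.85, iterations=50):
    """Power-iteration PageRank over a citation graph.

    citations maps each paper id to the list of paper ids it cites.
    """
    papers = set(citations)
    for cited_list in citations.values():
        papers.update(cited_list)
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Rank held by papers that cite nothing is spread evenly over all papers.
        dangling = sum(rank[p] for p in papers if not citations.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in papers}
        for citing, cited_list in citations.items():
            if not cited_list:
                continue
            share = damping * rank[citing] / len(cited_list)
            for cited in cited_list:
                new_rank[cited] += share
        rank = new_rank
    return rank


def author_scores(rank, authorship, topic_weight):
    """Sum PageRank times topic weight over each author's papers (assumed rule)."""
    scores = defaultdict(float)
    for paper, authors in authorship.items():
        for author in authors:
            scores[author] += rank.get(paper, 0.0) * topic_weight.get(paper, 0.0)
    return dict(scores)


# Hypothetical toy data: p1 and p2 both cite p3.
citations = {"p1": ["p3"], "p2": ["p3"], "p3": []}
authorship = {"p1": ["alice"], "p2": ["bob"], "p3": ["carol"]}
# Hypothetical weight of each paper in the topic of interest.
topic_weight = {"p1": 0.9, "p2": 0.1, "p3": 0.8}

rank = pagerank(citations)
scores = author_scores(rank, authorship, topic_weight)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph the most-cited paper, p3, receives the largest PageRank score, so its author leads the ranking for the topic. The model described in the abstract additionally uses automatically extracted authorship data and a topic model over abstract words, which this sketch replaces with fixed weights.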


Organizing The OCA: Learning Faceted Subjects From A Library Of Digital Books, David Mimno, Andrew McCallum, Jan 2007

Andrew McCallum

Large-scale library digitization projects such as the Open Content Alliance are producing vast quantities of text, but little has been done to organize this data. Subject headings inherited from card catalogs are useful but limited, while full-text indexing is most appropriate for readers who already know exactly what they want. Statistical topic models provide a complementary function. These models can identify semantically coherent "topics" that are easily recognizable and meaningful to humans, but they have been too computationally intensive to run on library-scale corpora. This paper presents DCM-LDA, a topic model based on Dirichlet Compound Multinomial distributions. This model …