Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Computer Sciences

Andrew McCallum

Selected Works

Topic models

Full-Text Articles in Entire DC Network

Expertise Modeling For Matching Papers With Reviewers, David Mimno, Andrew McCallum, Jan 2007

Andrew McCallum

An essential part of an expert-finding task, such as matching reviewers to submitted papers, is the ability to model the expertise of a person based on documents. We evaluate several measures of the association between an author in an existing collection of research papers and a previously unseen document. We compare two language-model-based approaches with a novel topic model, Author-Persona-Topic (APT). In this model, each author can write under one or more "personas," which are represented as independent distributions over hidden topics. Examples of previous papers written by prospective reviewers are gathered from the Rexa database, which extracts …


Organizing The OCA: Learning Faceted Subjects From A Library Of Digital Books, David Mimno, Andrew McCallum, Jan 2007

Andrew McCallum

Large-scale library digitization projects such as the Open Content Alliance are producing vast quantities of text, but little has been done to organize this data. Subject headings inherited from card catalogs are useful but limited, while full-text indexing is most appropriate for readers who already know exactly what they want. Statistical topic models provide a complementary function. These models can identify semantically coherent "topics" that are easily recognizable and meaningful to humans, but they have been too computationally intensive to run on library-scale corpora. This paper presents DCM-LDA, a topic model based on Dirichlet Compound Multinomial distributions. This model …