Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Computer Sciences

Brigham Young University

Theses and Dissertations

2009

Document clustering

Full-Text Articles in Physical Sciences and Mathematics

Bisecting Document Clustering Using Model-Based Methods, Aaron Samuel Davis Dec 2009

We all have access to large collections of digital text documents, which are useful only if we can make sense of them all and distill important information from them. Good document clustering algorithms that organize such information automatically in meaningful ways can make a difference in how effective we are at using that information. In this paper we use model-based document clustering algorithms as a base for bisecting methods in order to identify increasingly cohesive clusters from larger, more diverse clusters. We specifically use the EM algorithm and Gibbs Sampling on a mixture of multinomials as the base clustering algorithms …
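As a rough illustration of the approach the abstract describes, the sketch below uses EM on a two-component mixture of multinomials as the base step of a bisecting loop. It is not the thesis's implementation: the function names (`em_bisect`, `bisecting_cluster`), the smoothing constant `alpha`, the fixed iteration count, the random soft initialization, and the rule of always splitting the largest cluster are all assumptions made for this example. The Gibbs sampling variant is not shown.

```python
import math
import random


def em_bisect(docs, vocab_size, iters=50, alpha=1.0, seed=0):
    """Split docs (lists of word counts) in two with EM on a
    two-component mixture of multinomials. Returns hard labels."""
    rng = random.Random(seed)
    k = 2
    # Random soft initialization of the responsibilities (assumption).
    resp = []
    for _ in docs:
        r = rng.random()
        resp.append([r, 1.0 - r])
    for _ in range(iters):
        # M-step: smoothed mixing weights and word probabilities.
        pi = [
            (sum(r[c] for r in resp) + alpha) / (len(docs) + k * alpha)
            for c in range(k)
        ]
        theta = []
        for c in range(k):
            counts = [alpha] * vocab_size
            for r, doc in zip(resp, docs):
                for w, n in enumerate(doc):
                    counts[w] += r[c] * n
            total = sum(counts)
            theta.append([x / total for x in counts])
        # E-step: posterior over components, computed in log space.
        for i, doc in enumerate(docs):
            logp = [
                math.log(pi[c])
                + sum(n * math.log(theta[c][w]) for w, n in enumerate(doc))
                for c in range(k)
            ]
            m = max(logp)
            unnorm = [math.exp(lp - m) for lp in logp]
            z = sum(unnorm)
            resp[i] = [u / z for u in unnorm]
    return [0 if r[0] >= r[1] else 1 for r in resp]


def bisecting_cluster(docs, vocab_size, n_clusters):
    """Repeatedly bisect the largest cluster (an assumed selection rule)
    until n_clusters clusters of document indices exist."""
    clusters = [list(range(len(docs)))]
    while len(clusters) < n_clusters:
        clusters.sort(key=len)
        target = clusters.pop()  # largest cluster
        labels = em_bisect([docs[i] for i in target], vocab_size)
        left = [i for i, lab in zip(target, labels) if lab == 0]
        right = [i for i, lab in zip(target, labels) if lab == 1]
        if not left or not right:
            # EM failed to split this cluster; stop rather than loop forever.
            clusters.append(target)
            break
        clusters.extend([left, right])
    return clusters


# Toy corpus: rows are documents, columns are counts over a 4-word vocabulary.
toy_docs = [
    [5, 4, 0, 0],
    [6, 3, 0, 0],
    [4, 5, 0, 0],
    [0, 0, 5, 4],
    [0, 0, 3, 6],
    [0, 0, 4, 5],
]
clusters = bisecting_cluster(toy_docs, vocab_size=4, n_clusters=2)
print(clusters)
```

On the toy corpus, the first three documents share one half of the vocabulary and the last three share the other half, so a single bisection is enough to separate them. Requesting more clusters would bisect the resulting groups again, which is the sense in which bisecting methods extract increasingly cohesive clusters from larger, more diverse ones.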