Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Computer Sciences

University of Massachusetts Amherst

Uncertainty

Full-Text Articles in Physical Sciences and Mathematics

Compact Representations Of Uncertainty In Clustering, Craig Stuart Greenberg Apr 2021

Doctoral Dissertations

Flat clustering and hierarchical clustering are two fundamental tasks, often used to discover meaningful structures in data, such as subtypes of cancer, phylogenetic relationships, taxonomies of concepts, and cascades of particle decays in particle physics. When multiple clusterings of the data are possible, it is useful to represent uncertainty in clustering through various probabilistic quantities, such as the distribution over partitions or tree structures, and the marginal probabilities of subpartitions or subtrees. Many compact representations exist for structured prediction problems, enabling the efficient computation of probability distributions, e.g., a trellis structure and corresponding Forward-Backward algorithm for Markov models that model …


Collective Multi-Label Classification, Nadia Ghamrawi, Andrew McCallum Jan 2005

Computer Science Department Faculty Publication Series

Common approaches to multi-label classification learn an independent classifier for each category and employ ranking or thresholding schemes for classification. Because they do not exploit dependencies between labels, such techniques are well suited only to problems in which categories are independent. In many domains, however, labels are highly interdependent. This paper explores multi-label conditional random field (CRF) classification models that directly parameterize label co-occurrences in multi-label classification. Experiments show that the models outperform their single-label counterparts on standard text corpora. Even when multi-labels are sparse, the models improve subset classification error by as much as 40%.
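
To make the contrast in this abstract concrete, the following is a minimal toy sketch and not the paper's model: it compares independent per-label thresholding with a joint decision that adds a weight for label co-occurrence, and it computes subset classification error, taken here to mean the fraction of instances whose predicted label set is not exactly the true set. The function names, scores, and pairwise weights are hypothetical, and the joint search is brute force, which is only practical for a handful of labels.

```python
from itertools import product


def predict_independent(scores, threshold=0.0):
    """Per-label decision: keep each label whose own score exceeds the threshold."""
    return tuple(int(s > threshold) for s in scores)


def predict_joint(scores, pair_weights):
    """Choose the label vector with the highest total score.

    The total is the sum of per-label scores for active labels plus a
    (hypothetical) weight for every pair of labels that are active together.
    All 2**n label vectors are enumerated, so this is for small n only.
    """
    def total(y):
        unary = sum(s for s, on in zip(scores, y) if on)
        pairwise = sum(w for (i, j), w in pair_weights.items() if y[i] and y[j])
        return unary + pairwise

    return max(product((0, 1), repeat=len(scores)), key=total)


def subset_error(true_sets, predicted_sets):
    """Fraction of instances whose predicted label vector is not an exact match."""
    wrong = sum(1 for t, p in zip(true_sets, predicted_sets) if tuple(t) != tuple(p))
    return wrong / len(true_sets)


# Made-up example: label 1 is weak on its own but tends to co-occur with label 0.
scores = [1.0, -0.2]
pair_weights = {(0, 1): 0.5}

independent = predict_independent(scores)    # label 1 is dropped
joint = predict_joint(scores, pair_weights)  # co-occurrence weight keeps label 1
```

In this made-up case the independent rule predicts only the first label, while the joint rule predicts both, because the co-occurrence weight outweighs the second label's slightly negative score.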