Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences

University of Massachusetts Amherst

Theses/Dissertations

2022

Clustering

Articles 1 - 1 of 1

Full-Text Articles in Physical Sciences and Mathematics

Incremental Non-Greedy Clustering At Scale, Nicholas Monath Mar 2022

Incremental Non-Greedy Clustering At Scale, Nicholas Monath

Doctoral Dissertations

Clustering is the task of organizing data into meaningful groups. Modern clustering applications such as entity resolution put several demands on clustering algorithms: (1) scalability to massive numbers of points as well as clusters, (2) incremental additions of data, (3) support for any user-specified similarity functions. Hierarchical clusterings are often desired as they represent multiple alternative flat clusterings (e.g., at different granularity levels). These tree-structured clusterings provide for both fine-grained clusters as well as uncertainty in the presence of newly arriving data. Previous work on hierarchical clustering does not fully address all three of the aforementioned desiderata. Work on incremental …