Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 6 of 6

Full-Text Articles in Physical Sciences and Mathematics

Incremental Non-Greedy Clustering At Scale, Nicholas Monath Mar 2022

Doctoral Dissertations

Clustering is the task of organizing data into meaningful groups. Modern clustering applications such as entity resolution place several demands on clustering algorithms: (1) scalability to massive numbers of points as well as clusters, (2) support for incremental additions of data, and (3) support for arbitrary user-specified similarity functions. Hierarchical clusterings are often desired because they represent multiple alternative flat clusterings (e.g., at different granularity levels). These tree-structured clusterings capture both fine-grained clusters and uncertainty in the presence of newly arriving data. Previous work on hierarchical clustering does not fully address all three of the aforementioned desiderata. Work on incremental …


Robust Algorithms For Clustering With Applications To Data Integration, Sainyam Galhotra Oct 2021

Doctoral Dissertations

A growing number of data-driven applications are used for decision-making with far-reaching consequences and significant societal impact. Entity resolution, community detection, and taxonomy construction are some of the building blocks of these applications, and clustering is the fundamental concept underlying all of them. Therefore, the importance of accurate, robust, and scalable clustering methods cannot be overstated. We tackle the various facets of clustering with the multi-pronged approach described below. 1. While identifying clusters that refer to different entities is challenging for automated strategies, it is relatively easy for humans. We study the robustness of clustering methods that …


Compact Representations Of Uncertainty In Clustering, Craig Stuart Greenberg Apr 2021

Doctoral Dissertations

Flat clustering and hierarchical clustering are two fundamental tasks, often used to discover meaningful structures in data, such as subtypes of cancer, phylogenetic relationships, taxonomies of concepts, and cascades of particle decays in particle physics. When multiple clusterings of the data are possible, it is useful to represent uncertainty in clustering through various probabilistic quantities, such as the distribution over partitions or tree structures, and the marginal probabilities of subpartitions or subtrees. Many compact representations exist for structured prediction problems, enabling the efficient computation of probability distributions, e.g., a trellis structure and corresponding Forward-Backward algorithm for Markov models that model …


Reasoning About User Feedback Under Identity Uncertainty In Knowledge Base Construction, Ariel Kobren Dec 2020

Doctoral Dissertations

Intelligent, automated systems that are intertwined with everyday life, such as Google Search and virtual assistants like Amazon’s Alexa or Apple’s Siri, are often powered in part by knowledge bases (KBs), i.e., structured data repositories of entities, their attributes, and the relationships among them. Despite a wealth of research focused on automated KB construction methods, KBs are inevitably imperfect, with errors stemming from various points in the construction pipeline. Making matters more challenging, new data is created daily and must be integrated with existing KBs so that they remain up-to-date. As the primary consumers of KBs, human users have tremendous potential to …


A Proportionality-Based Approach To Search Result Diversification, Van Bac Dang Aug 2014

Doctoral Dissertations

Search result diversification addresses the problem of queries with unclear information needs. The aim of using diversification techniques is to find a ranking of documents that covers multiple possible interpretations, aspects, or topics for a given query. By explicitly providing diversity in search results, this approach can increase the likelihood that users will find documents relevant to their specific intent, thereby improving effectiveness. This dissertation introduces a new perspective on diversity: diversity by proportionality. We consider a result list more diverse, with respect to some set of topics related to the query, when the ratio between the number of relevant …


Clustering, Reorientation Dynamics, And Proton Transfer In Glassy Oligomeric Solids, Jacob Allen Harvey Sep 2013

Open Access Dissertations

We have modelled the structures and dynamics of hydrogen bond networks that form from imidazoles tethered to oligomeric aliphatic backbones in crystalline and glassy phases. We have studied the behavior of oligomers containing 5 or 10 imidazole groups. These systems have been simulated over the range 100-900 K with constant-pressure molecular dynamics using the AMBER 94 force field, which was found to show good agreement with ab initio calculations of hydrogen bond strengths and imidazole rotational barriers. Hypothetical crystalline solids formed from packed 5-mers and 10-mers melt above 600 K, then form glassy solids upon cooling. Viewing hydrogen bond networks as …