Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Theses and Dissertations

2016

Clustering

Full-Text Articles in Physical Sciences and Mathematics

Confirm: Clustering Of Noisy Form Images Using Robust Matching, Christopher Alan Tensmeyer May 2016

Theses and Dissertations

Identifying the type of a scanned form greatly facilitates processing, including automated field segmentation and field recognition. Unlike most existing techniques, we focus on unsupervised type identification, where the set of form types is not known a priori, and on noisy collections that contain very similar document types. This work presents a novel algorithm, CONFIRM (Clustering Of Noisy Form Images using Robust Matching), which simultaneously discovers the types in a collection of forms and assigns each form to a type. CONFIRM matches typeset text and rule lines between forms to create domain-specific features, which we show outperform …
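The idea of discovering form types from matched content, without knowing the types in advance, can be illustrated with a toy sketch. This is not CONFIRM itself: it assumes each form has already been reduced to a set of recognized text lines, uses exact-match Jaccard overlap as a hypothetical stand-in for robust matching of noisy text and rule lines, and groups forms by single-linkage with a hypothetical `threshold` parameter.

```python
def match_score(a, b):
    """Jaccard overlap of two forms, each given as a set of text lines.

    Hypothetical stand-in for robust matching: real noisy scans would
    need tolerant (e.g. fuzzy) line matching rather than exact equality.
    """
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_forms(forms, threshold=0.5):
    """Assign a type label to each form without knowing the types a priori.

    Forms whose match score is at least `threshold` are linked, and linked
    forms end up in the same cluster (single-linkage via union-find).
    """
    parent = list(range(len(forms)))

    def find(i):
        # Follow parent pointers to the cluster root, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(forms)):
        for j in range(i + 1, len(forms)):
            if match_score(forms[i], forms[j]) >= threshold:
                parent[find(i)] = find(j)

    # Relabel cluster roots as consecutive integers 0, 1, 2, ...
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(forms))]


forms = [
    {"Name", "Date of Birth", "Signature"},
    {"Name", "Date of Birth", "Signature", "Witness"},
    {"Invoice No.", "Amount Due", "Total"},
    {"Invoice No.", "Total", "Tax"},
]
print(cluster_forms(forms))  # [0, 0, 1, 1]
```

Note that the number of types is never specified: it emerges from which forms match each other, which is the unsupervised setting the abstract describes.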


Increment - Interactive Cluster Refinement, Logan Adam Mitchell Mar 2016

Theses and Dissertations

We present INCREMENT, a cluster refinement algorithm that uses user feedback to refine clusterings. INCREMENT can improve clusterings produced by arbitrary clustering algorithms. The initial clustering is first sub-clustered to improve query efficiency. A small set of selected instances from each of these sub-clusters is presented to a user for labelling. Using this feedback, INCREMENT trains a feature embedder that maps the input features to a new feature space. This space is learned so that spatial distance is inversely correlated with semantic similarity, as determined from the user feedback. A final clustering is then formed in the …
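One way to turn user labels into a training signal in which distance is inversely correlated with similarity is sketched below. The abstract does not state INCREMENT's actual loss, so this assumes a standard contrastive loss: pairs the user labelled alike are pulled together, and pairs labelled differently are pushed at least a hypothetical `margin` apart. The sketch only evaluates the objective for given embeddings; an actual embedder would be trained to minimize it.

```python
import itertools
import math


def feedback_pairs(labels):
    """Turn user labels {instance_id: label} into (i, j, same_label) pairs."""
    return [
        (i, j, labels[i] == labels[j])
        for i, j in itertools.combinations(sorted(labels), 2)
    ]


def contrastive_loss(emb, pairs, margin=1.0):
    """Average contrastive loss over labelled pairs (assumed objective).

    Similar pairs are penalized by their squared distance; dissimilar
    pairs are penalized only when closer than `margin`.
    """
    total = 0.0
    for i, j, same in pairs:
        d = math.dist(emb[i], emb[j])
        total += d ** 2 if same else max(0.0, margin - d) ** 2
    return total / len(pairs)


labels = {0: "a", 1: "a", 2: "b"}  # hypothetical user feedback
pairs = feedback_pairs(labels)

good = {0: (0.0, 0.0), 1: (0.0, 0.0), 2: (2.0, 0.0)}  # matches the feedback
bad = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 0.0)}   # contradicts it
print(contrastive_loss(good, pairs), contrastive_loss(bad, pairs))
```

The embedding that agrees with the user's labels scores zero, while the one that places a differently labelled instance on top of instance 0 is penalized.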


Registration And Clustering Of Functional Observations, Zizhen Wu Jan 2016

Theses and Dissertations

Clustering of functional data, in which curves of similar shape are classified into groups, is an important exploratory analysis. Phase variations, or time distortions, are often encountered in biological processes such as growth patterns or gene profiles. Because of time distortion, curves of similar shape may not be aligned. Regular clustering methods for functional data usually ignore the presence of phase variation, which may result in low clustering accuracy. However, it is difficult to account for phase variation without knowing the cluster structure.

In this dissertation, we first propose a Bayesian method that simultaneously clusters …
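Why ignoring phase variation hurts can be seen in a small example. The following is not the dissertation's Bayesian method. It is a minimal sketch that assumes periodic, equally sampled curves and registers one curve to another by an exhaustive search over circular time shifts. Two curves with identical shape but a time delay are far apart in raw L2 distance and nearly identical once aligned.

```python
import math


def l2(f, g):
    """Euclidean (L2) distance between two equally sampled curves."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)))


def circular_shift(g, s):
    """Shift a periodic sampled curve left by s samples."""
    return g[s:] + g[:s]


def register_by_shift(f, g):
    """Return the circular shift of g that best aligns it with f,
    together with the L2 distance after alignment (shift-only registration)."""
    best = min(range(len(g)), key=lambda s: l2(f, circular_shift(g, s)))
    return best, l2(f, circular_shift(g, best))


n = 100
f = [math.sin(2 * math.pi * t / n) for t in range(n)]
# Same shape as f, but delayed by 10 samples (a pure phase variation).
g = [math.sin(2 * math.pi * (t - 10) / n) for t in range(n)]

print(round(l2(f, g), 2))     # 4.37: large raw distance despite identical shape
shift, aligned = register_by_shift(f, g)
print(shift, aligned < 1e-9)  # 10 True
```

A clustering method that compares the raw curves would treat `f` and `g` as dissimilar. Real growth or gene-profile data generally involve non-linear time warping rather than a single shift, which is why registration and clustering are harder to combine than this sketch suggests.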