Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Computer Sciences

Brigham Young University

2016

Clustering

Full-Text Articles in Physical Sciences and Mathematics

Confirm: Clustering Of Noisy Form Images Using Robust Matching, Christopher Alan Tensmeyer May 2016

Theses and Dissertations

Identifying the type of a scanned form greatly facilitates processing, including automated field segmentation and field recognition. Unlike the majority of existing techniques, we focus on unsupervised type identification, where the set of form types is not known a priori, and on noisy collections that contain very similar document types. This work presents a novel algorithm, CONFIRM (Clustering Of Noisy Form Images using Robust Matching), which simultaneously discovers the types in a collection of forms and assigns each form to a type. CONFIRM matches typeset text and rule lines between forms to create domain-specific features, which we show outperform …
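The idea of matching text between forms to build features can be illustrated with a minimal sketch. This is not CONFIRM itself: the abstract is truncated and does not describe the matching procedure, so every detail here is an assumption. Each form is reduced to a list of OCR'd text lines (rule lines and layout are ignored), two lines match one-to-one when their `difflib` similarity ratio reaches a hypothetical `threshold`, and a form's feature vector is its match score against a set of hypothetical prototype forms. The fuzzy threshold is what lets a noisy line such as "Employee Nme" still match "Employee Name".

```python
from difflib import SequenceMatcher


def line_similarity(a: str, b: str) -> float:
    """Case-insensitive character similarity in [0, 1] between two text lines."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(form_a: list[str], form_b: list[str], threshold: float = 0.8) -> float:
    """Fraction of text lines that can be matched one-to-one between two forms.

    Illustrative assumption, not CONFIRM's matcher: greedy matching on text
    only, with a fuzzy threshold to tolerate OCR noise. Rule lines are ignored.
    """
    if not form_a or not form_b:
        return 0.0
    unused = list(range(len(form_b)))  # indices of form_b lines not yet matched
    matched = 0
    for line in form_a:
        best_j, best_sim = None, threshold
        for j in unused:
            sim = line_similarity(line, form_b[j])
            if sim >= best_sim:
                best_j, best_sim = j, sim
        if best_j is not None:
            unused.remove(best_j)
            matched += 1
    # Normalize by the longer form so extra unmatched lines lower the score.
    return matched / max(len(form_a), len(form_b))


def match_features(form: list[str], prototypes: list[list[str]],
                   threshold: float = 0.8) -> list[float]:
    """Feature vector: match score of `form` against each hypothetical prototype."""
    return [match_score(form, p, threshold) for p in prototypes]


if __name__ == "__main__":
    forms = [
        ["Employee Name", "Date of Birth", "Signature"],
        ["Employee Nme", "Date of Birth", "Signature"],  # same type, OCR typo
        ["Invoice Number", "Amount Due", "Signature"],   # different type
    ]
    prototypes = [forms[0], forms[2]]
    for f in forms:
        print(match_features(f, prototypes))
```

Greedy matching keeps the sketch short; an optimal assignment or a layout-aware comparison would be natural refinements, but the abstract does not say which, if either, CONFIRM uses.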


Increment - Interactive Cluster Refinement, Logan Adam Mitchell Mar 2016

Theses and Dissertations

We present INCREMENT, a cluster refinement algorithm that uses user feedback to refine clusterings. INCREMENT can improve clusterings produced by arbitrary clustering algorithms. The initial clustering is first sub-clustered to improve query efficiency. A small set of selected instances from each sub-cluster is then presented to a user for labelling. Using this feedback, INCREMENT trains a feature embedder to map the input features to a new feature space, learned so that spatial distance is inversely correlated with semantic similarity as determined from the user feedback. A final clustering is then formed in the …
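The query-selection step described above (sub-cluster each initial cluster, then show the user a few instances from each sub-cluster) might look roughly like the following sketch. The use of k-means for sub-clustering, its deterministic initialization, the `k_sub` parameter, and picking the member nearest each sub-centroid are all illustrative assumptions, not details taken from the thesis; the feature-embedder and final-clustering stages are not shown.

```python
import math
from collections import defaultdict


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; deterministic init from the first k points (assumption)."""
    k = min(k, len(points))
    centroids = [points[i] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        # Keep the old centroid if a group ends up empty.
        centroids = [mean(g) if g else centroids[j] for j, g in enumerate(groups)]
    return centroids


def select_queries(points, labels, k_sub=2):
    """Sub-cluster each initial cluster and return, per cluster, the member
    nearest each sub-centroid: a small, spread-out set to show the user."""
    by_cluster = defaultdict(list)
    for p, c in zip(points, labels):
        by_cluster[c].append(p)
    queries = {}
    for c, members in by_cluster.items():
        reps = []
        for centroid in kmeans(members, k_sub):
            rep = min(members, key=lambda p: math.dist(p, centroid))
            if rep not in reps:  # two sub-centroids may share a nearest member
                reps.append(rep)
        queries[c] = reps
    return queries


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0), (10.1, 0.2)]
    labels = [0, 0, 0, 0, 1, 1]  # initial clustering from any algorithm
    print(select_queries(points, labels, k_sub=2))
```

Here cluster 0 mixes two well-separated groups, so its two representatives come from different groups; a user who labels them as dissimilar gives exactly the kind of feedback a refinement step could act on.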