Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Brigham Young University

Theses/Dissertations

Clustering

Full-Text Articles in Physical Sciences and Mathematics

Efficient And Adaptive Decentralized Sparse Gaussian Process Regression For Environmental Sampling Using Autonomous Vehicles, Tanner A. Norton Jun 2022

Theses and Dissertations

In this thesis, I present a decentralized sparse Gaussian process regression (DSGPR) model with event-triggered, adaptive inducing points. This DSGPR model brings the advantages of sparse Gaussian process regression to a decentralized implementation. Being both decentralized and sparse makes the model well suited to multi-agent systems (MASs) performing environmental modeling, where the agents must model large amounts of information over potentially intermittent communication links. Additionally, the model needs to correctly propagate uncertainty between autonomous agents and ensure high prediction accuracy. For the model to meet these requirements, a bounded and efficient real-time sparse Gaussian process …
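
For readers unfamiliar with inducing points, the sketch below shows the standard centralized sparse GP predictive mean (the subset-of-regressors / DTC form) in plain Python. It is not the thesis's DSGPR model: it has no decentralization, no event triggering, and no adaptive inducing-point selection. The RBF kernel, length scale, noise variance, and the helper names `rbf`, `solve`, and `sparse_gp_mean` are illustrative assumptions.

```python
import math


def rbf(a, b, lengthscale=0.8, variance=1.0):
    """Squared-exponential (RBF) covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((a - b) / lengthscale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def sparse_gp_mean(x_train, y_train, inducing, x_test, noise_var=1e-2):
    """Subset-of-regressors / DTC predictive mean with inducing inputs Z:

        mu(x*) = k(x*, Z) (noise_var * K_ZZ + K_ZX K_XZ)^{-1} K_ZX y

    Only an m-by-m system is solved (m = number of inducing points), so the
    cost is O(n m^2 + m^3) rather than the O(n^3) of a full GP.
    """
    m, n = len(inducing), len(x_train)
    K_zx = [[rbf(z, x) for x in x_train] for z in inducing]
    A = [[noise_var * rbf(inducing[i], inducing[j])
          + sum(K_zx[i][k] * K_zx[j][k] for k in range(n))
          for j in range(m)] for i in range(m)]
    rhs = [sum(K_zx[i][k] * y_train[k] for k in range(n)) for i in range(m)]
    w = solve(A, rhs)
    return [sum(w[j] * rbf(xs, inducing[j]) for j in range(m)) for xs in x_test]


if __name__ == "__main__":
    # 51 noise-free samples of sin(x) on [0, 5], summarized by 11 inducing points.
    xs = [i / 10 for i in range(51)]
    ys = [math.sin(x) for x in xs]
    z = [i / 2 for i in range(11)]
    print(sparse_gp_mean(xs, ys, z, [1.0, 2.5, 4.0]))
```

Because the m inducing points summarize all n observations, the linear system stays m by m no matter how many samples are collected, which is the kind of saving that makes sparse GPs attractive for compute- and bandwidth-limited agents.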


Confirm: Clustering Of Noisy Form Images Using Robust Matching, Christopher Alan Tensmeyer May 2016

Theses and Dissertations

Identifying the type of a scanned form greatly facilitates processing, including automated field segmentation and field recognition. Unlike the majority of existing techniques, we focus on unsupervised type identification, where the set of form types is not known a priori, and on noisy collections that contain very similar document types. This work presents a novel algorithm, CONFIRM (Clustering Of Noisy Form Images using Robust Matching), which simultaneously discovers the types in a collection of forms and assigns each form to a type. CONFIRM matches type-set text and rule lines between forms to create domain-specific features, which we show outperform …


Increment - Interactive Cluster Refinement, Logan Adam Mitchell Mar 2016

Theses and Dissertations

We present INCREMENT, a cluster refinement algorithm that uses user feedback to refine clusterings. INCREMENT is capable of improving clusterings produced by arbitrary clustering algorithms. The initial clustering provided is first sub-clustered to improve query efficiency. A small set of selected instances from each of these sub-clusters is presented to a user for labelling. Using the user feedback, INCREMENT trains a feature embedder to map the input features to a new feature space. This space is learned such that spatial distance is inversely correlated with semantic similarity, as determined from the user feedback. A final clustering is then formed in the …
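
The abstract does not say which algorithm INCREMENT uses to sub-cluster or how it picks the instances shown to the user. As a hypothetical illustration only, the sketch below splits each cluster with k-means (seeded by farthest-first traversal) and proposes, for every sub-cluster, the member nearest its centroid as a query. The function names and the `per_cluster` parameter are assumptions, not details from the thesis.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point whose distance to its nearest existing seed is largest."""
    seeds = [points[0]]
    while len(seeds) < min(k, len(points)):
        seeds.append(max(points, key=lambda p: min(sq_dist(p, s) for s in seeds)))
    return seeds


def subcluster(points, k, iters=10):
    """Split one cluster into at most k groups with Lloyd's k-means."""
    centers = farthest_first(points, k)
    groups = [list(points)]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        # Keep the previous center if a group empties out.
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    return [g for g in groups if g]


def query_instances(points, labels, per_cluster=2):
    """For each cluster, return the member nearest each sub-cluster centroid:
    the instances that would be shown to the user for labelling."""
    queries = {}
    for label in sorted(set(labels)):
        members = [p for p, lab in zip(points, labels) if lab == label]
        reps = []
        for group in subcluster(members, per_cluster):
            c = centroid(group)
            reps.append(min(group, key=lambda p: sq_dist(p, c)))
        queries[label] = reps
    return queries
```

Querying one representative per sub-cluster, rather than random members, is one way to spend a small labelling budget on the distinct modes inside each cluster; whether INCREMENT selects its queries exactly this way is not stated above.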


Cvic: Cluster Validation Using Instance-Based Confidences, Dean M. Lebaron Nov 2015

Theses and Dissertations

As unlabeled data becomes increasingly available, the need for robust data mining techniques increases as well. Clustering is a common data mining tool which seeks to find related, independent patterns in data called clusters. The cluster validation problem addresses the question of how well a given clustering fits the data set. We present CVIC (cluster validation using instance-based confidences) which assigns confidence scores to each individual instance, as opposed to more traditional methods which focus on the clusters themselves. CVIC trains supervised learners to recreate the clustering, and instances are scored based on output from the learners which corresponds to …
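
CVIC's actual learners and scoring rule are cut off in the abstract above. The sketch below is a hypothetical stand-in that follows the same idea of training a supervised learner to recreate the clustering: a leave-one-out k-nearest-neighbour classifier votes on each instance's cluster, and the instance's confidence is the fraction of the k votes that agree with its assigned cluster. The use of k-NN, the default `k=3`, and the name `knn_confidences` are assumptions.

```python
def knn_confidences(points, labels, k=3):
    """Per-instance confidence for a given clustering (hypothetical CVIC-style
    stand-in). For each instance, a k-nearest-neighbour classifier trained on
    all *other* instances votes on its cluster; the score is the fraction of
    the k votes that agree with the instance's assigned cluster."""
    scores = []
    for i, p in enumerate(points):
        # (squared distance, cluster label) for every other instance
        neighbours = sorted(
            ((sum((a - b) ** 2 for a, b in zip(p, q)), labels[j])
             for j, q in enumerate(points) if j != i),
            key=lambda t: t[0],
        )
        votes = [lab for _, lab in neighbours[:k]]
        scores.append(votes.count(labels[i]) / len(votes))
    return scores
```

For example, with four points around (0, 0) in cluster 0, four around (5, 5) in cluster 1, and a ninth point at (0.5, 0.5) also assigned to cluster 1, that ninth point scores 0.0 while (5, 5) scores 1.0: the score singles out the individual poorly placed instance rather than judging the cluster as a whole.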


Bayesian Text Analytics For Document Collections, Daniel David Walker Nov 2012

Theses and Dissertations

Modern document collections are too large to annotate and curate manually. As increasingly large amounts of data become available, historians, librarians and other scholars increasingly need to rely on automated systems to efficiently and accurately analyze the contents of their collections and to find new and interesting patterns therein. Modern techniques in Bayesian text analytics are becoming widespread and have the potential to revolutionize the way that research is conducted. Much work has been done in the document modeling community towards this end, though most of it is focused on modern, relatively clean text data. We present research for improved …


Easy To Find: Creating Query-Based Multi-Document Summaries To Enhance Web Search, Rani Majed Qumsiyeh Mar 2011

Theses and Dissertations

Current web search engines, such as Google, Yahoo!, and Bing, rank the set of documents S retrieved in response to a user query Q and display each document with a title and a snippet, which serves as an abstract of the corresponding document in S. Snippets, however, are less useful than intended, i.e., for helping search engine users quickly identify results of interest, if any exist, without browsing through the documents in S, since they (i) often include very similar information and (ii) do not capture the main content of the corresponding documents. …


Relationships Among Learning Algorithms And Tasks, Jun Won Lee Jan 2011

Theses and Dissertations

Metalearning aims to obtain knowledge of the relationship between the mechanism of learning and the concrete contexts in which that mechanism is applicable. As new learning mechanisms are continually added to the pool of learning algorithms, the chances of encountering similar behavior among algorithms increase. Understanding the relationships among algorithms, and the interactions between algorithms and tasks, helps to narrow down the space of algorithms to search for a given learning task. In addition, this process helps to disclose factors contributing to the similar behavior of different algorithms. We first study general characteristics of learning tasks and their …


Increasing Dogma Scaling Through Clustering, Nathan Hyrum Ekstrom Apr 2008

Theses and Dissertations

DOGMA is a distributed computing architecture developed at Brigham Young University. It makes use of idle computers to provide additional computing resources to applications, similar to SETI@home. DOGMA's ability to scale to large numbers of computers is hindered by its strict client-server architecture. Recent research with DOGMA has shown that introducing localized peer-to-peer downloading abilities enhances DOGMA's performance while reducing network and server usage. This thesis proposes to further extend DOGMA's peer-to-peer abilities to client-server communication by creating dynamic clusters of clients. The client clusters aggregate their communication with only one client …


Clustering Methods For Delineating Regions Of Spatial Stationarity, Jared M. Collings Nov 2007

Theses and Dissertations

This paper further investigates data obtained through functional magnetic resonance imaging (fMRI) of brain tissue, and how fMRI measures blood flow to certain areas of the brain following the application of a stimulus. As a precursor to detailed spatial analysis of this kind of data, this paper develops methods of grouping data based on the necessary conditions for spatial statistical analysis. The purpose of this paper is to examine and develop methods that can be used to delineate regions of stationarity. One of the major assumptions used in spatial estimation is …


Clustering Of Database Query Results, Kristine Jean Daniels Apr 2006

Theses and Dissertations

More and more users are accessing database systems for interactive and exploratory data retrieval. When searching these systems, users are required to use broad queries to get their desired results. Broad queries often return too many items, forcing the user to spend unnecessary time sifting through them to find the relevant results. This problem of finding a desired data item among many items is referred to as "information overload". Most users experience information overload when viewing these database query results. This thesis shows that users' information overload can be reduced by clustering database query results. …