Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 14 of 14

Full-Text Articles in Physical Sciences and Mathematics

A Framework For The Statistical Analysis Of Mass Spectrometry Imaging Experiments, Kyle Bemis Dec 2016

Open Access Dissertations

Mass spectrometry (MS) imaging is a powerful investigation technique for a wide range of biological applications such as molecular histology of tissue, whole body sections, and bacterial films, and biomedical applications such as cancer diagnosis. MS imaging visualizes the spatial distribution of molecular ions in a sample by repeatedly collecting mass spectra across its surface, resulting in complex, high-dimensional imaging datasets. Two of the primary goals of statistical analysis of MS imaging experiments are classification (for supervised experiments), i.e. assigning pixels to pre-defined classes based on their spectral profiles, and segmentation (for unsupervised experiments), i.e. assigning pixels to newly …


Low Rank Methods For Optimizing Clustering, Yangyang Hou Dec 2016

Open Access Dissertations

Complex optimization models and problems in machine learning often have the majority of information in a low rank subspace. By careful exploitation of these low rank structures in clustering problems, we find new optimization approaches that reduce the memory and computational cost.

We discuss two cases where this arises. First, we consider the NEO-K-Means (Non-Exhaustive, Overlapping K-Means) objective as a way to address overlapping and outliers in an integrated fashion. Optimizing this discrete objective is NP-hard, and even though there is a convex relaxation of the objective, straightforward convex optimization approaches are too expensive for large datasets. We utilize low …


Differentially Private Data Publishing For Data Analysis, Dong Su Dec 2016

Open Access Dissertations

In the information age, vast amounts of sensitive personal information are collected by companies, institutions and governments. A key technological challenge is how to design mechanisms for effectively extracting knowledge from data while preserving the privacy of the individuals involved. In this dissertation, we address this challenge from the perspective of differentially private data publishing. Firstly, we propose PrivPfC, a differentially private method for releasing data for classification. The key idea underlying PrivPfC is to privately select, in a single step, a grid, which partitions the data domain into a number of cells. This selection is done using the exponential …


A Computational Framework For Learning From Complex Data: Formulations, Algorithms, And Applications, Wenlu Zhang Jul 2016

Computer Science Theses & Dissertations

Many real-world processes are dynamically changing over time. As a consequence, the observed complex data generated by these processes also evolve smoothly. For example, in computational biology, the expression data matrices are evolving, since gene expression controls are deployed sequentially during development in many biological processes. Investigations into the spatial and temporal gene expression dynamics are essential for understanding the regulatory biology governing development. In this dissertation, I mainly focus on two types of complex data: genome-wide spatial gene expression patterns in the model organism fruit fly and Allen Brain Atlas mouse brain data. I provide a framework to explore …


Statistical Modeling Of Carbon Dioxide And Cluster Analysis Of Time Dependent Information: Lag Target Time Series Clustering, Multi-Factor Time Series Clustering, And Multi-Level Time Series Clustering, Doo Young Kim Jun 2016

USF Tampa Graduate Theses and Dissertations

The current study consists of three major parts: statistical modeling, the connection between statistical modeling and cluster analysis, and new methods for clustering time-dependent information.

First, we perform statistical modeling of carbon dioxide (CO2) emissions in South Korea in order to identify the attributable variables, including interaction effects. One of the most pressing issues facing the Earth in the 21st century is global warming, which is caused by the interaction between atmospheric temperature and CO2 in the atmosphere. When we confront this global problem, we first need to verify what causes the problem, then we …


Efficient Algorithms For Clustering Polygonal Obstacles, Sabbir Kumar Manandhar May 2016

UNLV Theses, Dissertations, Professional Papers, and Capstones

Clustering a set of points in Euclidean space is a well-known problem with applications in pattern recognition, document image analysis, big-data analytics, and robotics. While there are many research publications on clustering point objects, only a few articles have addressed clustering a given distribution of obstacles. In this thesis we examine the development of efficient algorithms for clustering a given set of convex obstacles in the 2D plane. One of the methods presented in this work uses a Voronoi diagram to extract obstacle clusters. We also consider the implementation issues of point/obstacle clustering algorithms.


Confirm: Clustering Of Noisy Form Images Using Robust Matching, Christopher Alan Tensmeyer May 2016

Theses and Dissertations

Identifying the type of a scanned form greatly facilitates processing, including automated field segmentation and field recognition. Contrary to the majority of existing techniques, we focus on unsupervised type identification, where the set of form types is not known a priori, and on noisy collections that contain very similar document types. This work presents a novel algorithm: CONFIRM (Clustering Of Noisy Form Images using Robust Matching), which simultaneously discovers the types in a collection of forms and assigns each form to a type. CONFIRM matches type-set text and rule lines between forms to create domain specific features, which we show outperform …


Unsupervised Learning Framework For Large-Scale Flight Data Analysis Of Cockpit Human Machine Interaction Issues, Abhishek B. Vaidya Apr 2016

Open Access Theses

As the level of automation within an aircraft increases, the interactions between the pilot and autopilot play a crucial role in its proper operation. Issues with human machine interactions (HMI) have been cited as one of the main causes behind many aviation accidents. Due to the complexity of such interactions, it is challenging to identify all possible situations and develop the necessary contingencies. In this thesis, we propose a data-driven analysis tool to identify potential HMI issues in large-scale Flight Operational Quality Assurance (FOQA) datasets. The proposed tool is developed using a multi-level clustering framework, where a set of basic …


Increment - Interactive Cluster Refinement, Logan Adam Mitchell Mar 2016

Theses and Dissertations

We present INCREMENT, a cluster refinement algorithm which utilizes user feedback to refine clusterings. INCREMENT is capable of improving clusterings produced by arbitrary clustering algorithms. The initial clustering provided is first sub-clustered to improve query efficiency. A small set of select instances from each of these sub-clusters is presented to a user for labelling. Utilizing the user feedback, INCREMENT trains a feature embedder to map the input features to a new feature space. This space is learned such that spatial distance is inversely correlated with semantic similarity, determined from the user feedback. A final clustering is then formed in the …


Semantics And Result Disambiguation For Keyword Search On Tree Data, Cem Aksoy Jan 2016

Dissertations

Keyword search is a popular technique for searching tree-structured data (e.g., XML, JSON) on the web because it frees the user from learning a complex query language and the structure of the data sources. However, the convenience of keyword search comes with drawbacks. The imprecision of the keyword queries usually results in a very large number of results of which only very few are relevant to the query. Multiple previous approaches have tried to address this problem. Some of them exploit structural and semantic properties of the tree data in order to filter out irrelevant results while others use a …


A Hybrid Approach To Semantic Hashtag Clustering In Social Media, Ali Javed Jan 2016

Graduate College Dissertations and Theses

The uncontrolled usage of hashtags in social media causes them to vary widely in semantic quality and frequency of usage. Such variations pose a challenge to the current approaches which capitalize on either the lexical semantics of a hashtag by using metadata or the contextual semantics of a hashtag by using the texts associated with a hashtag. This thesis presents a hybrid approach to clustering hashtags based on their semantics, designed in two phases. The first phase is a sense-level metadata-based semantic clustering algorithm that has the ability to differentiate among distinct senses of a hashtag as …


Forecasting Customer Electricity Load Demand In The Power Trading Agent Competition Using Machine Learning, Saiful Abu Jan 2016

Open Access Theses & Dissertations

Accurate electricity load demand forecasting is an important problem in managing the power grid for both economic and environmental reasons. The Power TAC simulation provides a platform to do research on smart grid energy generation and distribution systems. Brokers are the focus of the design task posed to developers by the system. The brokers work as self-interested entities that try to maximize profits by trading electricity across multiple markets. To be successful, a broker has to forecast the electricity demand for customers as accurately as possible so it can use this information to operate efficiently. My proposed forecasting method uses …


Topological Data Analysis For Systems Of Coupled Oscillators, Alec Dunton Jan 2016

HMC Senior Theses

Coupled oscillators, such as groups of fireflies or clusters of neurons, are found throughout nature and are frequently modeled in the applied mathematics literature. Earlier work by Kuramoto, Strogatz, and others has led to a deep understanding of the emergent behavior of systems of such oscillators using traditional dynamical systems methods. In this project we outline the application of techniques from topological data analysis to understanding the dynamics of systems of coupled oscillators. This includes the examination of partitions, partial synchronization, and attractors. By looking for clustering in a data space consisting of the phase change of oscillators over a …


Registration And Clustering Of Functional Observations, Zizhen Wu Jan 2016

Theses and Dissertations

As an important exploratory analysis, curves of similar shape are often classified into groups, which we call clustering of functional data. Phase variations or time distortions are often encountered in biological processes, such as growth patterns or gene profiles. As a result of time distortion, curves of similar shape may not be aligned. Regular clustering methods for functional data usually ignore the presence of phase variations, which may result in low clustering accuracy. However, it is difficult to account for phase variation without knowing the cluster structure.

In this dissertation, we first propose a Bayesian method that simultaneously clusters …