Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

2016

Clustering

Articles 1 - 25 of 25

Full-Text Articles in Physical Sciences and Mathematics

A Framework For The Statistical Analysis Of Mass Spectrometry Imaging Experiments, Kyle Bemis Dec 2016

Open Access Dissertations

Mass spectrometry (MS) imaging is a powerful investigation technique for a wide range of biological applications such as molecular histology of tissue, whole body sections, and bacterial films, and biomedical applications such as cancer diagnosis. MS imaging visualizes the spatial distribution of molecular ions in a sample by repeatedly collecting mass spectra across its surface, resulting in complex, high-dimensional imaging datasets. Two of the primary goals of statistical analysis of MS imaging experiments are classification (for supervised experiments), i.e. assigning pixels to pre-defined classes based on their spectral profiles, and segmentation (for unsupervised experiments), i.e. assigning pixels to newly …


Low Rank Methods For Optimizing Clustering, Yangyang Hou Dec 2016

Open Access Dissertations

Complex optimization models and problems in machine learning often have the majority of information in a low rank subspace. By careful exploitation of these low rank structures in clustering problems, we find new optimization approaches that reduce the memory and computational cost.

We discuss two cases where this arises. First, we consider the NEO-K-Means (Non-Exhaustive, Overlapping K-Means) objective as a way to address overlapping and outliers in an integrated fashion. Optimizing this discrete objective is NP-hard, and even though there is a convex relaxation of the objective, straightforward convex optimization approaches are too expensive for large datasets. We utilize low …


Differentially Private Data Publishing For Data Analysis, Dong Su Dec 2016

Open Access Dissertations

In the information age, vast amounts of sensitive personal information are collected by companies, institutions and governments. A key technological challenge is how to design mechanisms for effectively extracting knowledge from data while preserving the privacy of the individuals involved. In this dissertation, we address this challenge from the perspective of differentially private data publishing. Firstly, we propose PrivPfC, a differentially private method for releasing data for classification. The key idea underlying PrivPfC is to privately select, in a single step, a grid, which partitions the data domain into a number of cells. This selection is done using the exponential …


Applying Ahp And Clustering Approaches For Public Transportation Decisionmaking: A Case Study Of Isfahan City, Alireza Salavati, Hossein Haghshenas, Bahador Ghadirifaraz, Jamshid Laghaei, Ghodrat Eftekhari Dec 2016

Journal of Public Transportation

The main purpose of this paper is to define appropriate criteria for a systematic approach to evaluating and prioritizing multiple candidate corridors for public transport investment simultaneously in order to serve travel demand, given the supply of the current public transportation system and the road network conditions of Isfahan, Iran. To optimize resource allocation, policymakers need to identify proper corridors in which to implement a public transportation system. In fact, the main question is how to adopt the best public transportation system for each main corridor of Isfahan. In this regard, 137 questionnaires were completed by experts, directors, and policymakers of Isfahan to identify goals and objectives in …


Semi-Automated Tool For Providing Effective Feedback On Programming Assignments, Min Yan Beh, Swapna Gottipati, David Lo, Venky Shankararaman Dec 2016

Research Collection School Of Computing and Information Systems

Human grading of introductory programming assignments is tedious and error-prone, so researchers have attempted to develop tools that support automatic assessment of programming code. However, most such efforts focus only on scoring solutions rather than on assessing whether students correctly understand the problems. Effective feedback on programming assignments plays an important role in helping students improve their programming skills, but generating individual feedback is a tedious and painstaking process. We present a tool that not only automatically generates the static and dynamic program analysis outcomes, but also clusters similar code submissions to provide scalable and effective feedback to the students. We studied …


Analyzing And Organizing The Sonic Space Of Vocal Imitation, Davide Andrea Mauro PhD, D. Rocchesso Nov 2016

Davide Andrea Mauro

The sonic space that can be spanned with the voice is vast and complex and is therefore difficult to organize and explore. In order to devise tools that facilitate sound design by vocal sketching, we attempt to organize a database of short excerpts of vocal imitations. By clustering the sound samples in a space whose dimensionality has been reduced to two principal components, we experimentally check how meaningful the resulting clusters are for humans. Eventually, a representative of each cluster, chosen to be close to its centroid, may serve as a landmark in the exploration of the …
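The idea of picking a landmark near each cluster centroid can be sketched as follows. This is only an illustration, not the authors' implementation: it assumes the samples have already been projected to 2-D and assigned cluster labels by some earlier step, and the function name `nearest_to_centroid` and the toy data are hypothetical.

```python
import math

def nearest_to_centroid(points, labels):
    """Map each cluster label to the index of the member closest to its centroid.

    Assumes `points` are 2-D coordinates (e.g. two principal components)
    and `labels` come from a clustering step that is not shown here.
    """
    members = {}
    for idx, label in enumerate(labels):
        members.setdefault(label, []).append(idx)

    landmarks = {}
    for label, idxs in members.items():
        # Centroid is the coordinate-wise mean of the cluster members.
        cx = sum(points[i][0] for i in idxs) / len(idxs)
        cy = sum(points[i][1] for i in idxs) / len(idxs)
        # The landmark is the actual sample with the smallest Euclidean distance.
        landmarks[label] = min(
            idxs, key=lambda i: math.hypot(points[i][0] - cx, points[i][1] - cy)
        )
    return landmarks

# Hypothetical toy data: two well-separated groups of three samples each.
points = [(0.0, 0.0), (1.0, 0.0), (0.4, 0.1), (5.0, 5.0), (6.0, 5.0), (5.6, 5.2)]
labels = [0, 0, 0, 1, 1, 1]
landmarks = nearest_to_centroid(points, labels)
```

Returning an index into the original samples, rather than the centroid itself, matters here because a landmark has to be a real sound that can be played back.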


Shape Analysis Of Traffic Flow Curves Using A Hybrid Computational Analysis, Wasim Irshad Kayani, Shikhar P. Acharya, Ivan G. Guardiola, Donald C. Wunsch, B. Schumacher, Isaac Wagner-Muns Nov 2016

Engineering Management and Systems Engineering Faculty Research & Creative Works

This paper highlights and validates the use of shape analysis using Mathematical Morphology (MM) tools as a means to develop meaningful clustering of historical data. Furthermore, clustering yields more appropriate groupings, which can lead to better parameterization or estimation of models and, in turn, to more effective prediction model development. To demonstrate this, a Back-Propagation Neural Network is used to validate the classification achieved through the employment of MM tools. Specifically, the Granulometric Size Distribution (GSD) is used to achieve clustering of daily traffic flow patterns based solely on their …


A Computational Framework For Learning From Complex Data: Formulations, Algorithms, And Applications, Wenlu Zhang Jul 2016

Computer Science Theses & Dissertations

Many real-world processes are dynamically changing over time. As a consequence, the observed complex data generated by these processes also evolve smoothly. For example, in computational biology, the expression data matrices are evolving, since gene expression controls are deployed sequentially during development in many biological processes. Investigations into the spatial and temporal gene expression dynamics are essential for understanding the regulatory biology governing development. In this dissertation, I mainly focus on two types of complex data: genome-wide spatial gene expression patterns in the model organism fruit fly and Allen Brain Atlas mouse brain data. I provide a framework to explore …


Optimizing Main Memory Usage In Modern Computing Systems To Improve Overall System Performance, Daniel Jose Campello Jun 2016

FIU Electronic Theses and Dissertations

Operating systems use fast, CPU-addressable main memory to maintain an application’s temporary data as anonymous data and to cache copies of persistent data stored in slower block-based storage devices. However, the use of this faster memory comes at a high cost. Therefore, the literature describes several techniques for using main memory more efficiently. In this dissertation, we introduce three distinct approaches to improve overall system performance by optimizing main memory usage.

First, DRAM and host-side caching of file system data are used for speeding up virtual machine performance in today’s virtualized data centers. The clustering of VM …


Statistical Modeling Of Carbon Dioxide And Cluster Analysis Of Time Dependent Information: Lag Target Time Series Clustering, Multi-Factor Time Series Clustering, And Multi-Level Time Series Clustering, Doo Young Kim Jun 2016

USF Tampa Graduate Theses and Dissertations

The current study consists of three major parts: statistical modeling, the connection between statistical modeling and cluster analysis, and new methods for clustering time-dependent information.

First, we statistically model carbon dioxide (CO2) emissions in South Korea in order to identify the attributable variables, including interaction effects. One of the most pressing issues of the 21st century is global warming, which is caused by the interplay between atmospheric temperature and CO2 in the atmosphere. To confront this global problem, we first need to verify what causes it; then we …


Efficient Algorithms For Clustering Polygonal Obstacles, Sabbir Kumar Manandhar May 2016

UNLV Theses, Dissertations, Professional Papers, and Capstones

Clustering a set of points in Euclidean space is a well-known problem with applications in pattern recognition, document image analysis, big-data analytics, and robotics. While there are many research publications on clustering point objects, only a few articles address clustering a given distribution of obstacles. In this thesis we examine the development of efficient algorithms for clustering a given set of convex obstacles in the 2D plane. One of the methods presented in this work uses a Voronoi diagram to extract obstacle clusters. We also consider the implementation issues of point/obstacle clustering algorithms.


Confirm: Clustering Of Noisy Form Images Using Robust Matching, Christopher Alan Tensmeyer May 2016

Theses and Dissertations

Identifying the type of a scanned form greatly facilitates processing, including automated field segmentation and field recognition. Contrary to the majority of existing techniques, we focus on unsupervised type identification, where the set of form types is not known a priori, and on noisy collections that contain very similar document types. This work presents a novel algorithm: CONFIRM (Clustering Of Noisy Form Images using Robust Matching), which simultaneously discovers the types in a collection of forms and assigns each form to a type. CONFIRM matches type-set text and rule lines between forms to create domain-specific features, which we show outperform …


Variance Of Clusterings On Graphs, Thomas Vlado Mulc Apr 2016

Mathematical Sciences Technical Reports (MSTR)

Graphs that represent data often have structures or characteristics that can represent some relationships in the data. One such structure is clusters, also called community structures. Most clustering algorithms for graphs are deterministic, which means they will output the same clustering each time. We investigate a few stochastic algorithms and examine the consistency of their clusterings.
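One way to quantify how consistent two runs of a stochastic clustering algorithm are is a pairwise agreement score. The sketch below uses the Rand index, which is an assumption made for illustration: the abstract does not say which consistency measure the report uses, and the function name `rand_index` is hypothetical.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of node pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings put the two nodes
    together, or both put them apart. Label values themselves do not matter.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("clusterings must cover the same nodes")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two nodes")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)

# Two runs that find the same partition under different label names.
same_partition = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Two runs that split the four nodes in conflicting ways.
conflicting = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Averaging such a score over many pairs of runs would give one rough summary of how much a stochastic algorithm's output varies on a given graph.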


Unsupervised Learning Framework For Large-Scale Flight Data Analysis Of Cockpit Human Machine Interaction Issues, Abhishek B. Vaidya Apr 2016

Open Access Theses

As the level of automation within an aircraft increases, the interactions between the pilot and autopilot play a crucial role in its proper operation. Issues with human-machine interactions (HMI) have been cited as one of the main causes behind many aviation accidents. Due to the complexity of such interactions, it is challenging to identify all possible situations and develop the necessary contingencies. In this thesis, we propose a data-driven analysis tool to identify potential HMI issues in large-scale Flight Operational Quality Assurance (FOQA) datasets. The proposed tool is developed using a multi-level clustering framework, where a set of basic …


Macroconstants Of Development: A New Benchmark For The Strategic Development Of Advanced Countries And Firms, Andrey Bystrov, Vyacheslav Yusim, Tamilla Curtis Mar 2016

Dr. Tamilla Curtis

This research proposed a new indicator of countries’ development called “macroconstants of development”. The literature review indicates that the concept of “macroconstants of development” is currently used in neither the theory nor the practice of industrial policy. Longitudinal data on total GDP, GDP per capita, and their derivatives were studied for most countries of the world, and the statistical information was analyzed using econometric methods.

Based on the analysis of the statistical data, which characterizes the development of large, technologically advanced countries in ordinary conditions, it was identified that the average acceleration …


Increment - Interactive Cluster Refinement, Logan Adam Mitchell Mar 2016

Theses and Dissertations

We present INCREMENT, a cluster refinement algorithm which utilizes user feedback to refine clusterings. INCREMENT is capable of improving clusterings produced by arbitrary clustering algorithms. The initial clustering provided is first sub-clustered to improve query efficiency. A small set of select instances from each of these sub-clusters is presented to a user for labelling. Utilizing the user feedback, INCREMENT trains a feature embedder to map the input features to a new feature space. This space is learned such that spatial distance is inversely correlated with semantic similarity, determined from the user feedback. A final clustering is then formed in the …


Semantics And Result Disambiguation For Keyword Search On Tree Data, Cem Aksoy Jan 2016

Dissertations

Keyword search is a popular technique for searching tree-structured data (e.g., XML, JSON) on the web because it frees the user from learning a complex query language and the structure of the data sources. However, the convenience of keyword search comes with drawbacks. The imprecision of keyword queries usually produces a very large number of results, of which only a few are relevant to the query. Multiple previous approaches have tried to address this problem. Some of them exploit structural and semantic properties of the tree data in order to filter out irrelevant results while others use a …


Spatial Analysis Of Forest Crimes In Mark Twain National Forest, Missouri, Karun Pandit, Eddie Bevilacqua, Giorgos Mountrakis, Robert W. Malmsheimer Jan 2016

Journal of Geospatial Applications in Natural Resources

Forest crime mitigation has been identified as a challenging issue in forest management in the United States. Knowledge of the spatial pattern of forest crimes would help in wisely allocating limited enforcement resources to curb forest crimes. This study explores the spatial pattern of three different types of forest crimes: fire crime, illegal timber logging crime, and occupancy use crime in the Salem-Patosi Ranger District of Mark Twain National Forest. Univariate and bivariate Ripley’s K-functions were applied to explore the spatial patterns in crime events, like clustering and attraction among forest crime types. Results reveal significant clustering for each forest …
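For readers unfamiliar with Ripley's K-function, the univariate case can be sketched as below. This is a naive estimator with no edge correction, written only to show the idea; it is not the procedure used in the study, the bivariate (cross-type) variant is not shown, and the function name `ripley_k` and the toy coordinates are hypothetical.

```python
import math

def ripley_k(points, radius, area):
    """Naive univariate Ripley's K estimate at one radius (no edge correction).

    K(r) = area / (n * (n - 1)) * number of ordered pairs (i, j), i != j,
    whose distance is at most r. Under complete spatial randomness K(r) is
    about pi * r**2; larger values suggest clustering at that scale.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two events")
    close_pairs = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= radius:
                close_pairs += 1
    return area * close_pairs / (n * (n - 1))

# Hypothetical events at the corners of a unit-square study area.
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
k_short = ripley_k(corners, radius=1.0, area=1.0)  # only the edge neighbours count
k_long = ripley_k(corners, radius=1.5, area=1.0)   # diagonals are included too
```

In practice the estimate is computed over a range of radii and compared with simulated envelopes, and an edge correction is applied because events near the boundary have unobserved neighbours.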


Forecasting Customer Electricity Load Demand In The Power Trading Agent Competition Using Machine Learning, Saiful Abu Jan 2016

Open Access Theses & Dissertations

Accurate electricity load demand forecasting is an important problem in managing the power grid for both economic and environmental reasons. The Power TAC simulation provides a platform to do research on smart grid energy generation and distribution systems. Brokers are the focus of the design task posed to developers by the system. The brokers work as self-interested entities that try to maximize profits by trading electricity across multiple markets. To be successful, a broker has to forecast the electricity demand for customers as accurately as possible so it can use this information to operate efficiently. My proposed forecasting method uses …


Enscat: Clustering Of Categorical Data Via Ensembling, Bertrand S. Clarke, Saeid Amiri, Jennifer L. Clarke Jan 2016

Department of Statistics: Faculty Publications

Background: Clustering is a widely used collection of unsupervised learning techniques for identifying natural classes within a data set. It is often used in bioinformatics to infer population substructure. Genomic data are often categorical and high dimensional, e.g., long sequences of nucleotides. This makes inference challenging: The distance metric is often not well-defined on categorical data; running time for computations using high dimensional data can be considerable; and the Curse of Dimensionality often impedes the interpretation of the results. However, the literature and software addressing clustering for categorical data have not yet led to a standard …


Topological Data Analysis For Systems Of Coupled Oscillators, Alec Dunton Jan 2016

HMC Senior Theses

Coupled oscillators, such as groups of fireflies or clusters of neurons, are found throughout nature and are frequently modeled in the applied mathematics literature. Earlier work by Kuramoto, Strogatz, and others has led to a deep understanding of the emergent behavior of systems of such oscillators using traditional dynamical systems methods. In this project we outline the application of techniques from topological data analysis to understanding the dynamics of systems of coupled oscillators. This includes the examination of partitions, partial synchronization, and attractors. By looking for clustering in a data space consisting of the phase change of oscillators over a …


A Hybrid Approach To Semantic Hashtag Clustering In Social Media, Ali Javed Jan 2016

Graduate College Dissertations and Theses

The uncontrolled usage of hashtags in social media causes them to vary widely in semantic quality and frequency of use. Such variations pose a challenge to the current approaches, which capitalize on either the lexical semantics of a hashtag by using metadata or the contextual semantics of a hashtag by using the texts associated with a hashtag. This thesis presents a hybrid approach to clustering hashtags based on their semantics, designed in two phases. The first phase is a sense-level metadata-based semantic clustering algorithm that has the ability to differentiate among distinct senses of a hashtag as …


Registration And Clustering Of Functional Observations, Zizhen Wu Jan 2016

Theses and Dissertations

Classifying curves of similar shape into groups, which we call clustering of functional data, is an important exploratory analysis. Phase variations or time distortions are often encountered in biological processes, such as growth patterns or gene profiles. As a result of time distortion, curves of similar shape may not be aligned. Regular clustering methods for functional data usually ignore the presence of phase variations, which may result in low clustering accuracy. However, it is difficult to account for phase variation without knowing the cluster structure.

In this dissertation, we first propose a Bayesian method that simultaneously clusters …


Macroconstants Of Development: A New Benchmark For The Strategic Development Of Advanced Countries And Firms, Andrey V. Bystrov, Vyacheslav N. Yusim, Tamilla Curtis Jan 2016

Publications

This research proposed a new indicator of countries’ development called “macroconstants of development”. The literature review indicates that the concept of “macroconstants of development” is currently used in neither the theory nor the practice of industrial policy. Longitudinal data on total GDP, GDP per capita, and their derivatives were studied for most countries of the world, and the statistical information was analyzed using econometric methods.

Based on the analysis of the statistical data, which characterizes the development of large, technologically advanced countries in ordinary conditions, it was identified that the average acceleration …