Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons


Articles 1 - 18 of 18

Full-Text Articles in Physical Sciences and Mathematics

Parameter Estimation For Normally Distributed Grouped Data And Clustering Single-Cell Rna Sequencing Data Via The Expectation-Maximization Algorithm, Zahra Aghahosseinalishirazi Sep 2023


Electronic Thesis and Dissertation Repository

The Expectation-Maximization (EM) algorithm is an iterative algorithm for finding the maximum likelihood estimates in problems involving missing data or latent variables. The EM algorithm can be applied to problems consisting of evidently incomplete data or missingness situations, such as truncated distributions, censored or grouped observations, and also to problems in which the missingness of the data is not natural or evident, such as mixed-effects models, mixture models, log-linear models, and latent variables. In Chapter 2 of this thesis, we apply the EM algorithm to grouped data, a problem in which incomplete data are evident. Nowadays, data confidentiality is of …
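As a rough illustration of EM on grouped data, the sketch below fits a single normal distribution to bin counts. It is not taken from the thesis: the function name `em_grouped_normal`, the finite bin edges, and the midpoint-based starting values are all assumptions made for this example. The E-step uses the standard truncated-normal moments within each bin; the M-step updates the mean and standard deviation from those expected moments.

```python
import math


def _pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def em_grouped_normal(edges, counts, n_iter=200):
    """EM estimates (mu, sigma) for normal data observed only as bin counts.

    Assumptions of this sketch: `edges` are finite and increasing,
    len(edges) == len(counts) + 1, and the counts occupy more than one bin.
    """
    n = sum(counts)
    bins = list(zip(edges[:-1], edges[1:], counts))
    # Starting values: treat every observation as sitting at its bin midpoint.
    mu = sum(c * (a + b) / 2.0 for a, b, c in bins) / n
    sigma = math.sqrt(sum(c * ((a + b) / 2.0 - mu) ** 2 for a, b, c in bins) / n)
    for _ in range(n_iter):
        s1 = s2 = 0.0
        for a, b, c in bins:
            # E-step: first and second moments of N(mu, sigma^2) truncated to [a, b].
            al, be = (a - mu) / sigma, (b - mu) / sigma
            z = max(_cdf(be) - _cdf(al), 1e-300)  # guard against division by zero
            d = (_pdf(al) - _pdf(be)) / z
            m1 = mu + sigma * d
            var = sigma ** 2 * (1.0 + (al * _pdf(al) - be * _pdf(be)) / z - d * d)
            s1 += c * m1
            s2 += c * (var + m1 * m1)
        # M-step: complete-data maximum likelihood updates.
        mu = s1 / n
        sigma = math.sqrt(s2 / n - mu * mu)
    return mu, sigma


mu_hat, sigma_hat = em_grouped_normal(
    [-4, -2, -1, 0, 1, 2, 4], [2, 14, 34, 34, 14, 2]
)
```

The example counts are symmetric around zero and roughly follow a standard normal, so the estimates should land near a mean of 0 and a standard deviation of 1.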


Robust Mahalanobis K-Means Algorithm In Comparison With Other Existing Clustering Methods., Eleazer Tabi Serebour Aug 2023


Open Access Theses & Dissertations

This study enhances K-means Mahalanobis clustering using Density Power Divergence (DPD) for outlier handling and detection. Through the utilization of simulations and the analysis of real-world data, our approach consistently outperforms standard K-means, Mahalanobis K-means, Fuzzy C-means, and others in clustering datasets with outliers. While our method performs similarly to others on spherical datasets, it ranks second to DBSCAN for arbitrary shapes. We showcase its superiority on real-life datasets (Iris flower and wheat seed), demonstrating resilient outlier identification. By navigating various structures and cluster characteristics, our Modified Mahalanobis K-means method proves adaptable and robust, offering insights into diverse clustering scenarios. …
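To make the distance underlying Mahalanobis K-means concrete, here is a minimal sketch of the assignment step in two dimensions. It does not include the Density Power Divergence weighting described in the abstract, and the names `mahalanobis_sq` and `assign`, as well as the example clusters, are hypothetical.

```python
def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of a 2-D point from a cluster.

    `cov` is a 2x2 covariance matrix given as nested tuples or lists.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form with the closed-form inverse of a 2x2 matrix.
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def assign(x, clusters):
    """Index of the cluster (mean, cov) with the smallest Mahalanobis distance."""
    return min(range(len(clusters)), key=lambda k: mahalanobis_sq(x, *clusters[k]))


# Illustrative clusters: the first is elongated along the x-axis, the second is compact.
clusters = [
    ((0.0, 0.0), ((4.0, 0.0), (0.0, 0.25))),
    ((3.0, 0.0), ((0.25, 0.0), (0.0, 0.25))),
]
label = assign((1.8, 0.0), clusters)
```

In this example the point (1.8, 0) is closer to the second mean in Euclidean terms, but the Mahalanobis distance assigns it to the first, elongated cluster, which is the behaviour that motivates using cluster covariances in the assignment step.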


Redefining Nba Basketball Positions Through Visualization And Mega-Cluster Analysis, Alexander L. Hedquist Aug 2022


All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Basketball players have historically been classified into one of five positions, namely Point Guards, Shooting Guards, Small Forwards, Power Forwards, and Centers. While grouping players into these five categories may provide general descriptions of their perceived role, these standard positions fall short of describing players based on their true abilities and performance. This MS thesis proposes a method to group players of the National Basketball Association (NBA) from the past 20 seasons into more meaningful and specific player positions. We systematically group these players into nine distinct categories, and we draw from a vast array of visualization tools, techniques, and software …


Spectral Methods For The Detection And Characterization Of Topologically Associated Domains, Kellen Garrison Cresswell Jan 2019


Theses and Dissertations

The three-dimensional (3D) structure of the genome plays a crucial role in gene expression regulation. Chromatin conformation capture technologies (Hi-C) have revealed that the genome is organized in a hierarchy of topologically associated domains (TADs), sub-TADs, and chromatin loops which is relatively stable across cell lines and even across species. These TADs dynamically reorganize during development of disease, and exhibit cell- and condition-specific differences. Identifying such hierarchical structures and how they change between conditions is a critical step in understanding genome regulation and disease development. Despite their importance, there are relatively few tools for identification of TADs and even fewer for …


Statistical Methods For Mixed Frequency Data Sampling Models, Yun Liu Jan 2019


Dissertations, Master's Theses and Master's Reports

MIDAS models are developed to handle different sampling frequencies in one regression model, preserving information in the higher sampling frequency. Time averaging has been the traditional parametric approach to handle mixed sampling frequencies. However, it ignores information potentially embedded in the high-frequency (HF) data. MIDAS regression models provide a concise way to utilize additional information in HF variables. While a parametric MIDAS model provides a parsimonious way to summarize information in HF data, nonparametric models would maintain more flexibility at the expense of computational complexity. Moreover, one parametric form may not necessarily be appropriate for all cross-sectional subjects. This thesis …


Data Patterns Discovery Using Unsupervised Learning, Rachel A. Lewis Jan 2019


Electronic Theses and Dissertations

Self-care activities classification poses significant challenges in identifying children’s unique functional abilities and needs within the exceptional children healthcare system. The accuracy of diagnosing a child's self-care problem, such as toileting or dressing, is highly influenced by an occupational therapist’s experience and time constraints. Thus, there is a need for objective means to detect and predict in advance the self-care problems of children with physical and motor disabilities. We use clustering to discover interesting information from self-care problems, perform automatic classification of binary data, and discover outliers. The advantages are twofold: the advancement of knowledge on identifying self-care problems in …


Clustering Mixed Data: An Extension Of The Gower Coefficient With Weighted L2 Distance, Augustine Oppong Aug 2018


Electronic Theses and Dissertations

Sorting data into partitions is becoming increasingly complex as the constituents of data expand every day. Mixed data comprises continuous, categorical, directional, functional, and other types of variables. Clustering mixed data is based on special dissimilarities of the variables. Some data types may influence the clustering solution. Assigning an appropriate weight to the functional data may improve the performance of the clustering algorithm. In this paper we use the extension of the Gower coefficient with a judiciously chosen weight for the L2 distance to cluster mixed data. The benefits of weighting are demonstrated both in applications to the Buoy data set …
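For orientation, the standard weighted Gower dissimilarity for continuous and categorical variables can be sketched as follows. This is only the classical coefficient, not the thesis's extension: the weighted L2 term for functional variables is not reproduced, and the function name `gower` and the convention of marking categorical variables with a range of `None` are assumptions of this example.

```python
def gower(x, y, ranges, weights):
    """Weighted Gower dissimilarity between two mixed-type records.

    ranges[i] is the observed range of variable i when it is numeric,
    or None when the variable is categorical.
    """
    num = den = 0.0
    for xi, yi, r, w in zip(x, y, ranges, weights):
        if r is None:
            # Categorical variable: simple mismatch indicator.
            d = 0.0 if xi == yi else 1.0
        else:
            # Numeric variable: absolute difference scaled by the range.
            d = abs(xi - yi) / r
        num += w * d
        den += w
    return num / den


# Equal weights, then a heavier weight on the numeric variable.
d_equal = gower((1.0, "a"), (3.0, "b"), [4.0, None], [1.0, 1.0])
d_weighted = gower((1.0, "a"), (3.0, "b"), [4.0, None], [3.0, 1.0])
```

Changing the weights shifts how much each variable type contributes to the overall dissimilarity, which is the lever the abstract refers to when it discusses weighting one data type.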


Offline And Online Density Estimation For Large High-Dimensional Data, Aref Majdara Jan 2018


Dissertations, Master's Theses and Master's Reports

Density estimation has wide applications in machine learning and data analysis techniques including clustering, classification, multimodality analysis, bump hunting and anomaly detection. In high-dimensional space, sparsity of data in local neighborhoods makes many parametric and nonparametric density estimation methods largely inefficient.

This work presents the development of computationally efficient algorithms for high-dimensional density estimation, based on Bayesian sequential partitioning (BSP). Copula transform is used to separate the estimation of marginal and joint densities, with the purpose of reducing the computational complexity and estimation error. Using this separation, a parallel implementation of the density estimation algorithm on a 4-core CPU is …


Data Analysis Methods Using Persistence Diagrams, Andrew Marchese Aug 2017


Doctoral Dissertations

In recent years, persistent homology techniques have been used to study data and dynamical systems. Using these techniques, information about the shape and geometry of the data and systems leads to important information regarding the periodicity, bistability, and chaos of the underlying systems. In this thesis, we study all aspects of the application of persistent homology to data analysis. In particular, we introduce a new distance on the space of persistence diagrams, and show that it is useful in detecting changes in geometry and topology, which is essential for the supervised learning problem. Moreover, we introduce a clustering framework directly …


A Framework For The Statistical Analysis Of Mass Spectrometry Imaging Experiments, Kyle Bemis Dec 2016


Open Access Dissertations

Mass spectrometry (MS) imaging is a powerful investigation technique for a wide range of biological applications such as molecular histology of tissue, whole body sections, and bacterial films, and biomedical applications such as cancer diagnosis. MS imaging visualizes the spatial distribution of molecular ions in a sample by repeatedly collecting mass spectra across its surface, resulting in complex, high-dimensional imaging datasets. Two of the primary goals of statistical analysis of MS imaging experiments are classification (for supervised experiments), i.e. assigning pixels to pre-defined classes based on their spectral profiles, and segmentation (for unsupervised experiments), i.e. assigning pixels to newly …


Statistical Modeling Of Carbon Dioxide And Cluster Analysis Of Time Dependent Information: Lag Target Time Series Clustering, Multi-Factor Time Series Clustering, And Multi-Level Time Series Clustering, Doo Young Kim Jun 2016


USF Tampa Graduate Theses and Dissertations

The current study consists of three major parts: statistical modeling, the connection between statistical modeling and cluster analysis, and new methods for clustering time-dependent information.

First, we perform statistical modeling of carbon dioxide (CO2) emissions in South Korea in order to identify the attributable variables, including interaction effects. One of the pressing issues on Earth in the 21st century is global warming, which is caused by the interaction between atmospheric temperature and CO2 in the atmosphere. When we confront this global problem, we first need to verify what causes the problem then we …


Registration And Clustering Of Functional Observations, Zizhen Wu Jan 2016


Theses and Dissertations

Classifying curves of similar shape into groups, which we call clustering of functional data, is an important exploratory analysis. Phase variations or time distortions are often encountered in biological processes, such as growth patterns or gene profiles. As a result of time distortion, curves of similar shape may not be aligned. Regular clustering methods for functional data usually ignore the presence of phase variations, which may result in low clustering accuracy. However, it is difficult to account for phase variation without knowing the cluster structure.

In this dissertation, we first propose a Bayesian method that simultaneously clusters …


Computational Intelligence Based Complex Adaptive System-Of-Systems Architecture Evolution Strategy, Siddharth Agarwal Jan 2015


Doctoral Dissertations

The dynamic planning for a system-of-systems (SoS) is a challenging endeavor. Large scale organizations and operations constantly face challenges to incorporate new systems and upgrade existing systems over a period of time under threats, constrained budget and uncertainty. It is therefore necessary for the program managers to be able to look at the future scenarios and critically assess the impact of technology and stakeholder changes. Managers and engineers are always looking for options that signify affordable acquisition selections and lessen the cycle time for early acquisition and new technology addition. This research helps in analyzing sequential decisions in an evolving …


Methods For Identifying Regions Of Brain Activation Using Fmri Meta-Data, Meredith A. Ray Dec 2014


Theses and Dissertations

Functional neuroimaging is a relatively young discipline within the neurosciences that has led to significant advances in our understanding of the human brain and progress in neuroscientific research related to public health. Accurately identifying activated regions in the brain showing a strong association with an outcome of interest is crucial in terms of disease prediction and prevention. Functional magnetic resonance imaging (fMRI) is the most widely used method for this type of study as it has the ability to measure and identify the location of changes in tissue perfusion, blood oxygenation, and blood volume. In practice, the three-dimensional brain locations …


Online Multi-Stage Deep Architectures For Feature Extraction And Object Recognition, Derek Christopher Rose Aug 2013


Doctoral Dissertations

Multi-stage visual architectures have recently found success in achieving high classification accuracies over image datasets with large variations in pose, lighting, and scale. Inspired by techniques currently at the forefront of deep learning, such architectures are typically composed of one or more layers of preprocessing, feature encoding, and pooling to extract features from raw images. Training these components traditionally relies on large sets of patches that are extracted from a potentially large image dataset. In this context, high-dimensional feature space representations are often helpful for obtaining the best classification performances and providing a higher degree of invariance to object transformations. …


Class Discovery And Prediction Of Tumor With Microarray Data, Bo Liu Jan 2011


All Graduate Theses, Dissertations, and Other Capstone Projects

Current microarray technology is able to take a single tissue sample to construct an Affymetrix oligonucleotide array containing (estimated) expression levels of thousands of different genes for that tissue. The objective is to develop a more systematic approach to cancer classification based on Affymetrix oligonucleotide microarrays. For this purpose, I studied published colon cancer microarray data. Colon cancer, with 655,000 deaths worldwide per year, has become the fourth most common form of cancer in the United States and the third leading cause of cancer-related death in the Western world. This research focuses on two areas: class discovery, …


Clustering Methods For Delineating Regions Of Spatial Stationarity, Jared M. Collings Nov 2007


Theses and Dissertations

This paper seeks to further investigate data extracted by the use of Functional Magnetic Resonance Imaging (fMRI) as it is applied to brain tissue and how it measures blood flow to certain areas of the brain following the application of a stimulus. As a precursor to detailed spatial analysis of this kind of data, this paper develops methods of grouping data based on the necessary conditions for spatial statistical analysis. The purpose of this paper is to examine and develop methods that can be used to delineate regions of stationarity. One of the major assumptions used in spatial estimation is …


Machine Learning Approaches For Determining Effective Seeds For K-Means Algorithm, Kaveephong Lertwachara Apr 2003


Doctoral Dissertations

In this study, I investigate and experiment with two-stage clustering procedures (hybrid models) in simulated environments, where conditions such as collinearity problems and cluster structures are controlled, and in real-life problems, where conditions are not controlled. The first hybrid model (NK) integrates a neural network (NN) with the k-means algorithm (KM): the NN screens seeds and passes them to KM. The second hybrid (GK) uses a genetic algorithm (GA) instead of the neural network. Both the NN and the GA used in this study are in their simplest possible forms.

In the simulated data sets, I investigate two …
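The second stage of such a two-stage procedure can be sketched as a plain k-means run that starts from externally supplied seeds. The seed-screening stage (neural network or genetic algorithm) is not reproduced here; the seeds below are simply given by hand, and the function name `kmeans_from_seeds` and the example points are hypothetical.

```python
def kmeans_from_seeds(points, seeds, n_iter=50):
    """Lloyd-style k-means that starts from the given seeds.

    Returns (centroids, labels). In the hybrid setting the seeds would come
    from a first-stage screening step; here they are passed in directly.
    """
    centroids = [tuple(s) for s in seeds]
    labels = []
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(pt, centroids[k])),
            )
            for pt in points
        ]
        # Update step: mean of the members; an empty cluster keeps its old centroid.
        updated = []
        for k, old in enumerate(centroids):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)
        if updated == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = updated
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans_from_seeds(points, seeds=[(0, 0), (10, 10)])
```

Because k-means only finds a local optimum, the quality of the supplied seeds largely determines the final partition, which is why a screening stage in front of it can matter.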