Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Theses/Dissertations

Computer Sciences

Clustering

Articles 1 - 30 of 104

Full-Text Articles in Physical Sciences and Mathematics

Enabling IoV Communication Through Secure Decentralized Clustering Using Federated Deep Reinforcement Learning, Chandler Scott Aug 2024

Electronic Theses and Dissertations

The Internet of Vehicles (IoV) holds immense potential for revolutionizing transportation systems by facilitating seamless vehicle-to-vehicle and vehicle-to-infrastructure communication. However, challenges such as congestion, pollution, and security persist, particularly in rural areas with limited infrastructure. Existing centralized solutions are impractical in such environments due to latency and privacy concerns. To address these challenges, we propose a decentralized clustering algorithm enhanced with Federated Deep Reinforcement Learning (FDRL). Our approach enables low-latency communication, competitive packet delivery ratios, and cluster stability while preserving data privacy. Additionally, we introduce a trust-based security framework for IoV environments, integrating a central authority …


Framework For Bug Inducing Commit Prediction Using Quality Metrics, Alireza Tavakkoli Barzoki Jun 2024

Electronic Thesis and Dissertation Repository

This thesis addresses software defect prediction within the broader area of continuous software engineering. The approach presented in this thesis employs source code and process metrics obtained for each commit and examines whether specific patterns, as the system moves from one commit to another, can predict an impending bug-inducing commit. The thesis utilizes the SonarQube Technical Debt open source data, which provides source code metrics and process metrics for each commit in 22 medium to large scale open source Apache projects.

Central to this research is the novel utilization of commits …


Pain Points: Cluster Analysis In Chronic Pain Networks, Iris W. Ho Jun 2024

Master's Theses

Chronic pain is a pervasive health issue, affecting a significant portion of the population and posing complex challenges due to its diverse etiology and individualized impact. To address this complexity, there is a growing interest in grouping chronic pain patients based on their unique treatment needs. While various methodologies for patient grouping have emerged, leveraging graph-based approaches to produce and evaluate such groupings remains largely unexplored. Recent studies have shown promise in integrating knowledge graphs into exploring patient similarity across different biological domains, indicating potential avenues for research. Additionally, there is a growing interest in investigating patient similarity networks, highlighting …


Performance Interference Detection For Cloud-Native Applications Using Unsupervised Machine Learning Models, Eli Bakshi Jun 2024

Master's Theses

Contemporary cloud-native applications frequently adopt the microservice architecture, where applications are deployed within multiple containers that run on cloud virtual machines (VMs). These applications are typically hosted on public cloud platforms, where VMs from multiple cloud subscribers compete for the same physical resources on a cloud server. When a cloud subscriber application running on a VM competes for shared physical resources with other applications running on the same VM or with other VMs co-located on the same cloud server, performance interference may occur, in which the performance of an application degrades due to shared resource contention. Detecting such interference is crucial …


An Unsupervised Machine Learning Algorithm For Clustering Low Dimensional Data Points In Euclidean Grid Space, Josef Lazar Jan 2024

Senior Projects Spring 2024

Clustering algorithms provide a useful method for classifying data. The majority of well-known clustering algorithms are designed to find globular clusters; however, this is not always desirable. In this senior project I present a new clustering algorithm, GBCN (Grid Box Clustering with Noise), which applies a box grid to points in Euclidean space to identify areas of high point density. Points within the grid space that are in adjacent boxes are classified into the same cluster. Conversely, if a path from one point to another can only be completed by traversing an empty grid box, then they are classified …
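
The grid idea in this abstract can be illustrated with a rough sketch. This is not the author's GBCN implementation: the function name `grid_box_clusters`, the `cell_size` parameter, the restriction to 2D points, and the choice of 8-neighbour adjacency are all assumptions made for illustration, and the noise handling mentioned in the algorithm's name is not modeled.

```python
from collections import deque

def grid_box_clusters(points, cell_size):
    """Label 2D points so that points in the same or adjacent occupied grid boxes share a label."""
    # Map each occupied grid box to the indices of the points inside it.
    boxes = {}
    for i, (x, y) in enumerate(points):
        key = (int(x // cell_size), int(y // cell_size))
        boxes.setdefault(key, []).append(i)

    box_label = {}
    next_label = 0
    for start in boxes:
        if start in box_label:
            continue
        # Breadth-first flood fill over occupied boxes
        # (8-neighbour adjacency is an assumption, not taken from the thesis).
        box_label[start] = next_label
        queue = deque([start])
        while queue:
            bx, by = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (bx + dx, by + dy)
                    if nb in boxes and nb not in box_label:
                        box_label[nb] = next_label
                        queue.append(nb)
        next_label += 1

    # Every point inherits the label of the box that contains it.
    labels = [-1] * len(points)
    for key, idxs in boxes.items():
        for i in idxs:
            labels[i] = box_label[key]
    return labels

# Hypothetical toy data: two groups separated by empty grid boxes.
points = [(0.1, 0.2), (0.9, 0.8), (1.5, 1.1), (5.2, 5.3), (5.9, 6.4)]
labels = grid_box_clusters(points, cell_size=1.0)
print(labels)
```

Because clusters grow through chains of neighbouring occupied boxes, a sketch like this can follow elongated or curved shapes that centroid-based methods tend to split; the result depends strongly on the assumed `cell_size`.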


Model-Based Deep Autoencoders For Clustering Single-Cell RNA Sequencing Data With Side Information, Xiang Lin Dec 2023

Dissertations

Clustering analysis has been conducted extensively in single-cell RNA sequencing (scRNA-seq) studies. scRNA-seq can profile tens of thousands of genes' activities within a single cell. Thousands or tens of thousands of cells can be captured simultaneously in a typical scRNA-seq experiment. Biologists would like to cluster these cells for exploring and elucidating cell types or subtypes. Numerous methods have been designed for clustering scRNA-seq data. Yet single-cell technologies have developed so fast in the past few years that existing methods have not caught up with these rapid changes and fail to fully fulfil their potential. For instance, besides profiling transcription …


Comparative Study Of Clustering Techniques On Eye-Tracking In Dynamic 3D Virtual Environments, Scott Johnson Aug 2023

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Eye-tracking has been used for decades to understand how and why an individual focuses on particular objects, areas, and elements of space. A vast body of knowledge exists on how eye-tracking is measured. However, historically, eye-tracking has been predominately studied using 2D environments, with limited work in 3D environments. The purpose of this study is to identify which methods most accurately represent the areas that have captured the participant’s visual attention within a 3D dynamic environment. This will be completed by evaluating different clustering methods of fixations using a customized virtual reality tool that collects eye-tracking data. There exist several …


Machine Learning And Network Embedding Methods For Gene Co-Expression Networks, Niloofar Aghaieabiane May 2023

Dissertations

High-throughput technologies such as DNA microarrays and RNA-seq are used to measure the expression levels of large numbers of genes simultaneously. To support the extraction of biological knowledge, individual gene expression levels are transformed into Gene Co-expression Networks (GCNs). GCNs are analyzed to discover gene modules. GCN construction and analysis has been a well-studied topic for nearly two decades. While new types of sequencing and the corresponding data are now available, the software package WGCNA and its most recent variants are still widely used, contributing to biological discovery.

The discovery of biologically significant modules of genes from raw expression data is …


Clusters, Curves, And Centroids: Stellar Flare Morphology In The Ultraviolet, Vera Berger Jan 2023

Pomona Senior Theses

With a novel sample of 495 high-cadence light curves for stellar flares in the near-ultraviolet, I explore similarity measures, clustering algorithms, averaging methods, and curve fitting techniques for time series. This work seeks to provide insight into whether stellar flares are similar across stars, if we can identify physically meaningful patterns in their light curves, and how to construct a comprehensive model for flares. I construct the first empirical template for flare light curves in the ultraviolet, and compute "average elements" of flares displaying complex features such as quasi-periodic oscillations and multipeak structures. Developing accurate models for flares in the …


Unsupervised Contrastive Representation Learning For Knowledge Distillation And Clustering, Fei Ding Aug 2022

All Dissertations

Unsupervised contrastive learning has emerged as an important training strategy to learn representations by pulling positive samples closer and pushing negative samples apart in a low-dimensional latent space. Usually, positive samples are augmented versions of the same input and negative samples are from different inputs. Once the low-dimensional representations are learned, further analysis, such as clustering and classification, can be performed using the representations. Currently, there are two challenges in this framework. First, empirical studies reveal that even though contrastive learning methods show great progress in representation learning on large model training, they do not work well for small …


Topological Hierarchies And Decomposition: From Clustering To Persistence, Kyle A. Brown Jan 2022

Browse all Theses and Dissertations

Hierarchical clustering is a class of algorithms commonly used in exploratory data analysis (EDA) and supervised learning. However, they suffer from some drawbacks, including the difficulty of interpreting the resulting dendrogram, arbitrariness in the choice of cut to obtain a flat clustering, and the lack of an obvious way of comparing individual clusters. In this dissertation, we develop the notion of a topological hierarchy on recursively-defined subsets of a metric space. We look to the field of topological data analysis (TDA) for the mathematical background to associate topological structures such as simplicial complexes and maps of covers to clusters in …


Outlier Detection In Energy Datasets, Stephen Crawford Jan 2022

Honors Projects

In the past decade, numerous datasets have been released with the explicit goal of furthering non-intrusive load monitoring (NILM) research. NILM is an energy measurement strategy that seeks to disaggregate building-scale loads. Disaggregation attempts to break the energy consumption of a building down into that of its constituent appliances. NILM algorithms require representative real-world measurements, which has led institutions to publish and share their own datasets. NILM algorithms are designed, trained, and tested using the data presented in a small number of these NILM datasets. Many of the datasets contain arbitrarily selected devices. Likewise, the datasets themselves report aggregate load information from building(s) …


Constructing Frameworks For Task-Optimized Visualizations, Ghulam Jilani Abdul Rahim Quadri Oct 2021

USF Tampa Graduate Theses and Dissertations

Visualization is crucial in today’s data-driven world to augment and enhance human understanding and decision-making. Effective visualizations must support accuracy in visual task performance and expressive data communication. Effective visualization design depends on the visual channels used, chart types, or visual tasks. However, design choices and visual judgment are co-related, and effectiveness is not one-dimensional, leading to a significant need to understand the intersection of these factors to create optimized visualizations. Hence, constructing frameworks that consider both design decisions and the task being performed enables optimizing visualization design to maximize efficacy. This dissertation describes experiments, techniques, and user studies to …


Cluster Hire In Social Networks Using Modified Weighted Structural Clustering Algorithm For Networks (MWSCAN), Harshil Patal Oct 2021

Electronic Theses and Dissertations

The concept of effective collaboration within a group is widely used in organizations as a viable means of improving team performance. Any organization or prominent institute that works on multiple projects needs to hire a group of experts who can complete a set of projects. When hiring a group of experts, numerous considerations must be taken into account. In the Cluster Hire problem, we are given a set of experts, each having a set of skills. We are also given a set of projects, each requiring a set of skills. Upon completion of each project, a profit is generated for …


Piecewise Linear Manifold Clustering, Artyom Diky Sep 2021

Dissertations, Theses, and Capstone Projects

This work studies the application of topological analysis to non-linear manifold clustering. A novel method that exploits the data clustering structure generates a topological representation of the point dataset. An analysis of topological construction under different simulated conditions is performed to explore the capabilities and limitations of the method, and demonstrates statistically significant improvements in performance. Furthermore, we introduce a new information-theoretical validation measure for clustering that exploits geometrical properties of clusters to estimate clustering compressibility, for evaluation of the clustering goodness-of-fit without any prior information about true class assignments. We show how the new validation measure, when …


Automated Parsing Of Flexible Molecular Systems Using Principal Component Analysis And K-Means Clustering Techniques, Matthew J. Nwerem Aug 2021

Computational and Data Sciences (MS) Theses

Computational investigation of molecular structures and reactions of biological and pharmaceutical interests remains a grand scientific challenge due to the size and conformational flexibility of these systems. The work requires parsing and analyzing thousands of conformations in each molecular state for meaningful chemical information and subjecting the ensemble to costly quantum chemical calculations. The current status quo typically involves a manual process where the investigator must look at each conformation, separating each into structural families. This process is time-intensive and tedious, making this process infeasible in some cases, and limiting the ability of theoreticians to study these systems. However, the …


Analysis Of Music Genre Clustering Algorithms, Samuel Walter Stern Aug 2021

Theses and Dissertations

Classification and clustering of music genres has become an increasingly prevalent focus in recent years, prompting a push for research into relevant algorithms. The most successful approaches have typically applied the Naive Bayes or k-Nearest Neighbors algorithms, or used Neural Networks to perform classification. This thesis seeks to investigate the use of unsupervised clustering algorithms such as K-Means or Hierarchical clustering, and establish their usefulness in comparison to or conjunction with established methods.


Exploring The Long Tail, Joseph H. Hajjar Jun 2021

Dartmouth College Undergraduate Theses

The migration of datasets online has created a near-infinite inventory for big name retailers such as Amazon and Netflix, giving rise to recommendation systems to assist users in navigating the massive catalog. This has also allowed for the possibility of retailers storing much less popular, uncommon items which would not appear in a more traditional brick-and-mortar setting due to the cost of storage. Nevertheless, previous work has highlighted the profit potential which lies in the so-called "long tail" of niche, unpopular items. Unfortunately, due to the limited amount of data in this subset of the inventory, recommendation systems often struggle …


Correction Of Back Trajectories Utilizing Machine Learning, Britta F. Gjermo Morrison Mar 2021

Theses and Dissertations

The goal of this work was to analyze 24-hour back trajectory performance from a global, low-resolution weather model compared to a high-resolution limited area weather model in particular meteorological regimes, or flow patterns, using K-means clustering, an unsupervised machine learning technique. The study covered 2015-2019 for the contiguous United States (CONUS). Three different machine learning algorithms were tested to study the utility of these methods in improving the performance of the CFS relative to the performance of the RAP. These machine learning techniques are linear regression, Bayesian ridge regression, and random forest regression. These results mean …


Clustering Data To Classify Hearthstone Decks, Tim Inzitari Jan 2021

Williams Honors College, Honors Research Projects

The esports game "Hearthstone" is a collectible card game with a competitive format in which every team submits 4 decks of 30 cards each. Using K-Means clustering, an adaptable way to group data for classification can be built that works well across every update of the game. This system will take in a list of decks and cluster them to classify large amounts of information quickly. The University's esports department will be able to use this system for years to come to aid preparation for "Hearthstone" matches. This model uses qualities about …
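
As an illustration of the kind of pipeline this abstract describes, the sketch below clusters decks with plain Lloyd's K-means. It is not the project's actual model: the abstract is truncated before it names the features used, so encoding each deck as a vector of card counts is an assumption, as are the `kmeans` helper, its first-k seeding, and the placeholder card names.

```python
def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's k-means; the first k vectors seed the centroids (a simplification)."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, v in enumerate(vectors):
            dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in centroids]
            assignment[i] = dists.index(min(dists))
        # Update step: each centroid becomes the mean of its current members.
        for j in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

# Hypothetical toy decks with placeholder card names (real decks hold 30 cards).
decks = [
    ["spell_a", "spell_a", "spell_b"],
    ["weapon_a", "weapon_a", "weapon_b"],
    ["spell_a", "spell_b", "spell_b"],
    ["weapon_a", "weapon_b", "weapon_b"],
]

# Assumed encoding: one dimension per card, valued by how many copies the deck runs.
vocab = sorted({card for deck in decks for card in deck})
vectors = [[deck.count(card) for card in vocab] for deck in decks]
labels = kmeans(vectors, k=2)
print(labels)
```

A count-vector encoding only needs its card vocabulary rebuilt when new cards are released, which is one plausible way to keep such a system adaptable across game updates.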


Texture-Driven Image Clustering In Laser Powder Bed Fusion, Alexander H. Groeger Jan 2021

Browse all Theses and Dissertations

The additive manufacturing (AM) field is striving to identify anomalies in laser powder bed fusion (LPBF) using multi-sensor in-process monitoring paired with machine learning (ML). In-process monitoring can reveal the presence of anomalies, but creating an ML classifier requires labeled data. The present work approaches this problem by printing hundreds of Inconel-718 coupons with different processing parameters to capture a wide range of process monitoring imagery with multiple sensor types. Afterwards, the process monitoring images are encoded into feature vectors and clustered to isolate groups in each sensor modality. Four texture representations were learned by training two convolutional neural network …


Identification And Classification Of Radio Pulsar Signals Using Machine Learning, Di Pang Jan 2021

Graduate Theses, Dissertations, and Problem Reports

Automated single-pulse search approaches are necessary because the ever-increasing amount of observed data makes manual inspection impractical. Detecting radio pulsars using single-pulse searches, however, is a challenging problem for machine learning because pulsar signals often vary significantly in brightness, width, and shape and are only detected in a small fraction of observed data.

The research work presented in this dissertation is focused on development of machine learning algorithms and approaches for single-pulse searches in the time domain. Specifically, (1) We developed a two-stage single-pulse search approach, named Single-Pulse Event Group IDentification (SPEGID), which automatically identifies and clas- …


Cluster Analysis Of Time Series Data With Application To Hydrological Events And Serious Illness Conversations, Ali Javed Jan 2021

Graduate College Dissertations and Theses

Cluster analysis explores the underlying structure of data and organizes it into groups (i.e., clusters) such that observations within the same group are more similar than those in different groups. Quantifying the "similarity" between observations, choosing the optimal number of clusters, and interpreting the results all require careful consideration of the research question at hand, the model parameters, the amount of data and their attributes. In this dissertation, the first manuscript explores the impact of design choices and the variability in clustering performance on different datasets. This is demonstrated through a benchmark study consisting of 128 datasets from the University …


Distributed Load Testing By Modeling And Simulating User Behavior, Chester Ira Parrott Dec 2020

LSU Doctoral Dissertations

Modern human-machine systems such as microservices rely upon agile engineering practices which require changes to be tested and released more frequently than classically engineered systems. A critical step in the testing of such systems is the generation of realistic workloads or load testing. Generated workload emulates the expected behaviors of users and machines within a system under test in order to find potentially unknown failure states. Typical testing tools rely on static testing artifacts to generate realistic workload conditions. Such artifacts can be cumbersome and costly to maintain; however, even model-based alternatives can prevent adaptation to changes in a system …


Global Optimization Algorithms For Image Registration And Clustering, Cuicui Zheng Aug 2020

Dissertations

Global optimization is a classical problem of finding the minimum or maximum value of an objective function. It has applications in many areas, such as biological image analysis, chemistry, mechanical engineering, financial analysis, deep learning and image processing. For practical applications, it is important to understand the efficiency of global optimization algorithms. This dissertation develops and analyzes some new global optimization algorithms and applies them to practical problems, mainly for image registration and data clustering.

First, the dissertation presents a new global optimization algorithm which approximates the optimum using only function values. The basic idea is to use the points …


Learning Health Information From Floor Sensor Data Within A Pervasive Smart Home Environment, Nicholas Brent Burns Aug 2020

Computer Science and Engineering Dissertations

Spatial and temporal gait analysis can provide useful measures for determining a person’s state of health while also identifying deviations in day-to-day activity. The SmartCare project is a multi-discipline health technologies project that aims to provide an unobtrusive and pervasive system that provides in-home health monitoring for the elderly. This research work focuses on the pressure-sensitive smart floor of the SmartCare project by using an experimental floor to develop methods for future use on a floor deployed within a home. This work presents a procedure to automatically calibrate a smart floor’s pressure sensors without specialized physical effort. The calibration algorithm …


Reinforcement Learning In Large, Structured Action Spaces: A Simulation Study Of Decision Support For Spinal Cord Injury Rehabilitation, Nathan Phelps Jul 2020

Electronic Thesis and Dissertation Repository

Reinforcement learning (RL) has helped improve decision-making in several applications. However, applying traditional RL is challenging in some applications, such as rehabilitation of people with a spinal cord injury (SCI). Among other factors, using RL in this domain is difficult because there are many possible treatments (i.e., large action space) and few patients (i.e., limited training data). Treatments for SCIs have natural groupings, so we propose two approaches to grouping treatments so that an RL agent can learn effectively from limited data. One relies on domain knowledge of SCI rehabilitation and the other learns similarities among treatments using an embedding …


Developing Agent-Based Models To Study Financial Markets, Saurav Chakraborty Apr 2020

USF Tampa Graduate Theses and Dissertations

This dissertation presents research that employs agent-based modelling to provide a framework to support simulation as a complement to traditional economic models for policy evaluation. It consists of three studies. The first study employs cluster analysis to capture the different types of banks and the associated business models that define their decision-making. The results from study one will help us get an understanding of how different banks behave and provide an insight into their lending practices. Hence, it would be very helpful in evaluating and analyzing the impact of future policies. Study two develops a fine-grained interbank lending model based …


Machine Learning And Data Mining-Based Methods To Estimate Parity Status And Age Of Wild Mosquito Vectors Of Infectious Diseases From Near-Infrared Spectra, Masabho Peter Milali Apr 2020

Dissertations (1934 -)

Previous studies show that a partial least square regresser [sic] (PLSR) trained on near-infrared spectra classifies laboratory and semi-field raised mosquitoes into less than or ≥ seven days old with an average accuracy of 80%. This dissertation demonstrates that training models on near-infrared spectra (NIRS) using artificial neural network (ANN) as an architecture yields models with higher accuracies than training models using partial least squares (PLS) as an architecture. In addition, irrespective of the model architecture used, direct training of a binary classifier scores higher accuracy than training a regresser and interpreting it as a binary classifier. Furthermore, for …