Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Theses/Dissertations

Statistics and Probability

Machine learning

Articles 1 - 30 of 87

Full-Text Articles in Physical Sciences and Mathematics

Reinforcement Learning: Applying Low Discrepancy Action Selection To Deep Deterministic Policy Gradient, Aleksandr Svishchev Jan 2024

Electronic Theses and Dissertations

Reinforcement learning (RL) is a subfield of machine learning concerned with agents learning to behave optimally by interacting with an environment. One of the most important topics in RL is how the agent should explore, that is, how to choose actions in order to rate their impact on long-term reward. For example, a simple baseline strategy might be uniformly random action selection. This thesis investigates the heuristic idea that agents will learn faster if they explore by factoring the environment’s state into their decision and intentionally choose actions which are as different as possible from what they have previously observed. …
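The heuristic described in this abstract can be made concrete with a small sketch. It is not the thesis's low-discrepancy method and does not involve DDPG. It only illustrates state-aware, novelty-seeking exploration, assuming a finite set of candidate actions and a per-state history of tried actions (both hypothetical). In each state it picks the candidate whose nearest previously tried action is farthest away, which is a maximin rule.

```python
import math
import random


def select_action(state, candidates, history):
    """Pick the candidate action farthest from all actions tried in `state`.

    `history` is a hypothetical dict mapping a hashable state to the list of
    actions (tuples of floats) already taken there. The chosen action is
    appended to that list before being returned.
    """
    visited = history.setdefault(state, [])
    if visited:
        # Maximin rule: maximise the distance to the nearest visited action.
        action = max(candidates,
                     key=lambda a: min(math.dist(a, v) for v in visited))
    else:
        # Nothing tried yet in this state, so any candidate is equally novel.
        action = random.choice(candidates)
    visited.append(action)
    return action


# Hypothetical 1-D action space [-1, 1], discretised in steps of 0.1.
candidates = [(x / 10,) for x in range(-10, 11)]
history = {"s0": [(0.0,), (0.5,)]}
print(select_action("s0", candidates, history))  # → (-1.0,)
```

Keying the history by state is what lets the rule factor the environment's state into the decision. A plain maximin search over a grid becomes expensive in high-dimensional action spaces, which is one motivation for structured alternatives such as low-discrepancy sequences.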


Statistical And Biological Analyses Of Acoustic Signals In Estrildid Finches, Moises Rivera Jun 2023

Dissertations, Theses, and Capstone Projects

Acoustic communication is a process that involves auditory perception and signal processing. Discrimination and recognition further require cognitive processes and supporting mechanisms in order to successfully identify and appropriately respond to signal senders. Although acoustic communication is common across birds, classical research has largely disregarded the perceptual abilities of perinatal altricial taxa. Chapter 1 reviews the literature of perinatal acoustic stimulation in birds, highlighting the disproportionate focus on precocial birds (e.g., chickens, ducks, quails). The long-held belief that altricial birds were incapable of acoustic perception in ovo was only recently overturned, as researchers began to find behavioral and physiological evidence …


Data-Optimized Spatial Field Predictions For Robotic Adaptive Sampling: A Gaussian Process Approach, Zachary Nathan May 2023

Computer Science Senior Theses

We introduce a framework that combines Gaussian Process models, robotic sensor measurements, and sampling data to predict spatial fields. In this context, a spatial field refers to the distribution of a variable throughout a specific area, such as temperature or pH variations over the surface of a lake. Whereas existing methods tend to analyze only the particular field(s) of interest, our approach optimizes predictions through the effective use of all available data. We validated our framework on several datasets, showing that errors can decline by up to two-thirds through the inclusion of additional colocated measurements. In support of adaptive sampling, …
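The sketch below illustrates only the modelling component: standard single-field Gaussian process regression written with NumPy. The zero-mean prior, squared-exponential kernel, fixed hyperparameters, and sample values are all assumptions. The thesis's framework, which also exploits colocated measurements of other variables and supports adaptive sampling, is not reproduced here.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between the rows of a and b."""
    sq_dist = (np.sum(a**2, axis=1)[:, None]
               + np.sum(b**2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)


def gp_predict(x_train, y_train, x_test, noise=1e-6, **kernel_args):
    """Posterior mean and variance of a zero-mean GP at the rows of x_test."""
    k_train = rbf_kernel(x_train, x_train, **kernel_args)
    k_train = k_train + noise * np.eye(len(x_train))
    k_cross = rbf_kernel(x_test, x_train, **kernel_args)
    k_test = rbf_kernel(x_test, x_test, **kernel_args)
    chol = np.linalg.cholesky(k_train)  # k_train = chol @ chol.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_cross @ alpha
    v = np.linalg.solve(chol, k_cross.T)
    var = np.diag(k_test) - np.sum(v**2, axis=0)
    return mean, var


# Hypothetical temperature readings (deg C) at four (x, y) sample sites.
sites = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
temps = np.array([10.0, 12.0, 11.0, 13.0])
offset = temps.mean()  # centre the data because the prior mean is zero
grid = np.array([[0.5, 0.5], [5.0, 5.0]])
mean, var = gp_predict(sites, temps - offset, grid)
print(mean + offset, var)
```

The posterior variance is what an adaptive sampler would typically consult. It stays near zero at measured sites and rises toward the prior variance far from them, as at the point (5, 5) above.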


Identifying Key Activity Indicators In Rats' Neuronal Data Using Lasso Regularized Logistic Regression, Avery Woods May 2023

Honors Theses

This thesis aims to identify timestamps of rats’ neuronal activity that best determine behavior using a machine learning model. Neuronal data are complex and high-dimensional, and identifying the most informative features is crucial for understanding the underlying neuronal processes. The Lasso regularization technique is employed to select the features of the data most relevant to the model’s prediction. The results of this study provide insights into the key activity indicators that are associated with specific behaviors or cognitive processes in rats, as well as the effect that stress can have on neuronal activity and behavior. Ultimately, it was …
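The following is a minimal sketch of Lasso-regularized logistic regression, fitted by proximal gradient descent (ISTA) in NumPy. It uses synthetic data as a stand-in for binned neuronal activity; the data, penalty strength, and step size are assumptions, not the thesis's settings. In practice, scikit-learn's `LogisticRegression(penalty="l1", solver="liblinear")` fits the same kind of model.

```python
import numpy as np


def lasso_logistic(X, y, lam=0.1, step=0.1, iters=2000):
    """L1-penalised logistic regression fitted by proximal gradient (ISTA).

    Minimises mean log-loss(X @ w + b, y) + lam * ||w||_1; the intercept b
    is left unpenalised. Assumes the columns of X are standardised.
    """
    n, p = X.shape
    w = np.zeros(p)
    b = 0.0
    for _ in range(iters):
        prob = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        residual = prob - y
        # Gradient step on the smooth log-loss ...
        w = w - step * (X.T @ residual) / n
        b = b - step * residual.mean()
        # ... followed by soft-thresholding, the proximal map of lam * ||w||_1.
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)
    return w, b


# Synthetic stand-in for binned neuronal activity: 10 features, 2 informative.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 10))
logits = 3.0 * X[:, 0] - 2.0 * X[:, 3]
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-logits))).astype(float)
w, b = lasso_logistic(X, y)
print(np.nonzero(w)[0])  # indices of the features the penalty kept
```

The soft-thresholding step sets weights exactly to zero, which is why the L1 penalty performs feature selection rather than merely shrinking coefficients.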


Reducing Restaurant Inventory Costs Through Sales Forecasting, Tyler Mason, Chris Schoen, Trevor Gilbert, Jonathan Enriquez Apr 2023

Senior Design Project For Engineers

Family Restaurant is a local restaurant in the greater Atlanta area that serves a variety of dishes that include an assortment of 19 different proteins. Currently, Family Restaurant places protein orders based on business intuition and tends to over-stock and sometimes under-stock. To minimize inventory costs by reducing over-stocking and preventing under-stocking of proteins, we applied Facebook Prophet (FB Prophet), ARIMA, and XGBoost machine learning models to predict protein demand and then fed these results into a Fixed Time Period inventory model to make an overall order suggestion based on the specified time period. We trained our models on …
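A fixed time period (periodic review) model in its common textbook form orders enough stock to cover expected demand over the review period plus lead time, adds safety stock, and subtracts inventory on hand. The sketch below assumes that form, with independent normally distributed daily demand. In the project, the mean daily demand would come from the forecasting models. Whether the students used exactly this formula is not stated in the abstract, and all figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def fixed_period_order(mean_daily_demand, sd_daily_demand, review_days,
                       lead_days, on_hand, service_level=0.95):
    """Order quantity for a fixed-time-period (periodic review) policy.

    Covers expected demand over the review period plus lead time, adds
    safety stock z * sigma * sqrt(T + L), and subtracts stock on hand.
    Assumes independent, normally distributed daily demand.
    """
    horizon = review_days + lead_days
    z = NormalDist().inv_cdf(service_level)
    target = (mean_daily_demand * horizon
              + z * sd_daily_demand * sqrt(horizon))
    return max(0.0, target - on_hand)


# Hypothetical protein: forecast 10 lb/day (sd 3), weekly orders, 2-day lead time.
print(fixed_period_order(10, 3, review_days=7, lead_days=2, on_hand=30))
```

Raising `service_level` increases the safety stock term. This is where the trade-off between over-stocking and under-stocking enters the order suggestion.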


Multilevel Optimization With Dropout For Neural Networks, Gary Joseph Saavedra Apr 2023

Mathematics & Statistics ETDs

Large neural networks have become ubiquitous in machine learning. Despite their widespread use, the optimization process for training a neural network remains computationally expensive and does not necessarily create networks that generalize well to unseen data. In addition, the difficulty of training increases as the size of the neural network grows. In this thesis, we introduce the novel MGDrop and SMGDrop algorithms which use a multigrid optimization scheme with a dropout coarsening operator to train neural networks. In contrast to other standard neural network training schemes, MGDrop explicitly utilizes information from smaller sub-networks which act as approximations of the full …


High-Dimensional Variable Selection Via Knockoffs Using Gradient Boosting, Amr Essam Mohamed Apr 2023

Dissertations

As data continue to grow rapidly in size and complexity, efficient and effective statistical methods are needed to detect the important variables/features. Variable selection is one of the most crucial problems in statistical applications. This problem arises when one wants to model the relationship between the response and the predictors. The goal is to reduce the number of variables to a minimal set of explanatory variables that are truly associated with the response of interest to improve the model accuracy. Effectively choosing the true influential variables and controlling the False Discovery Rate (FDR) without sacrificing power has been a challenge …
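The abstract does not describe how the thesis builds knockoff statistics with gradient boosting, so that part is not shown. Knockoff methods do share a standard final step that turns per-feature statistics W_j into a selection with FDR control. The sketch below implements that step, the knockoff+ threshold of Barber and Candès, assuming the W_j have already been computed. The values used here are made up.

```python
def knockoff_plus_select(w, q=0.1):
    """Return indices selected by the knockoff+ filter at target FDR q.

    w[j] is a knockoff statistic for feature j: large positive values mean
    the original feature looks more important than its knockoff copy.
    """
    # Candidate thresholds are the distinct nonzero magnitudes |w_j|.
    for t in sorted({abs(v) for v in w if v != 0}):
        estimated_false = 1 + sum(1 for v in w if v <= -t)
        selected = sum(1 for v in w if v >= t)
        # The first (smallest) t meeting the bound is the knockoff+ threshold.
        if estimated_false / max(1, selected) <= q:
            return [j for j, v in enumerate(w) if v >= t]
    return []


# Hypothetical statistics for 14 features.
w = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1.5, 1.2, -0.5, 0.4, -0.3]
print(knockoff_plus_select(w, q=0.1))  # → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Negative statistics act as an estimate of how many null features would cross a given threshold. The "+1" in the numerator makes the rule conservative, so at q = 0.1 it cannot select anything unless at least ten features pass.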


Applications Of Transfer Learning From Malicious To Vulnerable Binaries, Sean Patrick Mcnulty Jan 2023

Graduate Student Theses, Dissertations, & Professional Papers

Malware detection and vulnerability detection are important cybersecurity tasks. Previous research has successfully applied a variety of machine learning methods to both. However, despite their potential synergies, previous research has yet to unite these two tasks. Given the recent success of transfer learning in many domains, such as language modeling and image recognition, this thesis investigated the use of transfer learning to improve vulnerability detection. Specifically, we pre-trained a series of models to detect malicious binaries and used the weights from those models to kickstart the detection of vulnerable binaries. In our study, we also investigated five different data representations …


EEG-Based Spanish Language Proficiency Classification: An EEG Power Spectrum And Cross-Spectrum Analysis, Blaise Xavier O'Mara, Skyler Baumer Jan 2023

Honors Theses and Capstones

Second language proficiency may be predicted with electrophysiological techniques. In a machine learning application, this electrophysiological data may be used by language instructors and students to assess language learning. This study identifies how electroencephalogram (EEG) power spectrum and cross spectrum data of the brain cortex relate to the Spanish second language (L2) proficiency of 20 Spanish language students of varying proficiency levels at the University of New Hampshire. The two metrics for assessing cortical power and processing were event-related desynchronization (ERD) of the alpha (8-12 Hz) brain frequency band, a measure of relative change in power, and alpha and beta (13-30 Hz) …
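The abstract defines ERD only as a relative change in power. A common convention (Pfurtscheller's) expresses it as ERD% = (A − R) / R × 100, where R is the band power in a reference interval and A is the band power in the event interval. Under this convention, desynchronization gives negative values, though some authors flip the sign. The sketch below assumes that convention and a plain FFT periodogram on synthetic signals. The thesis's actual preprocessing and its cross-spectrum features are not shown.

```python
import numpy as np


def band_power(signal, fs, low, high):
    """Mean periodogram power of `signal` in the [low, high] Hz band."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)
    in_band = (freqs >= low) & (freqs <= high)
    return power[in_band].mean()


def erd_percent(reference_power, event_power):
    """Relative band-power change in percent; negative values mean ERD."""
    return (event_power - reference_power) / reference_power * 100.0


# Synthetic 1 s epochs at 250 Hz: a 10 Hz rhythm that halves in amplitude.
fs = 250
t = np.arange(fs) / fs
baseline = 2.0 * np.sin(2 * np.pi * 10 * t)
task = 1.0 * np.sin(2 * np.pi * 10 * t)
alpha_ref = band_power(baseline, fs, 8, 12)
alpha_task = band_power(task, fs, 8, 12)
print(erd_percent(alpha_ref, alpha_task))  # about -75 (power falls to 1/4)
```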


Machine Learning Model Comparison And ARMA Simulation Of Exhaled Breath Signals Classifying COVID-19 Patients, Aaron Christopher Segura Aug 2022

Mathematics & Statistics ETDs

This study compared the performance of machine learning models in classifying COVID-19 patients using exhaled breath signals and simulated datasets. Ground truth classification was determined by the gold standard Polymerase Chain Reaction (PCR) test results. A residual bootstrap method generated the simulated datasets by fitting signal data to Autoregressive Moving Average (ARMA) models. Classification models included neural networks, k-nearest neighbors, naïve Bayes, random forest, and support vector machines. A Recursive Feature Elimination (RFE) study was performed to determine whether reducing signal features would improve the classification models' performance, using Gini Importance scoring for the two classes. The top 25% of …
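The thesis fits full ARMA models. To keep the residual bootstrap idea visible, the sketch below reduces the model to AR(1), which is a simplifying assumption. The model is fitted by least squares, and new series are simulated by resampling its centred residuals with replacement. The input signal is synthetic.

```python
import numpy as np


def ar1_residual_bootstrap(series, n_sims, rng):
    """Fit an AR(1) model by least squares and simulate new series by
    resampling its centred residuals with replacement."""
    x = np.asarray(series, dtype=float)
    mu = x.mean()
    z = x - mu
    # Least-squares estimate of phi in z[t] = phi * z[t-1] + e[t].
    phi = np.dot(z[:-1], z[1:]) / np.dot(z[:-1], z[:-1])
    residuals = z[1:] - phi * z[:-1]
    residuals -= residuals.mean()
    sims = np.empty((n_sims, len(x)))
    for s in range(n_sims):
        shocks = rng.choice(residuals, size=len(x), replace=True)
        path = np.empty(len(x))
        path[0] = z[0]  # start every simulation from the observed value
        for t in range(1, len(x)):
            path[t] = phi * path[t - 1] + shocks[t]
        sims[s] = path + mu
    return phi, sims


# Synthetic stand-in for a breath-sensor signal: AR(1) with phi = 0.7.
rng = np.random.default_rng(1)
signal = np.zeros(2000)
for t in range(1, 2000):
    signal[t] = 0.7 * signal[t - 1] + rng.standard_normal()
phi_hat, simulated = ar1_residual_bootstrap(signal, n_sims=5, rng=rng)
print(round(phi_hat, 2), simulated.shape)
```

Resampling the fitted residuals rather than assuming Gaussian noise lets the simulated series inherit the empirical shock distribution of the real signal.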


Contributions To Random Forest Variable Importance With Applications In R, Kelvyn K. Bladen Aug 2022

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

A major focus in statistics is building and improving computational algorithms that can use data to predict a response. Two fundamental camps of research arise from such a goal. The first camp is researching ways to get more accurate predictions. Many sophisticated methods, collectively known as machine learning methods, have been developed for this very purpose. One such method that is widely used across industry and many other areas of investigation is called Random Forests.

The second camp of research is that of improving the interpretability of machine learning methods. This is worthy of attention when analysts desire to optimize …


Stability And Differential Privacy Of Stochastic Gradient Methods, Zhenhuan Yang Aug 2022

Legacy Theses & Dissertations (2009 - 2024)

Recently, a considerable amount of work has been devoted to the study of algorithmic stability and differential privacy (DP) for stochastic gradient methods (SGM). However, most existing work focuses on the empirical risk minimization (ERM) and population risk minimization problems. In this paper, we study two types of optimization problems that enjoy wide applications in modern machine learning, namely the minimax problem and the pairwise learning problem.


Applications Of Machine Learning Algorithms In Materials Science And Bioinformatics, Mohammed Quazi Jun 2022

Mathematics & Statistics ETDs

The piezoelectric response has been a measure of interest in density functional theory (DFT) for micro-electromechanical systems (MEMS) since the inception of MEMS technology. Piezoelectric-based MEMS devices find wide applications in automobiles, mobile phones, healthcare devices, and silicon chips for computers, to name a few. Piezoelectric properties of doped aluminum nitride (AlN) have been under investigation in materials science for piezoelectric thin films because of its wide range of device applicability. In this research, high-throughput ab initio simulations of 23 AlN alloys are generated using rigorous DFT calculations.

This research is the first to report strong enhancements of piezoelectric properties …


Generating A Dataset For Comparing Linear Vs. Non-Linear Prediction Methods In Education Research, Jack Mauro, Elena Martinez, Anna Bargagliotti May 2022

Honors Thesis

Machine learning is often used to build predictive models by extracting patterns from large data sets. Such techniques are increasingly being utilized to predict outcomes in the social sciences. One such application is predicting student success. Machine learning can be applied to predicting student acceptance and success in academia. Using these tools for education-related data analysis may enable the evaluation of programs, resources, and curriculum. Currently, research is needed to examine application, admissions, and retention data in order to address equity in college computer science programs. However, most student-level data sets contain sensitive data that cannot be made public. To …


Dataset Evaluation For Data Trading Using Expected Loss And Homomorphic Encryption, Minsung Joo May 2022

Senior Honors Papers / Undergraduate Theses

Supervised machine learning suffers from the "garbage in, garbage out" phenomenon, where the performance of a model is limited by the quality of the data. While a myriad of data is collected every second, there is no general rigorous method of evaluating the quality of a given dataset. This hinders fair pricing of data in scenarios where a buyer may look to buy data for use with machine learning. In this work, I propose using the expected loss corresponding to a dataset as a measure of its quality, relying on Bayesian methods for uncertainty quantification. Furthermore, I present a secure multi-party computation …


Intraday Algorithmic Trading Using Momentum And Long Short-Term Memory Network Strategies, Andrew R. Whitinger II May 2022

Undergraduate Honors Theses

Intraday stock trading is an infamously difficult and risky strategy. Momentum and reversal strategies and long short-term memory (LSTM) neural networks have been shown to be effective for selecting stocks to buy and sell over time periods of multiple days. To explore whether these strategies can be effective for intraday trading, their implementations were simulated using intraday price data for stocks in the S&P 500 index, collected at 1-second intervals between February 11, 2021 and March 9, 2021 inclusive. The study tested 160 variations of momentum and reversal strategies for profitability in long, short, and market-neutral portfolios, totaling 480 portfolios. …
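The sketch below shows how momentum and reversal rules can rank stocks into long and short legs. It assumes a simple lookback return over a fixed number of price observations and hypothetical tickers; the thesis's 160 strategy variations are not described in enough detail to reproduce. A long portfolio would hold only the long leg and a short portfolio only the short leg. A market-neutral portfolio would hold both with equal capital, which is also an assumption about the construction.

```python
def rank_portfolio(prices, lookback, k, reversal=False):
    """Split tickers into long and short legs by their lookback return.

    prices maps a ticker to its price history (oldest first, at least
    lookback + 1 observations). Momentum buys recent winners and shorts
    losers; reversal does the opposite.
    """
    returns = {tic: p[-1] / p[-1 - lookback] - 1.0 for tic, p in prices.items()}
    # Best performers first for momentum, worst first for reversal.
    ranked = sorted(returns, key=returns.get, reverse=not reversal)
    return ranked[:k], ranked[-k:]  # (long leg, short leg)


# Hypothetical tickers with prices sampled at 1-second intervals.
prices = {"AAA": [100.0, 110.0], "BBB": [100.0, 90.0], "CCC": [100.0, 101.0]}
print(rank_portfolio(prices, lookback=1, k=1))                 # → (['AAA'], ['BBB'])
print(rank_portfolio(prices, lookback=1, k=1, reversal=True))  # → (['BBB'], ['AAA'])
```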


Early-Warning Alert Systems For Financial-Instability Detection: An HMM-Driven Approach, Xing Gu Apr 2022

Electronic Thesis and Dissertation Repository

Regulators’ early intervention is crucial when the financial system is experiencing difficulties. Financial stability must be preserved to avert bank bailouts, which heavily drain governments' financial resources. Detecting periods of financial crisis in advance requires the development and customisation of accurate and robust quantitative techniques. The goal of this thesis is to construct automated systems via the interplay of various mathematical and statistical methodologies to signal financial instability episodes in the near-term horizon. These signal alerts could provide regulatory bodies with the capacity to initiate appropriate responses that will thwart or at least minimise the occurrence of a financial crisis. …


Intra-Hour Solar Forecasting Using Cloud Dynamics Features Extracted From Ground-Based Infrared Sky Images, Guillermo Terrén-Serrano Apr 2022

Electrical and Computer Engineering ETDs

Due to the increasing use of photovoltaic systems, power grids are vulnerable to the projection of shadows from moving clouds. An intra-hour solar forecast provides power grids with the capability of automatically controlling the dispatch of energy, reducing the additional cost for a guaranteed, reliable supply of energy (i.e., energy storage). This dissertation introduces a novel sky imager consisting of a long-wave radiometric infrared camera and a visible light camera with a fisheye lens. The imager is mounted on a solar tracker to maintain the Sun in the center of the images throughout the day, reducing the scattering effect produced …


Using Fine-Scale Aquatic Habitat Data To Construct Dreissenid SDMs In The Laurentian Great Lakes, Grace C. Henderson Mar 2022

USF Tampa Graduate Theses and Dissertations

The invasion of the Laurentian Great Lakes by aquatic invasive species (AIS) has been the subject of investigation for decades, due to their dramatic alterations to the ecosystem and high economic costs. Two AIS with the largest impacts are dreissenid zebra and quagga mussels, and though these species have been studied extensively, questions remain about what factors control their distributions, and whether lake warming will alter these distributions. Species distribution models (SDMs) offer a powerful tool to examine the relationship between species presences and environmental variables, which are typically bioclimatic data. The creation of the Aquatic Habitat (AqHab) dataset containing …


A Predictive Model To Predict Cyberattack Using Self-Normalizing Neural Networks, Oluwapelumi Eniodunmo Jan 2022

Theses, Dissertations and Capstones

Cyberattacks are a never-ending war that has greatly threatened secure information systems. The development of automated and intelligent systems gives hackers more computing power to steal information and destroy data or system resources, and has raised global security issues. Statistical and data mining tools have received continuous research and improvement. These tools have been adopted to create sophisticated intrusion detection systems that help information systems mitigate and defend against cyberattacks. However, advances in technology and the accessibility of information create more identifiable elements that can be used to gain unauthorized access to systems and resources. Data mining and classification tools …


Finding The Best Predictors For Foot Traffic In US Seafood Restaurants, Isabel Paige Beaulieu Jan 2022

Honors Theses and Capstones

COVID-19 caused statewide and nationwide lockdowns, which altered human foot traffic, especially in restaurants. The seafood sector suffered particularly: illegal fishing increased, its goods are perishable and in some places seasonal, and imports and exports slowed. Foot traffic data help business owners decide how much to order, how many employees to schedule, and so on. One issue is that the data are very expensive, hard to get, and not available until months after they are recorded. Our goal is to not only find covariates that …


Framework For The Evaluation Of Perturbations In The Systems Biology Landscape And Inter-Sample Similarity From Transcriptomic Datasets — A Digital Twin Perspective, Mariah Marie Hoffman Jan 2022

Dissertations and Theses

One approach to interrogating the complexities of human systems in their well-regulated and dysregulated states is through the use of digital twins. Digital twins are virtual representations of physical systems that are descriptive of an individual's state of health, an object fundamentally related to precision medicine. A key element for building a functional digital twin type for a disease or predicting the therapeutic efficacy of a potential treatment is harmonized, machine-parsable domain knowledge. Hypothesis-driven investigations are the gold standard for representing subsystems, but their results encompass a limited knowledge of the full biosystem. Multi-omics data is one rich source of …


Exploring Cyberterrorism, Topic Models And Social Networks Of Jihadists Dark Web Forums: A Computational Social Science Approach, Vivian Fiona Guetler Jan 2022

Graduate Theses, Dissertations, and Problem Reports

This three-article dissertation focuses on cyber-related topics on terrorist groups, specifically Jihadists’ use of technology, the application of natural language processing, and social networks in analyzing text data derived from terrorists' Dark Web forums. The first article explores cybercrime and cyberterrorism. As technology progresses, it facilitates new forms of behavior, including tech-related crimes known as cybercrime and cyberterrorism. In this article, I provide an analysis of the problems of cybercrime and cyberterrorism within the field of criminology by reviewing existing literature focusing on (a) the issues in defining terrorism, cybercrime, and cyberterrorism, (b) ways that cybercriminals commit a crime in …


A Non-Deterministic Deep Learning Based Surrogate For Ice Sheet Modeling, Hannah Jordan Jan 2022

Graduate Student Theses, Dissertations, & Professional Papers

Surrogate modeling is a new and expanding area of deep learning, providing a computationally inexpensive way to approximate results from computationally demanding high-fidelity simulations. Ice sheet models are one such computationally expensive simulation: the model used in this study currently requires between 10 and 20 minutes to complete one simulation. While this is adequate for certain applications, it makes sampling approaches to statistical inference infeasible. This issue can be overcome by using a surrogate model to approximate the ice sheet model, bringing the time to produce output down to a tenth …


Parameter Estimation And Inference Of Spatial Autoregressive Model By Stochastic Gradient Descent, Gan Luan Dec 2021

Dissertations

Stochastic gradient descent (SGD) is a popular iterative method for model parameter estimation in large-scale data and online learning settings since it can process the data in a single pass. While SGD has been well studied for independent data, its application to spatially correlated data remains largely unexplored. This dissertation develops SGD-based parameter estimation and statistical inference algorithms for the spatial autoregressive (SAR) model, a common model for spatial lattice data.

This research contains three parts. (I) The first part concerns SGD estimation and inference for the SAR mean regression model. A new SGD algorithm based on maximum likelihood estimator (MLE) …
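The single-pass idea can be shown in isolation with the sketch below. It deliberately swaps the SAR model for ordinary linear regression with independent rows, an assumption made for brevity; handling the spatial weight matrix is exactly what the dissertation adds. The decaying step size and Polyak-Ruppert averaging are common choices in SGD-based inference, not details taken from the abstract.

```python
import numpy as np


def one_pass_sgd(X, y, lr0=0.1, decay=0.6):
    """One pass of SGD for least-squares linear regression.

    Uses step size lr0 * t**(-decay) and also returns the running
    (Polyak-Ruppert) average of the iterates.
    """
    n, p = X.shape
    theta = np.zeros(p)
    theta_avg = np.zeros(p)
    for t in range(n):  # each observation is visited exactly once
        grad = (X[t] @ theta - y[t]) * X[t]
        theta = theta - lr0 * (t + 1) ** (-decay) * grad
        theta_avg += (theta - theta_avg) / (t + 1)
    return theta, theta_avg


rng = np.random.default_rng(0)
true_theta = np.array([1.0, -2.0, 0.5])
X = rng.standard_normal((20000, 3))
y = X @ true_theta + 0.1 * rng.standard_normal(20000)
last, averaged = one_pass_sgd(X, y)
print(averaged)  # close to true_theta
```

The estimator never stores or revisits past observations, which is what makes SGD attractive for streaming and large-scale data.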


Factors Influencing Intent To Take A COVID-19 Test In The United States, Sheila Rutto Dec 2021

Theses and Dissertations

In 2020, COVID-19 became the first pandemic in the world’s history that brought the entire world to an abrupt and unexpected halt. From the first reported case of the disease to date, the novel coronavirus has wreaked havoc in literally every corner of the globe and left an ever-growing number of unprecedented fatalities. The normal way of life has been disrupted, and uncertainty about the end of this pandemic persists for many. Due to the urgency to bring this pandemic under control, medical officers have been able to recommend actions that people …


High-Dimensional Feature Selection And Multi-Level Causal Mediation Analysis With Applications To Human Aging And Cluster-Based Intervention Studies, Hachem Saddiki Oct 2021

Doctoral Dissertations

Many questions in public health and medicine are fundamentally causal in that our objective is to learn the effect of some exposure, randomized or not, on an outcome of interest. As a result, causal inference frameworks and methodologies have gained interest as a promising tool to reliably answer scientific questions. However, the tasks of identifying and efficiently estimating causal effects from observed data still pose significant challenges under complex data generating scenarios. We focus on (1) high-dimensional settings where the number of variables is orders of magnitude higher than the number of observations; and (2) multi-level settings, where study participants …


Characterizing Convolutional Neural Network Early-Learning And Accelerating Non-Adaptive, First-Order Methods With Localized Lagrangian Restricted Memory Level Bundling, Benjamin O. Morris Sep 2021

Theses and Dissertations

This dissertation studies the underlying optimization problem encountered during the early-learning stages of convolutional neural networks and introduces a training algorithm competitive with existing state-of-the-art methods. First, a Design of Experiments method is introduced to systematically measure empirical second-order Lipschitz upper bound and region size estimates for local regions of convolutional neural network loss surfaces experienced during the early-learning stages. This method demonstrates that architecture choices can significantly impact the local loss surfaces traversed during training. Next, a Design of Experiments method is used to study the effects convolutional neural network architecture hyperparameters have on different optimization routines' abilities to …


Novel Statistical Modeling Methods For Traffic Video Analysis, Hang Shi Aug 2021

Dissertations

Video analysis is an active and rapidly expanding research area in computer vision and artificial intelligence due to its broad applications in modern society. Many methods have been proposed to analyze videos, but many challenging factors remain unaddressed. In this dissertation, four statistical modeling methods are proposed to address challenging traffic video analysis problems under adverse illumination and weather conditions.

First, a new foreground detection method is presented to detect the foreground objects in videos. A novel Global Foreground Modeling (GFM) method, which estimates a global probability density function for the foreground and applies the Bayes decision rule …


Applying Deep Learning To The Ice Cream Vendor Problem: An Extension Of The Newsvendor Problem, Gaffar Solihu Aug 2021

Electronic Theses and Dissertations

The Newsvendor problem is a classical supply chain problem used to develop strategies for inventory optimization. The goal of the newsvendor problem is to predict the optimal order quantity of a product to meet an uncertain demand in the future, given that the demand distribution itself is known. The Ice Cream Vendor Problem extends the classical newsvendor problem to an uncertain demand with unknown distribution, albeit a distribution that is known to depend on exogenous features. The goal is thus to estimate the order quantity that minimizes the total cost when demand does not follow any known statistical distribution. The …
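The classical newsvendor solution orders the critical-ratio quantile of demand, q* = F^{-1}(c_u / (c_u + c_o)), where c_u and c_o are the per-unit underage and overage costs. When the distribution is unknown, one simple baseline replaces F with the empirical distribution of past demand (sample average approximation). This baseline is not the deep learning approach of the thesis, and it ignores the exogenous features. The costs and demand figures below are hypothetical.

```python
import math


def newsvendor_quantity(demand_samples, underage_cost, overage_cost):
    """Order quantity minimising expected cost under the empirical demand
    distribution: the smallest q with F(q) >= cu / (cu + co)."""
    critical_ratio = underage_cost / (underage_cost + overage_cost)
    ordered = sorted(demand_samples)
    index = max(0, math.ceil(critical_ratio * len(ordered)) - 1)
    return ordered[index]


# Hypothetical daily ice cream demand (units) observed on ten past days.
demand = [12, 7, 9, 15, 10, 8, 11, 14, 13, 6]
# Running out costs 3 per unit of lost profit; leftovers cost 1 per unit.
print(newsvendor_quantity(demand, underage_cost=3, overage_cost=1))  # → 13
```

Because running out is three times as costly as overstocking, the rule orders at the 75th percentile of observed demand rather than at the median.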