Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Machine learning

Theses/Dissertations

Applied Mathematics

Articles 1 - 30 of 35

Full-Text Articles in Physical Sciences and Mathematics

Tools For Biomolecular Modeling And Simulation, Xin Yang Apr 2024

Mathematics Theses and Dissertations

Electrostatic interactions play a pivotal role in understanding biomolecular systems, influencing their structural stability and functional dynamics. The Poisson-Boltzmann (PB) equation, a prevalent implicit solvent model that treats the solvent as a continuum while describing the mobile ions using the Boltzmann distribution, has become a standard tool for detailed investigations into biomolecular electrostatics. There are two primary methodologies: grid-based finite difference or finite element methods and body-fitted boundary element methods. This dissertation focuses on developing fast and accurate PB solvers, leveraging both methodologies, to meet diverse scientific needs and overcome various obstacles in the field.


Predicting Biomolecular Properties And Interactions Using Numerical, Statistical And Machine Learning Methods, Elyssa Sliheet Apr 2024

Mathematics Theses and Dissertations

We investigate machine learning and electrostatic methods to predict biophysical properties of proteins, such as solvation energy and protein-ligand binding affinity, for the purpose of drug discovery and development. We focus on the Poisson-Boltzmann model and various high-performance computing considerations such as parallelization schemes.


Bringing GANs To Medieval Times: Manuscript Translation Models, Tonilynn M. Holtz Jan 2024

Electronic Theses and Dissertations

Generative adversarial networks (GANs) have recently emerged as a powerful framework for producing new knowledge from existing knowledge. These models aim to learn patterns from input data and then use that knowledge to generate output data samples that plausibly appear to belong to the same set as the input data. The study of medieval manuscripts has been an important research area in the humanities for many decades. These rare manuscripts are often inaccessible to the general public, including students and scholars, and it is of great interest to provide digital support (including, but not limited to, translation and search) for …


Data-Driven Exploration Of Coarse-Grained Equations: Harnessing Machine Learning, Elham Kianiharchegani Aug 2023

Electronic Thesis and Dissertation Repository

In scientific research, understanding and modeling physical systems often involves working with complex equations called Partial Differential Equations (PDEs). These equations are essential for describing the relationships between variables and their derivatives, allowing us to analyze a wide range of phenomena, from fluid dynamics to quantum mechanics. Traditionally, the discovery of PDEs relied on mathematical derivations and expert knowledge. However, the advent of data-driven approaches and machine learning (ML) techniques has transformed this process. By harnessing ML techniques and data analysis methods, data-driven approaches have revolutionized the task of uncovering complex equations that describe physical systems. The primary goal in …


Mathematics Behind Machine Learning, Rim Hammoud Aug 2023

Electronic Theses, Projects, and Dissertations

Artificial intelligence (AI) is a broad field of study that involves developing intelligent machines that can perform tasks that typically require human intelligence. Machine learning (ML) is often used as a tool to help create AI systems. The goal of ML is to create models that can learn and improve to make predictions or decisions based on given data. The goal of this thesis is to build a clear and rigorous exposition of the mathematical underpinnings of support vector machines (SVM), a popular platform used in ML. As we will explore later on in the thesis, SVM can be implemented …


Learning The Game: Implementations Of Convolutional Networks In Automated Strategy Identification, Cameron Klig Jun 2023

Master's Theses

Games can be used to represent a wide variety of real world problems, giving rise to many applications of game theory. Various computational methods have been proposed for identifying game strategies, including optimized tree search algorithms, game-specific heuristics, and artificial intelligence. In the last decade, systems like AlphaGo and AlphaZero have significantly exceeded the performance of the best human players in Chess, Go, and other games. The most effective game engines to date employ convolutional neural networks (CNNs) to evaluate game boards, extract features, and predict the optimal next move. These engines are trained on billions of simulated games, wherein …


Continuum Modeling Of Active Nematics Via Data-Driven Equation Discovery, Connor Robertson May 2023

Dissertations

Data-driven modeling seeks to extract a parsimonious model for a physical system directly from measurement data. One of the most interpretable of these methods is Sparse Identification of Nonlinear Dynamics (SINDy), which selects a relatively sparse linear combination of model terms from a large set of (possibly nonlinear) candidates via optimization. This technique has shown promise for synthetic data generated by numerical simulations, but its application to real data is less developed. This dissertation applies SINDy to video data from a bio-inspired system of microtubule-motor protein assemblies, an example of nonequilibrium dynamics that has posed a significant …


Multilevel Optimization With Dropout For Neural Networks, Gary Joseph Saavedra Apr 2023

Mathematics & Statistics ETDs

Large neural networks have become ubiquitous in machine learning. Despite their widespread use, the optimization process for training a neural network remains computationally expensive and does not necessarily create networks that generalize well to unseen data. In addition, the difficulty of training increases as the size of the neural network grows. In this thesis, we introduce the novel MGDrop and SMGDrop algorithms which use a multigrid optimization scheme with a dropout coarsening operator to train neural networks. In contrast to other standard neural network training schemes, MGDrop explicitly utilizes information from smaller sub-networks which act as approximations of the full …


Graph-Based Acoustic Clustering And Classification, Justin Youngho Sunu Jan 2023

CGU Theses & Dissertations

The rapid growth of audio data collection in various domains necessitates advanced techniques for efficient analysis and classification. This dissertation proposes new approaches for categorizing acoustic data, using both unsupervised and semi-supervised learning methods. Starting with raw audio, we preprocess the signal to segment it into time windows, each of which we consider as an independent data point. We use the short-time Fourier transform to describe the signal in a given time window as a set of Fourier coefficients. We interpret the resulting frequency signature as a high-dimensional feature description of each data point. We then develop a graph-based approach for …


Leveraging Subject Matter Expertise To Optimize Machine Learning Techniques For Air And Space Applications, Philip Y. Cho Sep 2022

Theses and Dissertations

We develop new machine learning and statistical methods that are tailored for Air and Space applications through the incorporation of subject matter expertise. In particular, we focus on three separate research thrusts that each represents a different type of subject matter knowledge, modeling approach, and application. In our first thrust, we incorporate knowledge of natural phenomena to design a neural network algorithm for localizing point defects in transmission electron microscopy (TEM) images of crystalline materials. In our second research thrust, we use Bayesian feature selection and regression to analyze the relationship between fighter pilot attributes and flight mishap rates. We …


Machine Learning Model Comparison And ARMA Simulation Of Exhaled Breath Signals Classifying COVID-19 Patients, Aaron Christopher Segura Aug 2022

Mathematics & Statistics ETDs

This study compared the performance of machine learning models in classifying COVID-19 patients using exhaled breath signals and simulated datasets. Ground truth classification was determined by the gold standard Polymerase Chain Reaction (PCR) test results. A residual bootstrapped method generated the simulated datasets by fitting signal data to Autoregressive Moving Average (ARMA) models. Classification models included neural networks, k-nearest neighbors, naïve Bayes, random forest, and support vector machines. A Recursive Feature Elimination (RFE) study was performed to determine if reducing signal features would improve the classification models' performance using Gini Importance scoring for the two classes. The top 25% of …


Stability And Differential Privacy Of Stochastic Gradient Methods, Zhenhuan Yang Aug 2022

Legacy Theses & Dissertations (2009 - 2024)

Recently, a considerable amount of work has been devoted to the study of algorithmic stability as well as differential privacy (DP) for stochastic gradient methods (SGM). However, most of the existing work focuses on the empirical risk minimization (ERM) and the population risk minimization problems. In this paper, we study two types of optimization problems that enjoy wide applications in modern machine learning, namely the minimax problem and the pairwise learning problem.


Applications Of Machine Learning Algorithms In Materials Science And Bioinformatics, Mohammed Quazi Jun 2022

Mathematics & Statistics ETDs

The piezoelectric response has been a measure of interest in density functional theory (DFT) for micro-electromechanical systems (MEMS) since the inception of MEMS technology. Piezoelectric-based MEMS devices find wide applications in automobiles, mobile phones, healthcare devices, and silicon chips for computers, to name a few. Piezoelectric properties of doped aluminum nitride (AlN) have been under investigation in materials science for piezoelectric thin films because of its wide range of device applicability. In this research, using rigorous DFT calculations, high-throughput ab initio simulations for 23 AlN alloys are generated.

This research is the first to report strong enhancements of piezoelectric properties …


Fine-Tuning A 𝑘-Nearest Neighbors Machine Learning Model For The Detection Of Insurance Fraud, Alliyah Stout Jun 2022

Honors Theses

Billions of dollars are lost within insurance companies due to fraud. Large money losses force insurance companies to increase premium costs and/or restrict policies. This negatively affects a company’s loyal customers. Although this is a prevalent problem, companies are not urgently working toward bettering their machine learning algorithms. Underskilled workers paired with inefficient computer algorithms make it difficult to accurately and reliably detect fraud.

The goal of this study is to understand the idea of 𝑘-Nearest Neighbors (𝑘-NN) and to use this classification technique to accurately detect fraudulent auto insurance claims. Using 𝑘-NN requires choosing a 𝑘 value and a …
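As a rough illustration of the classification technique named in this abstract, the sketch below classifies a point by majority vote among its k nearest labeled neighbors. It is not taken from the thesis: the function name `knn_predict`, the Euclidean distance, the two toy features, and the data values are all assumptions made for the example.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Rank training indices by Euclidean distance to the query (assumed metric).
    order = sorted(range(len(train_points)),
                   key=lambda i: dist(train_points[i], query))
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (claim amount in $1000s, days since policy start).
points = [(1.0, 300.0), (1.5, 280.0), (2.0, 310.0),
          (9.0, 5.0), (8.5, 10.0), (9.5, 3.0)]
labels = ["legitimate", "legitimate", "legitimate",
          "fraud", "fraud", "fraud"]

prediction = knn_predict(points, labels, query=(8.8, 7.0), k=3)
```

An odd k avoids tied votes when there are only two classes. In practice the features would normally be rescaled first, since the unscaled feature with the largest range dominates the distance.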


Generating A Dataset For Comparing Linear Vs. Non-Linear Prediction Methods In Education Research, Jack Mauro, Elena Martinez, Anna Bargagliotti May 2022

Honors Thesis

Machine learning is often used to build predictive models by extracting patterns from large data sets. Such techniques are increasingly being utilized to predict outcomes in the social sciences. One such application is predicting student success. Machine learning can be applied to predicting student acceptance and success in academia. Using these tools for education-related data analysis may enable the evaluation of programs, resources, and curriculum. Currently, research is needed to examine application, admissions, and retention data in order to address equity in college computer science programs. However, most student-level data sets contain sensitive data that cannot be made public. To …


Intra-Hour Solar Forecasting Using Cloud Dynamics Features Extracted From Ground-Based Infrared Sky Images, Guillermo Terrén-Serrano Apr 2022

Electrical and Computer Engineering ETDs

Due to the increasing use of photovoltaic systems, power grids are vulnerable to the projection of shadows from moving clouds. An intra-hour solar forecast provides power grids with the capability of automatically controlling the dispatch of energy, reducing the additional cost for a guaranteed, reliable supply of energy (i.e., energy storage). This dissertation introduces a novel sky imager consisting of a long-wave radiometric infrared camera and a visible light camera with a fisheye lens. The imager is mounted on a solar tracker to maintain the Sun in the center of the images throughout the day, reducing the scattering effect produced …


From MDP To AlphaZero, David Robert Sewell Nov 2021

Dissertations and Theses

In this paper I will explain the AlphaGo family of algorithms starting from first principles and requiring little previous knowledge from the reader. The focus will be upon one of the more recent versions AlphaZero but I hope to explain the core principles that allowed these algorithms to be so successful. I will generally refer to AlphaZero as theses [sic] core set of principles and will make it clear when I am referring to a specific algorithm of the AlphaGo family. AlphaZero in short combines Monte Carlo Tree Search (MCTS) with Deep learning and self-play. We will see how these …


Applying Deep Learning To The Ice Cream Vendor Problem: An Extension Of The Newsvendor Problem, Gaffar Solihu Aug 2021

Electronic Theses and Dissertations

The Newsvendor problem is a classical supply chain problem used to develop strategies for inventory optimization. The goal of the newsvendor problem is to predict the optimal order quantity of a product to meet an uncertain demand in the future, given that the demand distribution itself is known. The Ice Cream Vendor Problem extends the classical newsvendor problem to an uncertain demand with unknown distribution, albeit a distribution that is known to depend on exogenous features. The goal is thus to estimate the order quantity that minimizes the total cost when demand does not follow any known statistical distribution. The …
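For the classical setting described in this abstract, where the demand distribution is known, the standard textbook result is that the cost-minimizing order quantity is the demand quantile at the critical fractile c_u / (c_u + c_o), with c_u the per-unit cost of ordering too little and c_o the per-unit cost of ordering too much. The sketch below illustrates that result only, not the deep learning method of the thesis; the normal demand model, the cost values, and the function name `newsvendor_quantity` are assumptions made for the example.

```python
from statistics import NormalDist


def newsvendor_quantity(demand, underage_cost, overage_cost):
    """Return the order quantity that minimizes expected cost for known demand."""
    # Critical fractile: the service level that balances the two unit costs.
    critical_fractile = underage_cost / (underage_cost + overage_cost)
    # The optimal quantity is the demand quantile at that level.
    return demand.inv_cdf(critical_fractile)


# Hypothetical demand model: normal with mean 100 units, std. dev. 20 units.
demand = NormalDist(mu=100.0, sigma=20.0)

# Equal costs: the fractile is 0.5, so the order equals median demand.
q_balanced = newsvendor_quantity(demand, underage_cost=1.0, overage_cost=1.0)

# A lost sale costs three times as much as a leftover unit: order more.
q_high_margin = newsvendor_quantity(demand, underage_cost=3.0, overage_cost=1.0)
```

When the distribution is unknown, as in the extension the abstract describes, this closed-form quantile is no longer available and the order quantity has to be estimated from data.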


Gene Selection And Classification In High-Throughput Biological Data With Integrated Machine Learning Algorithms And Bioinformatics Approaches, Abhijeet R Patil May 2021

Open Access Theses & Dissertations

With the rise of high-throughput technologies in biomedical research, large volumes of expression profiling, methylation profiling, and RNA-sequencing data are being generated. These high-dimensional data have a large number of features with a small number of samples, a characteristic called the "curse of dimensionality." The selection of optimal features, which largely affects the performance of classification algorithms in machine learning models, has led to challenging problems in bioinformatics analyses of such high-dimensional datasets. In this work, I focus on the design of two-stage frameworks of feature selection and classification and their applications in multiple sets of colorectal cancer data. The first …


Implementing A Neural Network For Supervised Learning With A Random Configuration Of Layers And Nodes, Kane A. Phillips Jan 2021

Electronic Theses and Dissertations

Deep learning has a substantial number of real-life applications, making it an increasingly popular subset of artificial intelligence over the last decade. These applications come to fruition due to the tireless research and implementation of neural networks. This paper goes into detail on the implementation of supervised learning neural networks utilizing MATLAB, with the purpose being to generate a neural network based on specifications given by a user. Such specifications involve how many layers are in the network, and how many nodes are in each layer. The neural network is then trained based on known sample values of a function …


Developing Natural Language Processing Instruments To Study Sociotechnical Systems, Thayer Alshaabi Jan 2021

Graduate College Dissertations and Theses

Identifying temporal linguistic patterns and tracing social amplification across communities has always been vital to understanding modern sociotechnical systems. Now, well into the age of information technology, the growing digitization of text archives powered by machine learning systems has enabled an enormous number of interdisciplinary studies to examine the coevolution of language and culture. However, most research in that domain investigates formal textual records, such as books and newspapers. In this work, I argue that the study of conversational text derived from social media is just as important. I present four case studies to identify and investigate societal developments in …


Inference Of Surface Velocities From Oblique Time Lapse Photos And Terrestrial Based LiDAR At The Helheim Glacier, Franklyn T. Dunbar II Jan 2021

Graduate Student Theses, Dissertations, & Professional Papers

Using time dependent observations derived from terrestrial LiDAR and oblique time-lapse imagery, we demonstrate that a Bayesian approach to glacial motion estimation provides a concise way to incorporate multiple data products into a single motion estimation procedure effectively producing surface velocity estimates with an associated uncertainty. This approach brings both improved computational efficiency, and greater scalability across observational time-frames when compared to existing methods. To gauge efficacy, we apply these methods to a set of observations from the Helheim Glacier, a critical actor in contemporary mass loss trends observed in the Greenland Ice Sheet. We find that …


Exploring The Potential Of Sparse Coding For Machine Learning, Sheng Yang Lundquist Oct 2020

Dissertations and Theses

While deep learning has proven to be successful for various tasks in the field of computer vision, there are several limitations of deep-learning models when compared to human performance. Specifically, human vision is largely robust to noise and distortions, whereas deep learning performance tends to be brittle to modifications of test images, including being susceptible to adversarial examples. Additionally, deep-learning methods typically require very large collections of training examples for good performance on a task, whereas humans can learn to perform the same task with a much smaller number of training examples.

In this dissertation, I investigate whether the use …


Evaluating An Ordinal Output Using Data Modeling, Algorithmic Modeling, And Numerical Analysis, Martin Keagan Wynne Brown Jan 2020

Murray State Theses and Dissertations

Data and algorithmic modeling are two different approaches used in predictive analytics. The models discussed from these two approaches include the proportional odds logit model (POLR), the vector generalized linear model (VGLM), the classification and regression tree model (CART), and the random forests model (RF). Patterns in the data were analyzed using trigonometric polynomial approximations and Fast Fourier Transforms. Predictive modeling is used frequently in statistics and data science to find the relationship between the explanatory (input) variables and a response (output) variable. Both approaches prove advantageous in different cases depending on the data set. In our case, the data …


Comparing Predictive Performance Of Statistical Learning Models On Medical Data, Francis Biney Jan 2020

Open Access Theses & Dissertations

This work investigates the predictive performance of 10 machine learning models on three medical data sets: breast cancer, heart disease, and prostate cancer. Furthermore, we use the models to identify risk factors that contribute significantly to these diseases.

The models considered include logistic regression with L1 and L2 penalties, principal component logistic regression (PCR-LR), partial least squares logistic regression (PLS-LR), multivariate adaptive regression splines (MARS), support vector machine with radial basis kernel (SVM-RBK), random forest (RF), gradient boosting machines (GBM), elastic net (Enet), and feedforward neural network (FFNN). The models were grouped according to their similarities and learning style: i) Linear regularized models: LR-Lasso, LR-Ridge and …


Adaptive Feature Engineering Modeling For Ultrasound Image Classification For Decision Support, Hatwib Mugasa Oct 2019

Doctoral Dissertations

Ultrasonography is considered a relatively safe option for the diagnosis of benign and malignant cancer lesions due to the low-energy sound waves used. However, the visual interpretation of the ultrasound images is time-consuming and usually has high false alerts due to speckle noise. Improved methods of collecting image-based data have been proposed to reduce noise in the images; however, this has proved not to solve the problem due to the complex nature of images and the exponential growth of biomedical datasets. Secondly, the target class in real-world biomedical datasets, that is the focus of interest of a biopsy, is usually …


Forecasting Crashes, Credit Card Default, And Imputation Analysis On Missing Values By The Use Of Neural Networks, Jazmin Quezada Jan 2019

Open Access Theses & Dissertations

A neural network is a system of hardware and/or software patterned after the operation of neurons in the human brain. Neural networks, also called artificial neural networks, are a variety of deep learning technology, which also falls under the umbrella of artificial intelligence, or AI. Recent studies show that the artificial neural network has the highest coefficient of determination (i.e., a measure of how well a model explains and predicts future outcomes) in comparison to the K-nearest neighbor classifiers, logistic regression, discriminant analysis, naive Bayesian classifier, and classification trees. In this work, the theoretical description of the neural network methodology …


Optimization Methods For Learning Graph-Structured Sparse Models, Baojian Zhou Jan 2019

Legacy Theses & Dissertations (2009 - 2024)

Learning graph-structured sparse models has recently received significant attention thanks to their broad applicability to many important real-world problems. However, such models, which are more effective and more interpretable than their counterparts, are difficult to learn due to optimization challenges. This thesis presents optimization algorithms for learning graph-structured sparse models under three different problem settings. Firstly, under the batch learning setting, we develop methods that can be applied to different objective functions that enjoy linear convergence guarantees up to constant errors. They can effectively optimize the statistical score functions in the task of subgraph detection. Secondly, under the stochastic learning setting, …


Recurrent Neural Networks And Their Applications To RNA Secondary Structure Inference, Devin Willmott Jan 2018

Theses and Dissertations--Mathematics

Recurrent neural networks (RNNs) are state-of-the-art sequential machine learning tools, but have difficulty learning sequences with long-range dependencies due to the exponential growth or decay of gradients backpropagated through the RNN. Some methods overcome this problem by modifying the standard RNN architecture to force the recurrent weight matrix W to remain orthogonal throughout training. The first half of this thesis presents a novel orthogonal RNN architecture that enforces orthogonality of W by parametrizing with a skew-symmetric matrix via the Cayley transform. We present rules for backpropagation through the Cayley transform, show how to deal with the Cayley …


Temporal Feature Selection With Symbolic Regression, Christopher Winter Fusting Jan 2017

Graduate College Dissertations and Theses

Building and discovering useful features when constructing machine learning models is the central task for the machine learning practitioner. Good features are useful not only in increasing the predictive power of a model but also in illuminating the underlying drivers of a target variable. In this research we propose a novel feature learning technique in which symbolic regression is endowed with a "Range Terminal" that allows it to explore functions of the aggregate of variables over time. We test the Range Terminal on a synthetic data set and a real-world data set in which we predict seasonal greenness using satellite …