Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Machine Learning

Theses/Dissertations

Mathematics

Articles 1 - 30 of 31

Full-Text Articles in Physical Sciences and Mathematics

Integrating Machine Learning Methods For Medical Diagnosis, Jazmin Quezada Dec 2023

Open Access Theses & Dissertations

Abstract: The rapid advancement of machine learning techniques has revolutionized the field of medical diagnosis by offering powerful tools to analyze complex data sets and make accurate predictions. We present a novel approach that integrates machine learning and optimization models to enhance the accuracy of medical diagnoses. Our method focuses on fine-tuning and optimizing the parameters of machine learning algorithms commonly used in medical diagnosis, such as logistic regression, support vector machines, and neural networks. By employing optimization techniques, we systematically explore the parameter space of these algorithms to discover optimal configurations. Moreover, by representing …


Secondary Features Of Importance For A URL Ranking, Atajan Abdyyev Aug 2023

Dissertations and Theses

This paper investigates the impact of secondary ranking factors on webpage relevance and rankings in the context of Search Engine Optimization (SEO), focusing on the jewelry domain within the United States e-commerce market. After generating a keyword list related to jewelry and retrieving the top URLs from Google's search results, the study employs machine learning models, including XGBoost, CatBoost, and Linear Regression, to identify key features influencing webpage relevance and rankings. The findings highlight specific optimal ranges for features such as Outlinks, Unique Inlinks, and Flesch Reading Ease Score, among others, indicating that these features significantly affect rankings. Notably, the Random Forest model performed best …


Stressor: An R Package For Benchmarking Machine Learning Models, Samuel A. Haycock Aug 2023

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Many discipline-specific researchers need a way to quickly compare the accuracy of their predictive models with that of alternatives. However, many of these researchers are not experienced with multiple programming languages. Python has recently led the way in machine learning functionality; its ecosystem includes the PyCaret library, which allows users to develop high-performing machine learning models with only a few lines of code. The goal of the stressor package is to give users of the R programming language access to the advantages of PyCaret without having to learn Python. This allows the user to leverage R’s powerful data analysis workflows, while simultaneously …


An Interval-Valued Random Forests, Paul Gaona Partida Aug 2023

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

There is a growing demand for the development of new statistical models and the refinement of established methods to accommodate different data structures. This need arises from the recognition that traditional statistics often assume the value of each observation to be precise, which may not hold true in many real-world scenarios. Factors such as the collection process and technological advancements can introduce imprecision and uncertainty into the data.

For example, consider data collected over a long period of time, where newer measurement tools may offer greater accuracy and provide more information than previous methods. In such cases, it becomes crucial …


Examining Model Complexity's Effects When Predicting Continuous Measures From Ordinal Labels, Mckade S. Thomas May 2023

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Many real-world problems require the prediction of ordinal variables, whose values are a set of categories with an inherent ordering. However, in many of these cases a strictly categorical prediction is not the desired outcome. In such cases, regression models can treat ordinal variables as continuous and do not restrict their predictions to discrete categories. Prior research has found that these models are capable of learning useful information between the discrete levels of the ordinal labels they are trained on, but complex models may fit the ordinal labels too closely, missing the information between levels. In this …


Normalization Techniques For Sequential And Graphical Data, Cole Pospisil Jan 2023

Theses and Dissertations--Mathematics

Normalization methods have proven to be an invaluable tool in the training of deep neural networks. In particular, Layer and Batch Normalization are commonly used to mitigate the risks of exploding and vanishing gradients. This work presents two methods related to these normalization techniques. The first method is Batch Normalized Preconditioning (BNP) for recurrent neural networks (RNN) and graph convolutional networks (GCN). BNP has been proposed for fully connected and convolutional networks as a way to achieve performance benefits similar to those of Batch Normalization by controlling the condition number of the Hessian through preconditioning of the gradients. We extend …


Machine Learning To Predict Warhead Fragmentation In-Flight Behavior From Static Data, Katharine Larsen Oct 2022

Doctoral Dissertations and Master's Theses

Accurate characterization of fragment fly-out properties from high-speed warhead detonations is essential for estimating collateral damage and lethality for a given weapon. Real dynamic warhead detonation tests are rare, costly, and often unrealizable with current technology, leaving fragmentation experiments limited to static arena tests and numerical simulations. Stereoscopic imaging techniques can now provide static arena tests with time-dependent tracks of individual fragments, each with characteristics such as a fragment ID and a position vector. Simulation methods can account for the dynamic case but may omit relevant dynamics experienced in real-life warhead detonations. This research leverages machine learning methodologies to …


Development Of Graphical Models And Statistical Physics Motivated Approaches To Genomic Investigations, Yashwanth Lagisetty Aug 2022

Dissertations & Theses (Open Access)

Identifying genes involved in disease pathology has been a goal of genomic research since the early days of the field. However, as technology improves and the body of research grows, we are faced with more questions than answers. Among these is the pressing matter of our incomplete understanding of the genetic underpinnings of complex diseases. Many hypotheses offer explanations as to why direct and independent analyses of variants, as done in genome-wide association studies (GWAS), may not fully elucidate disease genetics. These range from pointing out flaws in statistical testing to invoking the complex dynamics of epigenetic processes. In the …


A Study Of Machine Learning Techniques For Dynamical System Prediction, Rishi Pawar May 2022

Theses and Dissertations

Dynamical systems are ubiquitous in mathematics and science and have been used to model many important application problems, such as population dynamics, fluid flow, and control systems. However, some of these systems are difficult to construct using traditional mathematical techniques. To address this, various machine learning techniques use collected data to form predictions that approximate the dynamical system of interest. This thesis studies some basic machine learning techniques for predicting system dynamics from the data generated by test systems. In particular, the methods of Dynamic Mode Decomposition (DMD), Sparse Identification of Nonlinear Dynamics …


Analyzing Suicidal Text Using Natural Language Processing, Cassandra Barton May 2022

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Using Natural Language Processing (NLP), we can analyze text written by suicidal individuals, and this can be done with a variety of methods. I analyzed a dataset from a girl named Victoria who died by suicide. I trained a machine learning model on a different dataset and tested it on her diary entries to classify her text into two categories: suicidal and non-suicidal. I used topic modeling to identify the distinct topics in each subset. I also found a pattern in her diary entries. NLP allows us to help individuals who are suicidal, as well as their family members and close …


Quadratic Neural Network Architecture As Evaluated Relative To Conventional Neural Network Architecture, Reid Taylor Apr 2022

Senior Theses

Current work in the field of deep learning and neural networks revolves around several variations of the same mathematical model for associative learning. These variations, while significant and highly applicable in the real world, do not push the limits of modern computational power. This research aims to do just that: by using higher-order tensors in place of second-order tensors, quadratic neural networks can be developed, allowing substantially more complex machine learning models that capture self-interactions of the input data. This research presents the theory and development of the mathematical model necessary for such an idea to …
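The abstract does not give the exact formulation, so the following is only a minimal sketch, assuming one common form of a quadratic neuron: each output adds a term built from a third-order weight tensor `W` (one n-by-n matrix per output) to the usual linear term `V x + b`. The function name `quadratic_layer` and its parameter layout are hypothetical, and plain Python lists are used to keep the sketch self-contained.

```python
def quadratic_layer(x, W, V, b):
    """Compute y_k = sum_ij W[k][i][j]*x_i*x_j + sum_i V[k][i]*x_i + b[k].

    W: third-order tensor as nested lists (K x n x n), one matrix per output.
    V: conventional second-order weight matrix (K x n).
    b: bias vector (length K).
    """
    n = len(x)
    outputs = []
    for W_k, V_k, b_k in zip(W, V, b):
        # Quadratic term: every pairwise product x_i * x_j, including self-interactions (i == j)
        quad = sum(W_k[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
        # Linear term: what a conventional neuron computes
        lin = sum(v * xi for v, xi in zip(V_k, x))
        outputs.append(quad + lin + b_k)
    return outputs
```

For example, `quadratic_layer([1.0, 2.0], [[[1, 0], [0, 1]]], [[0.5, 0.5]], [0.0])` adds the squared inputs (1 + 4) to the linear part (1.5). With `W` set to zeros the layer reduces to a conventional dense layer, which makes the extra expressiveness, and the extra n² weights per output, easy to see.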


Batch Normalization Preconditioning For Neural Network Training, Susanna Luisa Gertrude Lange Jan 2022

Theses and Dissertations--Mathematics

Batch normalization (BN) is a popular and ubiquitous method in deep learning that has been shown to decrease training time and improve generalization performance of neural networks. Despite its success, BN is not theoretically well understood. It is not suitable for use with very small mini-batch sizes or online learning. In this work, we propose a new method called Batch Normalization Preconditioning (BNP). Instead of applying normalization explicitly through a batch normalization layer as is done in BN, BNP applies normalization by conditioning the parameter gradients directly during training. This is designed to improve the Hessian matrix of the loss …
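For contrast with BNP, the sketch below shows what a standard batch normalization layer computes for one feature in its forward pass: it normalizes with the mini-batch mean and (biased) variance, then applies a learned scale `gamma` and shift `beta`. It is a plain-Python illustration of ordinary BN, not the BNP method from the thesis; the default `eps` is an assumption, although 1e-5 is a common choice.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Standard batch normalization of a single feature over one mini-batch."""
    n = len(batch)
    mean = sum(batch) / n
    # Biased (population) variance of the mini-batch, as used by BN during training
    var = sum((v - mean) ** 2 for v in batch) / n
    # Normalize, then apply the learned affine transform
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in batch]
```

A mini-batch of size one has zero variance, so every activation collapses to `beta`; this is one concrete way to see why BN struggles with very small mini-batches and online learning.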


Reinforcement Learning: Low Discrepancy Action Selection For Continuous States And Actions, Jedidiah Lindborg Jan 2022

Electronic Theses and Dissertations

In reinforcement learning, the process of selecting an action during the exploration or exploitation stage is difficult to optimize. The purpose of this thesis is to create an action selection process for an agent by employing a low discrepancy action selection (LDAS) method. This should allow the agent to quickly determine the utility of its actions by prioritizing actions that are dissimilar to ones it has already picked. In this way, the agent should learn faster and arrive at better policies.
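The abstract does not specify which low-discrepancy construction LDAS uses. As a minimal sketch, assuming a one-dimensional continuous action range, the base-2 van der Corput sequence is one standard low-discrepancy sequence: each new point falls into a gap left by the earlier ones, so successive actions tend to be dissimilar to those already tried. Both function names are hypothetical.

```python
def van_der_corput(n, base=2):
    """n-th term (n >= 1) of the van der Corput sequence in [0, 1).

    Reverses the base-`base` digits of n behind the radix point.
    """
    value, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        value += digit / denom
    return value


def low_discrepancy_actions(count, low, high, base=2):
    """Hypothetical helper: map the first `count` sequence terms onto [low, high)."""
    return [low + (high - low) * van_der_corput(i, base) for i in range(1, count + 1)]
```

On the action range [-1, 1) the first three actions are 0.0, -0.5, and 0.5, each splitting a remaining gap. In a multi-dimensional action space one would typically switch to a Halton or Sobol sequence, which generalize this idea; whether the thesis does so is not stated in the abstract.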


Measuring Machine Learning Model Uncertainty With Applications To Aerial Segmentation, Kevin James Cotton Jan 2021

CGU Theses & Dissertations

Machine learning model performance on both validation data and new data can be better measured and understood by leveraging uncertainty metrics at prediction time. These metrics can improve the model training process by indicating which training data need to be corrected and which parts of the domain need further annotation. The methods described have yet to reach mainstream adoption but show great potential. Here, we survey the field of uncertainty metrics and provide a robust framework for applying them to aerial segmentation. Uncertainty is divided into two types: aleatoric and epistemic. Aleatoric uncertainty arises from variations in training …
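One widely used way to split predictive uncertainty into the two types named above, shown here only as an illustration and not necessarily the framework in the thesis, is the entropy decomposition over several stochastic predictions for the same pixel. The samples could come from Monte Carlo dropout or an ensemble; that source is an assumption. The expected entropy of the individual predictions estimates the aleatoric part, and the remainder (the mutual information) estimates the epistemic part. Function names are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def decompose_uncertainty(samples):
    """Split one pixel's predictive uncertainty into (total, aleatoric, epistemic).

    samples: list of class-probability vectors, one per stochastic forward pass.
    """
    k = len(samples[0])
    mean = [sum(s[c] for s in samples) / len(samples) for c in range(k)]
    total = entropy(mean)                                        # predictive entropy
    aleatoric = sum(entropy(s) for s in samples) / len(samples)  # expected entropy
    epistemic = total - aleatoric                                # mutual information
    return total, aleatoric, epistemic
```

Two confident but contradictory predictions, such as `[[1.0, 0.0], [0.0, 1.0]]`, give zero aleatoric and high epistemic uncertainty, the pattern that flags regions needing further annotation; identical uncertain predictions give the opposite.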


Dictionary-Based Data Generation For Fine-Tuning BERT For Adverbial Paraphrasing Tasks, Mark Anthony Carthon Aug 2020

Theses and Dissertations

Recent advances in natural language processing technology have led to the emergence of large and deep pre-trained neural networks. These networks are used primarily for transfer learning: more specifically, pre-trained networks are retrained or fine-tuned to achieve state-of-the-art performance on a variety of challenging natural language processing/understanding (NLP/NLU) tasks. In this thesis, we focus on identifying paraphrases at the sentence level using the Bidirectional Encoder Representations from Transformers (BERT) network. It is well understood that in deep learning the volume and quality of training data are determining factors of performance. The objective of …


An Improved Method For Spectroscopic Quality Classification, Elizabeth G. Mayer Jul 2020

Mathematics & Statistics ETDs

Spectral quality classification is a vital data-cleaning step before the analysis of magnetic resonance spectroscopy (MRS) data can be done. This analysis compares five methods of quality classification: three legacy methods, Maudsley et al. (2006), Zhang et al. (2018), and Bustillo et al. (2020), and two newly created methods that use a random forest classifier (RFC) to inform their classifications. We found that the random forest classifier was the most accurate at predicting spectral quality (balanced accuracy of 88% for the RFC vs. 70%, 72%, and 72% for the legacy methods). A Random-Forests-Informed Filtering method (RFIFM) for quality …
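Balanced accuracy, the metric reported above, is the mean of the per-class recalls, so a classifier cannot score well simply by favoring the majority quality class. A minimal plain-Python sketch follows; it should agree with scikit-learn's `balanced_accuracy_score` in its default, unadjusted form. The labels in the example are hypothetical.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in idx if y_pred[i] == c)  # correctly predicted members of class c
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)
```

With 8 "good" and 2 "bad" spectra, a classifier that labels everything "good" has 80% plain accuracy but only 50% balanced accuracy, since its recall on the "bad" class is zero.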


A Study Of The Efficacy Of Machine Learning For Diagnosing Obstructive Coronary Artery Disease In Non-Diabetic Patients, Demond Larae Handley Jul 2020

Theses and Dissertations

According to the Centers for Disease Control and Prevention, about 18.2 million adults age 20 and older in the United States have Coronary Artery Disease. Early diagnosis is therefore crucial to help prevent debilitating consequences, above all death, for many patients. In this study, we use gene expression values from peripheral blood samples of 198 non-diabetic patients, with the goal of developing a gene expression model, incorporating age and sex, for the diagnosis of Coronary Artery Disease. We employ machine learning methods to obtain a classification based on genetic information, age, and sex. Our implementation uses feed forward …


Artificial Neural Network Models For Pattern Discovery From ECG Time Series, Mehakpreet Kaur Jan 2020

Electronic Theses and Dissertations

Artificial Neural Network (ANN) models have recently become the de facto models for deep learning, with a wide range of applications spanning scientific fields such as computer vision, physics, biology, and medicine, as well as social life (suggesting preferred movies, shopping lists, etc.). Owing to advances in computer technology and the increasing use of Artificial Intelligence (AI) in medicine and biological research, ANNs have been extensively applied not only to provide quick information about diseases but also to make diagnostics accurate and cost-effective. We propose an ANN-based model to analyze a patient's electrocardiogram (ECG) data and produce accurate diagnostics regarding possible heart diseases …


Predicting Absenteeism Of Female Students In Alabama, Funmilola Okelana Aug 2019

Dissertations and Theses

Abstract

Students are chronically absent when they miss at least 15 days of the school year. Past researchers have identified income and environment as factors that affect school absenteeism. Alabama is a poor state with a high crime rate. The hypothesis for this research is that the absenteeism of female students in Alabama is high. Do we reject or fail to reject this hypothesis? If we fail to reject it, what other factors can affect absenteeism in schools? How can we best predict the absenteeism of female students in Alabama? What is the effect of bad data on …


Machine Learning Techniques As Applied To Discrete And Combinatorial Structures, Samuel David Schwartz Aug 2019

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Machine Learning Techniques have been used on a wide array of input types: images, sound waves, text, and so forth. By articulating these input types to the almighty machine, researchers have solved all sorts of amazing problems for many practical purposes.

Nevertheless, there are some input types which don’t lend themselves nicely to the standard set of machine learning tools we have. Moreover, there are some provably difficult problems which are abysmally hard to solve within a reasonable time frame.

This thesis addresses several of these difficult problems. It frames these problems such that we can …


Model-Independent Estimation Of Optimal Hedging Strategies With Deep Neural Networks, Tobias Michael Furtwaengler May 2019

Theses and Dissertations

Inspired by the recent paper by Buehler et al. (2018), this thesis investigates the optimal hedging and pricing of financial derivatives with neural networks. We use the concept of convex risk measures to define optimal hedging strategies without strong assumptions on the underlying market dynamics. Furthermore, the setting allows the incorporation of market frictions and thus the determination of optimal hedging strategies and prices even in incomplete markets. We then use the approximation capabilities of neural networks to find close-to-optimal estimates for these strategies.

We will elaborate on the theoretical foundations of this approach and carry out implementations …
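A standard example of a convex risk measure is the entropic risk measure, `rho(X) = (1/lam) * log E[exp(-lam * X)]`, where `X` is a profit-and-loss outcome and `lam > 0` sets the risk aversion. As a minimal sketch, assuming the hedged terminal P&L is available as Monte Carlo samples, it can be estimated as below; a deep-hedging approach would then train the network's hedging strategy to minimize such a risk of the hedged P&L. Whether the thesis uses this particular risk measure is not stated in the excerpt, and the function name and default `lam` are illustrative assumptions.

```python
import math


def entropic_risk(pnl, lam=1.0):
    """Monte Carlo estimate of rho(X) = (1/lam) * log E[exp(-lam * X)]."""
    scaled = [-lam * x for x in pnl]
    m = max(scaled)  # subtract the max before exponentiating (log-sum-exp trick) for stability
    mean_exp = sum(math.exp(s - m) for s in scaled) / len(scaled)
    return (m + math.log(mean_exp)) / lam
```

A constant P&L of 2 has risk exactly -2, a zero-mean but uncertain P&L has positive risk, and shifting every outcome by a sure amount c lowers the risk by exactly c (cash invariance), one of the defining properties of a convex risk measure.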


Lattice Simplices: Sufficiently Complicated, Brian Davis Jan 2019

Theses and Dissertations--Mathematics

Simplices are the "simplest" examples of polytopes, and yet they exhibit much of the rich and subtle combinatorics and commutative algebra of their more general cousins. In this way they are sufficiently complicated: insights gained from their study can inform broader research in Ehrhart theory and associated fields.

In this dissertation we consider two previously unstudied properties of lattice simplices: one algebraic and one combinatorial. The first is the Poincaré series of the associated semigroup algebra, which is substantially more complicated than the Hilbert series of that same algebra. The second is the partial ordering of the elements of …


A Dual State Hierarchical Ensemble Kalman Filter Algorithm, William J. Cook, Jesse Johnson, Marko Maneta, Doug Brinkerhoff Jan 2019

Graduate Student Theses, Dissertations, & Professional Papers

Dynamic models that simulate processes across large geographic areas, such as hydrologic models, are often informed by empirical parameters that are distributed across a geographical area and segmented by geological features such as watersheds. These parameters may be referred to as spatially distributed parameters. Spatially distributed parameters are frequently spatially correlated, and any technique used in their calibration should ideally incorporate existing spatial hierarchical relationships into its structure. In this paper, a parameter estimation method based on the Dual State Ensemble Kalman Filter, called the Dual State Hierarchical Ensemble Kalman Filter (DSHEnKF), is presented. This modified filter is innovative in that …


Sports Analytics With Computer Vision, Colby T. Jeffries Jan 2018

Senior Independent Study Theses

Computer vision in sports analytics is a relatively new development. With multimillion-dollar systems like STATS’s SportVU, professional basketball teams are able to collect more finely detailed data than ever before. This concept can be scaled down to provide similar statistics collection for college and high school basketball teams. Here we investigate the creation of such a system using open-source technologies and less expensive hardware. In addition, using a similar technology, we examine basketball free throws to see whether a shooter’s form has a specific relationship to a shot’s outcome. A system that learns this relationship could be used to …


Novelty Detection Of Machinery Using A Non-Parametric Machine Learning Approach, Enrique Angola Jan 2018

Graduate College Dissertations and Theses

A novelty detection algorithm inspired by human audio pattern recognition is conceptualized and experimentally tested. This anomaly detection technique can be used to monitor the health of a machine or could also be coupled with a current state-of-the-art system to enhance its fault detection capabilities. Time-domain data obtained from a microphone is processed by applying a short-time FFT, which returns time-frequency patterns. These patterns are fed to a machine learning algorithm, which is designed to detect novel signals and identify windows in the frequency domain where such novelties occur. The algorithm presented in this paper uses one-dimensional …
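The short-time FFT step described above can be sketched as follows: slide a window over the signal, taper each frame with a Hann window, and take the magnitude spectrum of each frame. To stay dependency-free, this sketch uses a naive O(N²) DFT instead of an FFT, and the Hann window, window size, and hop length are assumptions rather than the thesis's settings; in practice one would use an FFT routine such as `numpy.fft.rfft` or `scipy.signal.stft`. The function name is hypothetical.

```python
import cmath
import math


def stft_magnitudes(signal, window_size, hop):
    """Return |DFT| of each Hann-windowed frame as a list of frames x frequency bins."""
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / window_size) for n in range(window_size)]
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop):
        frame = [signal[start + n] * hann[n] for n in range(window_size)]
        # Naive DFT; keep only the non-negative frequency bins 0..window_size // 2
        spectrum = []
        for k in range(window_size // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / window_size)
                      for n in range(window_size))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames
```

A pure tone that completes 4 cycles per 32-sample window peaks in bin 4 of every frame; a novelty detector would look for frames whose time-frequency pattern departs from those seen during normal operation.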


Triple Non-Negative Matrix Factorization Technique For Sentiment Analysis And Topic Modeling, Alexander A. Waggoner Jan 2017

CMC Senior Theses

Topic modeling refers to the process of algorithmically sorting documents into categories based on some common relationship between the documents. This common relationship between the documents is considered the “topic” of the documents. Sentiment analysis refers to the process of algorithmically sorting a document into a positive or negative category depending on whether the document expresses a positive or negative opinion on its respective topic. In this paper, I consider the open problem of classifying documents into both a topic category and a sentiment category. This has a direct application to the retail industry where companies may want to scour …


The New Issues In Classification Problems, Md Mahmudul Hasan Jan 2016

Open Access Theses & Dissertations

The data involved in science and engineering are getting bigger every day. Studying and organizing large amounts of data is difficult without classification. In machine learning, classification is the problem of identifying which of a set of categories a given data point belongs to. Several classification techniques are in common use. In our work, we present a sparse representation technique to perform classification. The popularity of this technique motivated us to apply it to our collected samples. To find a sparse representation, we used an $l_1$-minimization algorithm, a convex relaxation method that researchers have shown to be very efficient. The purpose …
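The abstract does not name the specific $l_1$-minimization algorithm. As a swapped-in illustration, the sketch below uses ISTA (iterative shrinkage-thresholding) to solve the related unconstrained problem `min_x 0.5 * ||A x - y||^2 + lam * ||x||_1`, where the columns of `A` would hold training samples and `y` a test sample; in sparse-representation classification the test sample is then typically assigned to the class whose coefficients best reconstruct it. The step size uses the squared Frobenius norm of `A`, a safe but conservative upper bound on the Lipschitz constant. All names are hypothetical.

```python
def soft_threshold(v, t):
    """Shrink each entry toward zero by t (the proximal map of t * ||.||_1)."""
    return [max(abs(x) - t, 0.0) * (1.0 if x > 0 else -1.0) for x in v]


def ista(A, y, lam, steps=500):
    """Approximately minimize 0.5*||A x - y||^2 + lam*||x||_1; A is a list of rows."""
    m, n = len(A), len(A[0])
    # Squared Frobenius norm bounds the largest squared singular value of A
    L = sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(steps):
        residual = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]  # A^T r
        # Gradient step on the smooth part, then shrinkage for the l1 penalty
        x = soft_threshold([x[j] - grad[j] / L for j in range(n)], lam / L)
    return x
```

With an identity dictionary the answer can be checked by hand: for `y = [3.0, 0.1]` and `lam = 0.5`, the penalty shrinks the large coefficient to 2.5 and sets the small one exactly to zero, which is the sparsity the classification method relies on.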


Predicting Intraday Financial Market Dynamics Using Takens' Vectors; Incorporating Causality Testing And Machine Learning Techniques, Abubakar-Sadiq Bouda Abdulai Dec 2015

Electronic Theses and Dissertations

Traditional approaches to predicting financial market dynamics tend to be linear and stationary, whereas financial time series data is increasingly nonlinear and non-stationary. Recently, advances in dynamical systems theory have enabled the extraction of complex dynamics from time series data. These developments include the theory of time-delay embedding and phase space reconstruction of dynamical systems from a scalar time series. In this thesis, a time-delay embedding approach for predicting intraday stock or stock index movement is developed. The approach combines methods of nonlinear time series analysis with those of causality testing, theory of dynamical systems and machine learning (artificial …
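A time-delay embedding turns a scalar series into vectors of lagged values, the "Takens' vectors" of the title. The sketch below is a minimal version, assuming the forward-lag convention `[x(t), x(t + tau), ..., x(t + (dim - 1) * tau)]` and leaving the choice of embedding dimension `dim` and delay `tau` (often made with false-nearest-neighbor and mutual-information heuristics) to the caller. The function name is hypothetical.

```python
def delay_embed(series, dim, tau):
    """Takens-style delay vectors [x(t), x(t + tau), ..., x(t + (dim - 1) * tau)]."""
    span = (dim - 1) * tau
    if span >= len(series):
        raise ValueError("series too short for this embedding dimension and delay")
    # One vector per start index t that leaves room for all dim lagged samples
    return [[series[t + j * tau] for j in range(dim)] for t in range(len(series) - span)]
```

For example, `delay_embed([1, 2, 3, 4, 5], 2, 2)` yields `[[1, 3], [2, 4], [3, 5]]`. Each vector could then be paired with the next observed price movement as a training example for a classifier; how the thesis forms its targets is not stated in the excerpt.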


Application Of Machine Learning To Mapping And Simulating Gene Regulatory Networks, Hien-Haw Liow May 2015

Arts & Sciences Electronic Theses and Dissertations

This dissertation explores, proposes, and examines methods of applying modern machine learning and Bayesian statistics in the quantitative and qualitative modeling of gene regulatory networks using high-throughput gene expression data. A semi-parametric Bayesian model based on random forest is developed to infer quantitative aspects of gene regulation relations; a parametric model is developed to predict gene expression levels solely from genotype information. Simulation of network behavior is shown to complement regression analysis greatly in capturing the dynamics of gene regulatory networks. Finally, as an application and extension of novel approaches in gene expression analysis, new methods of discovering topological structure of gene …