Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 15 of 15

Full-Text Articles in Physical Sciences and Mathematics

Random Search Plus: A More Effective Random Search For Machine Learning Hyperparameters Optimization, Bohan Li Dec 2020

Masters Theses

Hyperparameter optimization has long been key to improving machine learning model performance. Popular hyperparameter optimization methods include grid search, random search, manual search, Bayesian optimization, and population-based optimization. Random search requires less computation than grid search, but at some cost in accuracy. This thesis proposes a more effective random search method, named random search plus, that builds on traditional random search and on separation of the hyperparameter space. The thesis empirically demonstrates that random search plus is more effective than random search. There are some case …
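The abstract does not give enough detail to reproduce random search plus, but the baseline it extends is simple. The following is a minimal Python sketch of plain random search only, assuming a toy objective in place of a real validation loss; `random_search`, `toy_loss`, and the search space are hypothetical names chosen for illustration.

```python
import random
from typing import Callable, Dict, Tuple

# A search space maps each hyperparameter name to a function that draws one value.
SearchSpace = Dict[str, Callable[[random.Random], float]]


def random_search(objective: Callable[[Dict[str, float]], float],
                  space: SearchSpace,
                  n_trials: int,
                  seed: int = 0) -> Tuple[Dict[str, float], float]:
    """Plain random search: draw n_trials independent configurations
    and keep the one with the lowest objective value."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: sample(rng) for name, sample in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_loss(p: Dict[str, float]) -> float:
    """Hypothetical stand-in for validation loss, minimized at lr=0.1, dropout=0.3."""
    return (p["learning_rate"] - 0.1) ** 2 + (p["dropout"] - 0.3) ** 2


space: SearchSpace = {
    "learning_rate": lambda r: 10 ** r.uniform(-4, 0),  # log-uniform in [1e-4, 1]
    "dropout": lambda r: r.uniform(0.0, 0.5),
}

best, score = random_search(toy_loss, space, n_trials=200)
print(best, score)
```

Sampling the learning rate log-uniformly is a common design choice because its useful values span several orders of magnitude; a grid with the same budget would spend most points on a few distinct values per axis.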


Machine Learning Applications For Drug Repurposing, Hansaim Lim Sep 2020

Dissertations, Theses, and Capstone Projects

The cost of bringing a drug to market is astounding, and the failure rate is intimidating. Drug discovery has had limited success under the conventional reductionist one-drug-one-gene-one-disease paradigm, in which a single disease-associated gene is identified and a molecule that binds that specific target is subsequently designed. Under this simplistic paradigm, a drug molecule is assumed to interact only with its intended target. However, small-molecule drugs often interact with multiple targets, and these off-target interactions are not considered under the conventional paradigm. As a result, drug-induced side effects and adverse reactions are often neglected …


Compressed DNA Representation For Efficient AMR Classification, John Partee, Robert Hazell, Anjli Solsi, John Santerre Aug 2020

SMU Data Science Review

In this paper, we explore a representation methodology for compressing DNA isolates. Using lossless string compression via tokenization of frequently repeated DNA segments, we reduce the length of the isolates before they are counted as k-mers for classification. With this new representation, we apply a previously established feature sampling method to dramatically reduce the feature space. To understand genetic diversity, we also examine whether biological function is conserved across these spaces. Using a random forest model, we were able to predict the resistance or susceptibility of bacteria with 85-90% accuracy, with a 30-50% reduction in overall isolate length, …
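The pipeline described above (shorten each isolate by tokenizing repeated segments, then count k-mers) can be illustrated with a toy Python sketch. It assumes the repeated segments are already known and replaces each with a single placeholder symbol outside the A/C/G/T alphabet; `tokenize`, `detokenize`, and the `XYZ` token alphabet are hypothetical and are not the paper's actual scheme for choosing or encoding segments.

```python
from collections import Counter
from typing import Dict, List, Tuple

TOKEN_SYMBOLS = "XYZ"  # hypothetical token alphabet, disjoint from A/C/G/T


def tokenize(seq: str, segments: List[str]) -> Tuple[str, Dict[str, str]]:
    """Toy lossless compression: replace each given segment with one symbol.
    Returns the shortened string and the table needed to invert it."""
    if len(segments) > len(TOKEN_SYMBOLS):
        raise ValueError("too many segments for the toy token alphabet")
    if any(sym in seq for sym in TOKEN_SYMBOLS):
        raise ValueError("input already contains a token symbol")
    table = dict(zip(TOKEN_SYMBOLS, segments))
    for sym, seg in table.items():
        seq = seq.replace(seg, sym)
    return seq, table


def detokenize(seq: str, table: Dict[str, str]) -> str:
    """Invert tokenize() by expanding each placeholder symbol."""
    for sym, seg in table.items():
        seq = seq.replace(sym, seg)
    return seq


def count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k."""
    if k <= 0:
        raise ValueError("k must be positive")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


isolate = "ACGTACGTTTGAACGT"
compressed, table = tokenize(isolate, ["ACGT"])
print(compressed)  # XXTTGAX
print(count_kmers(compressed, 3))
```

Because the token symbols never occur in the raw sequence, the mapping is invertible, and the shorter string yields fewer k-mers to count (5 instead of 14 here for k = 3).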


Evaluation Of Standard And Semantically-Augmented Distance Metrics For Neurology Patients, Daniel B. Hier, Jonathan Kopel, Steven U. Brint, Donald C. Wunsch, Gayla R. Olbricht, Sima Azizi, Blaine Allen Aug 2020

Electrical and Computer Engineering Faculty Research & Creative Works

Background: Patient distances can be calculated based on signs and symptoms derived from an ontological hierarchy. There is controversy as to whether patient distance metrics that consider the semantic similarity between concepts can outperform standard patient distance metrics that are agnostic to concept similarity. The choice of distance metric can dominate the performance of classification or clustering algorithms. Our objective was to determine if semantically augmented distance metrics would outperform standard metrics on machine learning tasks.

Methods: We converted the neurological findings from 382 published neurology cases into sets of concepts with corresponding machine-readable codes. We calculated patient distances by …
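To make the distinction in the Background concrete, here is a small Python sketch contrasting a standard set distance (Jaccard) with one simple way of letting concept similarity matter: expanding each patient's concept set with its ontology ancestors before comparing. The miniature ontology, the concept names, and the ancestor-expansion approach are all hypothetical illustrations, not the metrics evaluated in the paper.

```python
from typing import Dict, Optional, Set

# Hypothetical miniature ontology: concept -> parent concept (None at the root).
PARENT: Dict[str, Optional[str]] = {
    "finding": None,
    "motor_finding": "finding",
    "weakness": "motor_finding",
    "spasticity": "motor_finding",
    "sensory_finding": "finding",
    "numbness": "sensory_finding",
}


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Standard distance 1 - |A & B| / |A | B|; blind to concept similarity."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def with_ancestors(concepts: Set[str]) -> Set[str]:
    """Add every ancestor of each concept so related findings share terms."""
    expanded: Set[str] = set()
    for concept in concepts:
        node: Optional[str] = concept
        while node is not None:
            expanded.add(node)
            node = PARENT[node]
    return expanded


p1, p2, p3 = {"weakness"}, {"spasticity"}, {"numbness"}

# Standard metric: every pair of distinct findings is maximally far apart.
print(jaccard_distance(p1, p2), jaccard_distance(p1, p3))  # 1.0 1.0

# Ancestor-expanded metric: two motor findings are closer than motor vs. sensory.
print(jaccard_distance(with_ancestors(p1), with_ancestors(p2)),
      jaccard_distance(with_ancestors(p1), with_ancestors(p3)))
```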


Machine-Learning-Based Prediction Of Sepsis Events From Vertical Clinical Trial Data: A Naïve Approach, Tyler Michael Gaddis Aug 2020

Theses and Dissertations

Sepsis is a potentially life-threatening condition characterized by a dysregulated, disproportionate immune response to infection, in which the afflicted body attacks its own tissues, sometimes to the point of organ failure and, in the worst cases, death. According to the Centers for Disease Control and Prevention (CDC), sepsis kills upwards of 270,000 Americans annually, though the true figure may be higher given certain ambiguities in the currently accepted diagnostic framework for the disease.

This study attempted to first establish an understanding of past definitions of sepsis, and to then recommend use of machine learning as integral in an …


An Improved Method For Spectroscopic Quality Classification, Elizabeth G. Mayer Jul 2020

Mathematics & Statistics ETDs

Spectral quality classification is a vital data-cleaning step before magnetic resonance spectroscopy (MRS) data can be analyzed. This analysis compares five methods of quality classification: three legacy methods (Maudsley et al., 2006; Zhang et al., 2018; Bustillo et al., 2020) and two newly created methods that use a random forests classifier (RFC) to inform their classifications. We found that the random forests classifier was the most accurate at predicting spectral quality (balanced accuracy of 88% for the RFC vs. 70%, 72%, and 72% for the legacy methods). A Random-Forests-Informed Filtering method (RFIFM) for quality …
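The comparison above is reported as balanced accuracy, which averages recall over classes so that a classifier cannot score well simply by labeling every spectrum with the majority class. A minimal Python sketch of the standard definition (the same quantity scikit-learn's `balanced_accuracy_score` computes by default); the labels below are hypothetical, not data from the study.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class recall (fraction of each true class predicted correctly)."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical labels: 1 = acceptable spectrum, 0 = reject (imbalanced classes).
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                            # 0.9
print(balanced_accuracy(y_true, y_pred))   # 0.75
```

Plain accuracy is 0.9 here, but half of the poor-quality spectra were missed, and balanced accuracy (mean of recalls 1.0 and 0.5) exposes that.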


A Study Of The Efficacy Of Machine Learning For Diagnosing Obstructive Coronary Artery Disease In Non-Diabetic Patients, Demond Larae Handley Jul 2020

Theses and Dissertations

According to the Centers for Disease Control and Prevention, about 18.2 million adults aged 20 and older in the United States have coronary artery disease. Early diagnosis is therefore crucial to help prevent debilitating consequences and, for many patients, death. In this study, we use gene expression values from peripheral blood samples of 198 non-diabetic patients to develop a gene expression model, incorporating age and sex, for diagnosing coronary artery disease. We employ machine learning methods to obtain a classification based on genetic information, age, and sex. Our implementation uses feed-forward …
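The abstract is truncated before the network details, so the following is only a generic sketch of what a feed-forward classifier over gene expression, age, and sex computes. It uses NumPy, randomly initialized (untrained) weights, and hypothetical dimensions (`n_genes`, `n_hidden`) and synthetic inputs; in practice the weights would be learned by backpropagation on labeled patients.

```python
import numpy as np

rng = np.random.default_rng(0)

n_genes = 20             # hypothetical number of gene expression features
n_inputs = n_genes + 2   # expression values + age + sex
n_hidden = 8             # hypothetical hidden-layer width

# Untrained weights for illustration only.
W1 = rng.normal(0.0, 0.1, size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.1, size=(n_hidden, 1))
b2 = np.zeros(1)


def forward(X: np.ndarray) -> np.ndarray:
    """One hidden ReLU layer, then a sigmoid output read as P(disease)."""
    h = np.maximum(0.0, X @ W1 + b1)
    logits = h @ W2 + b2
    return 1.0 / (1.0 + np.exp(-logits))


# Synthetic batch of 5 patients.
expr = rng.normal(size=(5, n_genes))
age = rng.uniform(40, 80, size=(5, 1))
sex = rng.integers(0, 2, size=(5, 1))
X = np.hstack([expr, (age - 60.0) / 10.0, sex])  # roughly standardize age

probs = forward(X).ravel()
print(probs)
```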


Combining Machine Learning And Empirical Engineering Methods Towards Improving Oil Production Forecasting, Andrew J. Allen Jul 2020

Master's Theses

Current production forecasting methods, such as decline curve analysis (DCA) and numerical simulation, require years of historical production data, and their accuracy is limited by the choice of model parameters. Traditional forecasting methods have proven difficult to apply to unconventional resources because these resources lack long production histories and have extremely variable model parameters. This research proposes a data-driven alternative to reservoir simulation and production forecasting techniques. We create a proxy-well model that predicts cumulative oil production by selecting statistically significant well completion parameters and reservoir information as independent predictor variables in regression-based models. Then, principal component analysis (PCA) …
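The abstract does not say which decline curve model is meant; the classic Arps family is a common choice and shows why DCA depends on fitted parameters: the initial rate `qi`, initial decline `di`, and hyperbolic exponent `b`. The Python sketch below implements the Arps rate and its closed-form cumulative production; the well parameters in the usage loop are hypothetical.

```python
import math


def arps_rate(t: float, qi: float, di: float, b: float) -> float:
    """Arps decline rate at time t.
    qi: initial rate, di: initial nominal decline (1/time), b: hyperbolic exponent."""
    if b == 0:
        return qi * math.exp(-di * t)            # exponential decline
    return qi / (1.0 + b * di * t) ** (1.0 / b)  # hyperbolic (b = 1 is harmonic)


def arps_cumulative(t: float, qi: float, di: float, b: float) -> float:
    """Closed-form cumulative production from time 0 to t."""
    if b == 0:
        return (qi - arps_rate(t, qi, di, b)) / di
    if b == 1:
        return qi / di * math.log(1.0 + di * t)
    return qi / ((1.0 - b) * di) * (1.0 - (1.0 + b * di * t) ** (-(1.0 - b) / b))


# Hypothetical well: 1000 bbl/month initial rate, 15%/month initial decline, b = 0.8.
for months in (12, 36, 60):
    print(months, round(arps_cumulative(months, 1000.0, 0.15, 0.8)))
```

With only a short production history, `di` and `b` are poorly constrained, which is the weakness for unconventional wells that the abstract points to.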


Subsurface Analytics: Contribution Of Artificial Intelligence And Machine Learning To Reservoir Engineering, Reservoir Modeling, And Reservoir Management, Shahab D. Mohaghegh Apr 2020

Faculty & Staff Scholarship

Subsurface Analytics is a new technology that changes the way reservoir simulation and modeling are performed. The conventional approach starts by constructing mathematical equations that model the physics of fluid flow through porous media and then modifies the geological models to achieve a history match. Subsurface Analytics, a completely AI-based reservoir simulation and modeling technology, takes a fundamentally different approach. In AI-based reservoir modeling, field measurements form the foundation of the reservoir model. Using data-driven pattern recognition technologies, the physics of fluid flow through porous media is modeled by discovering the best, most …


Orthogonal Recurrent Neural Networks And Batch Normalization In Deep Neural Networks, Kyle Eric Helfrich Jan 2020

Theses and Dissertations--Mathematics

Despite the recent success of various machine learning techniques, numerous obstacles remain. One is the vanishing/exploding gradient problem, in which gradients either shrink toward zero or grow without bound. This well-known problem commonly occurs in recurrent neural networks (RNNs). In this work, we describe how this problem can be mitigated, establish three architectures designed to avoid it, and derive update schemes for each architecture. Another portion of this work focuses on the widely used technique of batch normalization. Although found to be successful …
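Why orthogonality helps can be seen in a simplified linear setting: backpropagating through many RNN time steps repeatedly multiplies the gradient by the recurrent matrix, so its norm scales like the matrix's gain raised to the number of steps. The NumPy sketch below ignores activation-function derivatives and is not one of the thesis's three architectures; it only shows that an orthogonal matrix preserves vector norms while scaled versions make them vanish or explode.

```python
import numpy as np

rng = np.random.default_rng(0)
n, steps = 32, 100

# Random orthogonal matrix from the QR decomposition of a Gaussian matrix.
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))

v = rng.normal(size=n)
v /= np.linalg.norm(v)  # unit-norm stand-in for a gradient vector


def norm_after(W: np.ndarray, v: np.ndarray, steps: int) -> float:
    """Norm of v after `steps` repeated multiplications by W."""
    for _ in range(steps):
        v = W @ v
    return float(np.linalg.norm(v))


print(norm_after(0.9 * Q, v, steps))  # about 0.9**100: vanishing
print(norm_after(1.1 * Q, v, steps))  # about 1.1**100: exploding
print(norm_after(Q, v, steps))        # about 1.0: preserved
```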


How Machine Learning And Probability Concepts Can Improve NBA Player Evaluation, Harrison Miller Jan 2020

CMC Senior Theses

In this paper, I break down a scholarly article by Sameer K. Deshpande and Shane T. Jensen that proposed a new method for evaluating NBA players. The National Basketball Association (NBA) is the highest-level professional basketball league in America. Deshpande and Jensen proposed a model, built with machine learning and probability concepts, that estimates how NBA players impact their teams' chances of winning a game. I preface that discussion by examining these concepts and their mathematical backgrounds, including building a linear model with the ordinary least squares method, the bias …
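Ordinary least squares, the first concept listed above, chooses coefficients that minimize the sum of squared residuals; the solution satisfies the normal equations $(A^\top A)\hat\beta = A^\top y$. A minimal NumPy sketch on synthetic data (the predictors and true coefficients are hypothetical and unrelated to Deshpande and Jensen's actual model):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

X = rng.normal(size=(n, 2))                 # two hypothetical predictors
true_beta = np.array([0.5, 2.0, -1.0])      # intercept, slope 1, slope 2
A = np.column_stack([np.ones(n), X])        # design matrix with intercept column
y = A @ true_beta + rng.normal(scale=0.1, size=n)

# Least-squares fit; numerically preferable to inverting A.T @ A directly.
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta_hat)
```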


Process Based Analysis Of Fluvial Stratigraphic Record: Middle Pennsylvanian Allegheny Formation, North-Central WV, Oluwasegun O. Abatan Jan 2020

Graduate Theses, Dissertations, and Problem Reports

Fluvial deposits form some of the best hydrocarbon reservoirs, but the quality of fluvial reservoirs varies with reservoir architecture, which is controlled by allogenic and autogenic processes. Allogenic controls, including paleoclimate, tectonics, and glacio-eustasy, have long been debated as the dominant controls on the deposition of fluvial strata. However, recent research has questioned the validity of this allogenically driven cyclicity and may indicate a major influence from autogenic controls. To further investigate allogenic controls on stratal order, I analyzed the facies architecture, geomorphology, paleohydrology, and the stratigraphic framework of the Middle Pennsylvanian Allegheny Formation (MPAF), a fluvial depositional system in the Appalachian …


Three Essays On Health Economics And Policy Evaluation, Shishir Shakya Jan 2020

Graduate Theses, Dissertations, and Problem Reports

This dissertation consists of three essays on U.S. health care policy. Each paragraph below is the abstract of one of its three chapters, in order.

I provide quantitative evidence on how much Prescription Drug Monitoring Programs (PDMPs) affect retail opioid prescribing behavior. Using the American Community Survey (ACS), I construct a high-dimensional county-level panel data set from 2010 to 2017. I employ three separate identification strategies: difference-in-differences, double selection post-LASSO, and spatial difference-in-differences. I compare how the retail opioid prescribing behavior of counties where prescribers are mandated to check the PDMP before prescribing controlled substances …
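The difference-in-differences strategy named above compares the before/after change in treated units with the same change in control units. In the simplest 2x2 case, that estimate equals the interaction coefficient in the regression $y = \beta_0 + \beta_1\,\text{treated} + \beta_2\,\text{post} + \beta_3\,\text{treated}\times\text{post}$. The NumPy sketch below checks this on hypothetical county outcomes; it omits the county and year effects, covariates, LASSO selection, and spatial terms a real specification like the dissertation's would include.

```python
import numpy as np

# Hypothetical outcomes (e.g. opioid prescriptions per 100 residents),
# keyed by (treated, post).
data = {
    (1, 0): [82.0, 78.0],
    (1, 1): [71.0, 69.0],
    (0, 0): [79.0, 77.0],
    (0, 1): [76.0, 74.0],
}


def cell_mean(treated: int, post: int) -> float:
    return float(np.mean(data[(treated, post)]))


# DiD from group means: change in treated minus change in control.
did_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# Same estimate from OLS with an interaction term.
rows, y = [], []
for (t, p), obs in data.items():
    for value in obs:
        rows.append([1.0, t, p, t * p])
        y.append(value)
beta, *_ = np.linalg.lstsq(np.array(rows), np.array(y), rcond=None)
did_ols = beta[3]

print(did_means, did_ols)
```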


Artificial Neural Network Models For Pattern Discovery From ECG Time Series, Mehakpreet Kaur Jan 2020

Electronic Theses and Dissertations

Artificial neural network (ANN) models have recently become the de facto models for deep learning, with applications ranging from scientific fields such as computer vision, physics, biology, and medicine to everyday life (suggesting preferred movies, shopping lists, etc.). Owing to advances in computer technology and the increasing use of artificial intelligence (AI) in medicine and biological research, ANNs have been extensively applied not only to provide quick information about diseases but also to make diagnostics accurate and cost-effective. We propose an ANN-based model to analyze a patient's electrocardiogram (ECG) data and produce accurate diagnostics regarding possible heart diseases …


The Application Of Machine Learning Models In The Concussion Diagnosis Process, Sujit Subhash Jan 2020

Masters Theses

“Concussions represent a growing health concern and are challenging to diagnose and manage. Roughly four million concussions are diagnosed every year in the United States. Although research into advanced metrics such as neuroimaging and blood biomarkers has shown promise, these metrics have yet to be implemented clinically due to cost and reliability concerns. Therefore, concussion diagnosis still relies on clinical evaluations of symptoms, balance, and neurocognitive status and function. The lack of a universal threshold for these assessments makes the diagnosis process entirely reliant on a physician’s interpretation of the assessment scores. This study …