Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Full-Text Articles in Physical Sciences and Mathematics
Mathematical Models Yield Insights Into CNNs: Applications In Natural Image Restoration And Population Genetics, Ryan Cecil
Electronic Theses and Dissertations
Due to a rise in computational power, machine learning (ML) methods have become the state of the art in a variety of fields. Because they are black-box approaches, however, these methods are often not well understood. In this work, we use our understanding of model-based approaches to derive insights into Convolutional Neural Networks (CNNs). In the field of natural image restoration, we focus on the image denoising problem. Recent work has demonstrated the potential of mathematically motivated CNN architectures that learn both 'geometric' and nonlinear higher-order features and corresponding regularizers. We extend this work by showing that not only can geometric features …
Machine Learning Applications For Drug Repurposing, Hansaim Lim
Dissertations, Theses, and Capstone Projects
The cost of bringing a drug to market is astounding and the failure rate is intimidating. Drug discovery has had limited success under the conventional reductionist one-drug-one-gene-one-disease paradigm, in which a single disease-associated gene is identified and a molecule that binds that specific target is subsequently designed. Under this simplistic paradigm, a drug molecule is assumed to interact only with its intended target. However, small-molecule drugs often interact with multiple targets, and these off-target interactions are not considered under the conventional paradigm. As a result, drug-induced side effects and adverse reactions are often neglected …
Compressed DNA Representation For Efficient AMR Classification, John Partee, Robert Hazell, Anjli Solsi, John Santerre
SMU Data Science Review
In this paper, we explore a representation methodology for the compression of DNA isolates. Using lossless string compression via tokenization of frequently repeated segments of DNA, we reduce the length of the isolates before they are counted as k-mers for classification. With this new representation, we apply a previously established feature sampling method to dramatically reduce the feature space. To understand genetic diversity, we also examine the conservation of biological function across these spaces. Using a random forest model, we were able to predict the resistance or susceptibility of bacteria with 85-90% accuracy, with a 30-50% reduction in overall isolate length, …
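A minimal sketch of the compress-then-count idea, assuming a greedy, byte-pair-encoding-style scheme: in each round the most frequent fixed-length substring is replaced by a fresh placeholder token, and a substitution table is kept so the original isolate can be recovered exactly. The function names, the 4-base substring length, and the digit token alphabet are hypothetical; the paper's own tokenization and feature-sampling method are not described in this excerpt, and sequences are assumed to contain only A, C, G and T.

```python
from collections import Counter

# Hypothetical token alphabet: characters that never occur in an A/C/G/T sequence.
TOKENS = "0123456789"


def compress(seq: str, rounds: int = 3, length: int = 4):
    """Greedily replace the most frequent `length`-mer with a new token, up to `rounds` times."""
    table = []  # (token, substring) pairs in creation order
    for token in TOKENS[:rounds]:
        if len(seq) < length:
            break
        counts = Counter(seq[i:i + length] for i in range(len(seq) - length + 1))
        substring, count = counts.most_common(1)[0]
        if count < 2:
            break  # stop once no substring of this length repeats
        seq = seq.replace(substring, token)  # non-overlapping, left to right
        table.append((token, substring))
    return seq, table


def decompress(seq: str, table) -> str:
    """Undo the substitutions in reverse order; later tokens may contain earlier ones."""
    for token, substring in reversed(table):
        seq = seq.replace(token, substring)
    return seq


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers; tokens are treated as ordinary symbols."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


isolate = "ACGTACGTTTACGTACGTGG"  # toy sequence, not real data
compressed, table = compress(isolate, rounds=2, length=4)
assert decompress(compressed, table) == isolate  # the round trip is lossless
features = kmer_counts(compressed, k=3)
```

Because each token stands for several bases, a k-mer counted on the compressed string spans more of the original sequence than a k-mer of the same k on the raw isolate.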
Machine-Learning-Based Prediction Of Sepsis Events From Vertical Clinical Trial Data: A Naïve Approach, Tyler Michael Gaddis
Theses and Dissertations
Sepsis is a potentially life-threatening condition characterized by a dysregulated, disproportionate immune response to infection in which the body attacks its own tissues, sometimes to the point of organ failure and, in the worst cases, death. According to the Centers for Disease Control and Prevention (CDC), sepsis is reported to kill upwards of 270,000 Americans annually, though the true figure may be higher given ambiguities in the currently accepted diagnostic framework for the disease.
This study first attempted to establish an understanding of past definitions of sepsis and then to recommend the use of machine learning as integral in an …
A Study Of The Efficacy Of Machine Learning For Diagnosing Obstructive Coronary Artery Disease In Non-Diabetic Patients, Demond Larae Handley
Theses and Dissertations
According to the Centers for Disease Control and Prevention, about 18.2 million adults aged 20 and older in the United States have coronary artery disease. Early diagnosis is therefore crucial to help prevent debilitating consequences and, for many patients, death. In this study we use gene expression values from peripheral blood samples of 198 non-diabetic patients, with the goal of developing a gene expression model, incorporating age and sex, for diagnosing coronary artery disease. We employ machine learning methods to obtain a classification based on genetic information, age and sex. Our implementation uses feed-forward …
Enhancing Timeliness Of Drug Overdose Mortality Surveillance: A Machine Learning Approach, Patrick J. Ward, Peter J. Rock, Svetla Slavova, April M. Young, Terry L. Bunn, Ramakanth Kavuluru
Kentucky Injury Prevention and Research Center Faculty Publications
BACKGROUND: Timely data is key to effective public health responses to epidemics. Drug overdose deaths are identified in surveillance systems through ICD-10 codes present on death certificates. ICD-10 coding takes time, but free-text information is available on death certificates prior to ICD-10 coding. The objective of this study was to develop a machine learning method to classify free-text death certificates as drug overdoses to provide faster drug overdose mortality surveillance.
METHODS: Using 2017–2018 Kentucky death certificate data, free-text fields were tokenized and features were created from these tokens using natural language processing (NLP). Word, bigram, and trigram features were created …
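The methods excerpt does not say how tokens were produced or which classifier consumed the features. A minimal sketch of the word, bigram and trigram feature step, assuming simple lowercase regex tokenization and raw counts; the function names and the example phrase are hypothetical and not taken from the Kentucky death certificate data.

```python
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase the text and keep runs of letters and digits; everything else separates tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_features(text: str, max_n: int = 3) -> Counter:
    """Count word (n=1), bigram (n=2) and trigram (n=3) features in one free-text field."""
    tokens = tokenize(text)
    features = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            features[" ".join(tokens[i:i + n])] += 1
    return features


# Hypothetical cause-of-death text, not taken from the study data.
feats = ngram_features("Acute fentanyl intoxication")
```

Three tokens yield three word features, two bigrams and one trigram. In practice such counts would feed a classifier; scikit-learn's `CountVectorizer(ngram_range=(1, 3))` builds the same kind of feature set.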
Computational Modelling Of Human Transcriptional Regulation By An Information Theory-Based Approach, Ruipeng Lu
Electronic Thesis and Dissertation Repository
ChIP-seq experiments can identify the genome-wide binding site motifs of a transcription factor (TF) and determine its sequence specificity. Multiple algorithms have been developed to derive TF binding site (TFBS) motifs from ChIP-seq data, including Bipad, an entropy minimization-based method that can derive both contiguous and bipartite motifs. Prior studies applying these algorithms to ChIP-seq data analyzed only a small number of top peaks with the highest signal strengths, biasing the resulting position weight matrices (PWMs) towards consensus-like, strong binding sites; nor did they derive bipartite motifs, which prevented accurate modelling of the binding behavior of dimeric TFs.
This thesis presents a novel …
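For readers less familiar with PWMs, the sketch below builds one from aligned sites and computes per-position information content in bits, R_i = 2 - H_i with H_i = -Σ_b f_{b,i} log2 f_{b,i}, the standard information-theoretic measure of how strongly each motif position is conserved. It assumes the sites are already aligned and of equal length, a uniform 0.25 background, and a pseudocount of 0.5, and it omits small-sample correction. It does not reproduce Bipad's entropy-minimization search or bipartite motifs; the function names and toy sites are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"
BACKGROUND = 0.25  # assumed uniform base composition


def column_counts(sites, i):
    """Base counts at position i across all aligned sites."""
    return Counter(site[i] for site in sites)


def position_weight_matrix(sites, pseudocount=0.5):
    """Per-position log2(p_b / background) weights from aligned, equal-length sites."""
    total = len(sites) + len(BASES) * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        counts = column_counts(sites, i)
        pwm.append({b: math.log2((counts[b] + pseudocount) / total / BACKGROUND) for b in BASES})
    return pwm


def information_content(sites):
    """R_i = 2 - H_i in bits for each position (no small-sample correction)."""
    n = len(sites)
    ic = []
    for i in range(len(sites[0])):
        freqs = [c / n for c in column_counts(sites, i).values()]
        entropy = -sum(f * math.log2(f) for f in freqs)
        ic.append(2.0 - entropy)
    return ic


def score(pwm, seq):
    """Sum of per-position weights; a higher score means closer to the motif."""
    return sum(column[base] for column, base in zip(pwm, seq))


sites = ["TGACTCA", "TGACTCA", "TGAGTCA", "TGACTCA"]  # toy aligned sites
pwm = position_weight_matrix(sites)
ic = information_content(sites)
```

A fully conserved position carries the maximum 2 bits, while the mixed C/G position carries less, which is why PWMs built only from the strongest peaks tend toward consensus-like motifs.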
Stage-Specific Predictive Models For Cancer Survivability, Elham Sagheb Hossein Pour
Theses and Dissertations
Cancer survivability strongly depends on the stage of the cancer. In most previous work, machine learning survivability prediction models for a particular cancer were trained and evaluated on all stages of that cancer together. In this work, we trained and evaluated survivability prediction models for five major cancers both jointly on all stages and separately for each stage; we call these joint and stage-specific models, respectively. For the cancers we investigated, the results show that the best model for predicting survivability at a specific stage is the model built specifically for that …
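A minimal sketch of the joint versus stage-specific comparison, using a deliberately trivial "model" that predicts the training-set survival rate and the Brier score as the evaluation metric. Both choices, the function names, and the toy records are hypothetical; the excerpt does not name the learning algorithms or metrics used. Every stage in the test data is assumed to appear in the training data.

```python
from collections import defaultdict


def fit_rate(outcomes):
    """Toy model (hypothetical): predict the training survival rate for every patient."""
    return sum(outcomes) / len(outcomes)


def brier(p, outcomes):
    """Mean squared error between a predicted probability and 0/1 outcomes."""
    return sum((p - y) ** 2 for y in outcomes) / len(outcomes)


def group_by_stage(records):
    groups = defaultdict(list)
    for stage, survived in records:
        groups[stage].append(survived)
    return groups


def compare(train, test):
    """Return {stage: (joint_brier, stage_specific_brier)} for (stage, survived) records."""
    joint = fit_rate([y for _, y in train])  # one model for all stages
    stage_models = {s: fit_rate(ys) for s, ys in group_by_stage(train).items()}
    return {
        s: (brier(joint, ys), brier(stage_models[s], ys))
        for s, ys in group_by_stage(test).items()
    }


train = [(1, 1)] * 9 + [(1, 0)] + [(4, 0)] * 9 + [(4, 1)]  # toy records, not real data
test = [(1, 1), (1, 1), (4, 0), (4, 0)]
result = compare(train, test)
```

Here the joint model predicts the pooled rate for every stage, so it is penalized on stages whose survival differs sharply from the average; a per-stage model avoids that by construction.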
Subsemble: An Ensemble Method For Combining Subset-Specific Algorithm Fits, Stephanie Sapp, Mark J. Van Der Laan, John Canny
U.C. Berkeley Division of Biostatistics Working Paper Series
Ensemble methods using the same underlying algorithm trained on different subsets of observations have recently received increased attention as practical prediction tools for massive datasets. We propose Subsemble: a general subset ensemble prediction method, which can be used for small, moderate, or large datasets. Subsemble partitions the full dataset into subsets of observations, fits a specified underlying algorithm on each subset, and uses a clever form of V-fold cross-validation to output a prediction function that combines the subset-specific fits. We give an oracle result that provides a theoretical performance guarantee for Subsemble. Through simulations, we demonstrate that Subsemble can be …
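A simplified sketch of the Subsemble idea under stated assumptions: observations are assigned round-robin to J subsets, each subset is split into V folds (fold v of the full data is the union of the v-th folds of all subsets), the underlying algorithm is a one-dimensional least-squares line, and the combiner is a least-squares regression of the outcome on the J cross-validated subset-specific predictions, with a tiny ridge term so the system stays solvable when those predictions are collinear. This is our reading of the scheme in the abstract, not the authors' implementation; all function names are hypothetical.

```python
def fit_line(xs, ys):
    """Underlying algorithm (hypothetical choice): least-squares line y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def solve(A, rhs):
    """Solve A w = rhs by Gaussian elimination with partial pivoting."""
    m = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * m
    for r in range(m - 1, -1, -1):
        w[r] = (M[r][m] - sum(M[r][c] * w[c] for c in range(r + 1, m))) / M[r][r]
    return w


def subsemble(xs, ys, n_subsets=2, n_folds=3, ridge=1e-6):
    n = len(xs)
    subset_of = [i % n_subsets for i in range(n)]             # round-robin subsets
    fold_of = [(i // n_subsets) % n_folds for i in range(n)]  # V folds inside each subset
    # Z[i][j]: prediction for observation i from subset j's fit trained without fold_of[i].
    Z = [[0.0] * n_subsets for _ in range(n)]
    for j in range(n_subsets):
        for v in range(n_folds):
            train = [i for i in range(n) if subset_of[i] == j and fold_of[i] != v]
            model = fit_line([xs[i] for i in train], [ys[i] for i in train])
            for i in range(n):
                if fold_of[i] == v:
                    Z[i][j] = model(xs[i])
    # Combiner: least squares of y on the cross-validated predictions (normal equations + ridge).
    A = [[sum(Z[i][p] * Z[i][q] for i in range(n)) + (ridge if p == q else 0.0)
          for q in range(n_subsets)] for p in range(n_subsets)]
    rhs = [sum(Z[i][p] * ys[i] for i in range(n)) for p in range(n_subsets)]
    weights = solve(A, rhs)
    # Refit each subset-specific model on its whole subset and combine with the learned weights.
    fits = []
    for j in range(n_subsets):
        members = [i for i in range(n) if subset_of[i] == j]
        fits.append(fit_line([xs[i] for i in members], [ys[i] for i in members]))
    return lambda x: sum(w * f(x) for w, f in zip(weights, fits))


xs = [float(i) for i in range(30)]
ys = [3.0 + 2.0 * x for x in xs]  # toy noiseless data
predict = subsemble(xs, ys)
```

The J subset-specific fits are independent of each other, so they can be computed in parallel; only the small combination step needs all of the cross-validated predictions at once.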