Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 18 of 18

Full-Text Articles in Physical Sciences and Mathematics

Flexible Attenuation Fields: Tomographic Reconstruction From Heterogeneous Datasets, Clifford S. Parker Jan 2024

Theses and Dissertations--Computer Science

Traditional reconstruction methods for X-ray computed tomography (CT) are highly constrained in the variety of input datasets they admit. Many of the imaging settings -- the incident energy, field-of-view, effective resolution -- remain fixed across projection images, and the only real variance is in the detector's position and orientation with respect to the scene. In contrast, methods for 3D reconstruction of natural scenes are extremely flexible to the geometric and photometric properties of the input datasets, readily accepting and benefiting from images captured under varying lighting conditions, with different cameras, and at disparate points in time and space. Extending CT …


Advanced Mathematical Graph-Based Machine Learning And Deep Learning Models For Drug Design, Farjana Tasnim Mukta Jan 2024

Theses and Dissertations--Mathematics

Drug discovery is a highly complicated and time-consuming process. One of the main challenges in drug development is predicting whether a drug-like molecule will interact with a specific target protein. This prediction accelerates target validation and drug development. Recent research in biomolecular sciences has shown significant interest in algebraic graph-based models for representing molecular complexes and predicting drug-target binding affinity. In this thesis, we present algebraic graph-based molecular representations to create data-driven scoring functions (SF) using extended atom types to capture wide-range interactions between targets and drug candidates. Our model employs multiscale weighted colored subgraphs for the protein-ligand complex, colored …


Machine Learning Framework For Real-World Electronic Health Records Regarding Missingness, Interpretability, And Fairness, Jing Lucas Liu Jan 2023

Theses and Dissertations--Computer Science

Machine learning (ML) and deep learning (DL) techniques have shown promising results in healthcare applications using Electronic Health Record (EHR) data. However, their adoption in real-world healthcare settings is hindered by three major challenges. Firstly, real-world EHR data typically contains numerous missing values. Secondly, traditional ML/DL models are typically considered black boxes, whereas interpretability is required for real-world healthcare applications. Finally, differences in data distributions may lead to unfairness and performance disparities, particularly in subpopulations.

This dissertation proposes methods to address missing data, interpretability, and fairness issues. The first work proposes an ensemble prediction framework for EHR data with large missing …


Symbolic Computation Of Squared Amplitudes In High Energy Physics With Machine Learning, Abdulhakim Alnuqaydan Jan 2023

Theses and Dissertations--Physics and Astronomy

The calculation of particle interaction squared amplitudes is a key step in the calculation of cross sections in high-energy physics. These complex calculations are currently performed using domain-specific symbolic algebra tools, where the computational time escalates rapidly with an increase in the number of loops and final state particles. This dissertation introduces an innovative approach: employing a transformer-based sequence-to-sequence model capable of accurately predicting squared amplitudes of Standard Model processes up to one-loop order when trained on symbolic sequence pairs. The primary objective of this work is to significantly reduce the computational time and, more importantly, develop a model that …


Developing And Deploying Data-Driven Tools For Accelerated Design Of Organic Semiconductors, Vinayak Bhat Jan 2023

Theses and Dissertations--Chemistry

Organic semiconductors have gained widespread attention due to their potential applications in flexible, low-cost, lightweight electronics, energy storage and generation technologies, and sensing applications. However, developing new organic semiconductors with improved performance remains a significant challenge due to the vast chemical space of possible molecular and materials structures. Furthermore, the high cost and time-consuming nature of experimental synthesis and characterization hinder the rapid discovery of new materials. To overcome these challenges, this dissertation presents a data-driven approach to organic semiconductor discovery. The primary focus of this work is the development of data-driven tools, namely machine learning models, to predict critical …


Normalization Techniques For Sequential And Graphical Data, Cole Pospisil Jan 2023

Theses and Dissertations--Mathematics

Normalization methods have proven to be an invaluable tool in the training of deep neural networks. In particular, Layer and Batch Normalization are commonly used to mitigate the risks of exploding and vanishing gradients. This work presents two methods which are related to these normalization techniques. The first method is Batch Normalized Preconditioning (BNP) for recurrent neural networks (RNN) and graph convolutional networks (GCN). BNP has been suggested as a technique for Fully Connected and Convolutional networks for achieving similar performance benefits to Batch Normalization by controlling the condition number of the Hessian through preconditioning on the gradients. We extend …


Batch Normalization Preconditioning For Neural Network Training, Susanna Luisa Gertrude Lange Jan 2022

Theses and Dissertations--Mathematics

Batch normalization (BN) is a popular and ubiquitous method in deep learning that has been shown to decrease training time and improve generalization performance of neural networks. Despite its success, BN is not theoretically well understood. It is not suitable for use with very small mini-batch sizes or online learning. In this work, we propose a new method called Batch Normalization Preconditioning (BNP). Instead of applying normalization explicitly through a batch normalization layer as is done in BN, BNP applies normalization by conditioning the parameter gradients directly during training. This is designed to improve the Hessian matrix of the loss …
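For readers unfamiliar with the technique, the standard BN forward pass that the abstract contrasts with BNP can be sketched as follows. This is a minimal illustration of textbook batch normalization on scalar activations, not the thesis's BNP method; the function name `batch_norm` and the parameters `gamma`, `beta`, and `eps` are illustrative choices.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a mini-batch of scalar activations, then scale and shift."""
    n = len(batch)
    mean = sum(batch) / n
    # Biased (population) variance of the mini-batch.
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

# A batch of four activations is mapped to roughly zero mean, unit variance.
normalized = batch_norm([2.0, 4.0, 6.0, 8.0])

# With a single-element batch the statistics are degenerate: the output
# collapses to beta regardless of the input value.
degenerate = batch_norm([3.0])
```

The degenerate single-element case hints at why batch statistics become unreliable for very small mini-batches or online learning, the limitation the abstract mentions.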


Development Of Accurate And Efficient Computational Methodologies For Predicting Protein-Ligand And Protein-Protein Binding Free Energies, Alexander Hamilton Williams Jan 2022

Theses and Dissertations--Pharmacy

Computational modeling is an invaluable tool in the drug discovery process for either small-ligand or protein therapeutics. The widespread availability of protein X-ray crystal and Cryo-Electron Microscopy (Cryo-EM) structures has allowed for more accurate molecular dynamics (MD) simulations that are not reliant on methods such as homology modeling, which may produce structures that require significant computational time to demonstrate their stability. In this thesis we describe several novel methodologies for the computationally efficient modeling of protein/ligand and protein/protein complexes that may be employed within both large-scale virtual screenings and lead compound optimization. These methodologies may also be utilized in …


Weakly Supervised Learning For Multi-Image Synthesis, Muhammad Usman Rafique Jan 2021

Theses and Dissertations--Electrical and Computer Engineering

Machine learning-based approaches have been achieving state-of-the-art results on many computer vision tasks. While deep learning and convolutional networks have been incredibly popular, these approaches come at the expense of huge amounts of labeled data required for training. Manually annotating large amounts of data, often millions of images in a single dataset, is costly and time-consuming. To deal with the problem of data annotation, the research community has been exploring approaches that require less labeled data.

The central problem that we consider in this research is image synthesis without any manual labeling. Image synthesis is a classic …


Deep Neural Architectures For End-To-End Relation Extraction, Tung Tran Jan 2020

Theses and Dissertations--Computer Science

The rapid pace of scientific and technological advancements has led to a meteoric growth in knowledge, as evidenced by a sharp increase in the number of scholarly publications in recent years. PubMed, for example, archives more than 30 million biomedical articles across various domains and covers a wide range of topics including medicine, pharmacy, biology, and healthcare. Social media and digital journalism have similarly experienced their own accelerated growth in the age of big data. Hence, there is a compelling need for ways to organize and distill the vast, fragmented body of information (often unstructured in the form of natural …


Orthogonal Recurrent Neural Networks And Batch Normalization In Deep Neural Networks, Kyle Eric Helfrich Jan 2020

Theses and Dissertations--Mathematics

Despite the recent success of various machine learning techniques, there are still numerous obstacles that must be overcome. One obstacle is known as the vanishing/exploding gradient problem. This problem refers to gradients that either become zero or unbounded. This is a well-known problem that commonly occurs in Recurrent Neural Networks (RNNs). In this work we describe how this problem can be mitigated, establish three different architectures that are designed to avoid this issue, and derive update schemes for each architecture. Another portion of this work focuses on the often-used technique of batch normalization. Although found to be successful …
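The vanishing/exploding gradient problem described above can be illustrated with a toy linear recurrence. This sketch is not taken from the thesis: it assumes a hypothetical 2x2 recurrent matrix built as a rotation times a scalar `scale`, ignores nonlinearities, and simply multiplies a vector by that matrix repeatedly, as happens to a gradient propagated through many time steps.

```python
import math

def matvec(m, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def scaled_rotation(scale, angle=0.3):
    """Return scale * R(angle); the matrix is orthogonal when scale == 1."""
    c, s = math.cos(angle), math.sin(angle)
    return [[scale * c, -scale * s],
            [scale * s, scale * c]]

def gradient_norm_after(scale, steps=50):
    """Norm of a unit vector after `steps` multiplications by the matrix."""
    w = scaled_rotation(scale)
    g = [1.0, 0.0]
    for _ in range(steps):
        g = matvec(w, g)
    return math.hypot(g[0], g[1])

vanishing = gradient_norm_after(0.9)   # shrinks toward zero
exploding = gradient_norm_after(1.1)   # grows without bound as steps increase
preserved = gradient_norm_after(1.0)   # orthogonal matrix keeps the norm
```

Because every singular value of the toy matrix equals `scale`, the norm after `steps` multiplications is `scale ** steps`. An orthogonal matrix (`scale == 1`) preserves the norm exactly, which is the general motivation behind orthogonal recurrent architectures.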


Rule Mining And Sequential Pattern Based Predictive Modeling With EMR Data, Orhan Abar Jan 2019

Theses and Dissertations--Computer Science

Electronic medical record (EMR) data is collected on a daily basis at hospitals and other healthcare facilities to track patients’ health situations including conditions, treatments (medications, procedures), diagnostics (labs) and associated healthcare operations. Besides being useful for individual patient care and hospital operations (e.g., billing, triaging), EMRs can also be exploited for secondary data analyses to glean discriminative patterns that hold across patient cohorts for different phenotypes. These patterns in turn can yield high-level insights into disease progression with interventional potential. In this dissertation, using a large-scale realistic EMR dataset of over one million patients visiting University of …


Lattice Simplices: Sufficiently Complicated, Brian Davis Jan 2019

Theses and Dissertations--Mathematics

Simplices are the "simplest" examples of polytopes, and yet they exhibit much of the rich and subtle combinatorics and commutative algebra of their more general cousins. In this way they are sufficiently complicated: insights gained from their study can inform broader research in Ehrhart theory and associated fields.

In this dissertation we consider two previously unstudied properties of lattice simplices: one algebraic and one combinatorial. The first is the Poincaré series of the associated semigroup algebra, which is substantially more complicated than the Hilbert series of that same algebra. The second is the partial ordering of the elements of …


Relation Prediction Over Biomedical Knowledge Bases For Drug Repositioning, Mehmet Bakal Jan 2019

Theses and Dissertations--Computer Science

Identifying new potential treatment options for medical conditions that cause human disease burden is a central task of biomedical research. Since not all candidate drugs can be tested with animal and clinical trials, in vitro approaches are first attempted to identify promising candidates. Likewise, identifying other essential relations (e.g., causation, prevention) between biomedical entities is also critical to understand biomedical processes. Hence, it is crucial to develop automated relation prediction systems that can yield plausible biomedical relations to expedite the discovery process. In this dissertation, we demonstrate three approaches to predict treatment relations between biomedical entities for the drug repositioning task …


Deep Neural Networks For Multi-Label Text Classification: Application To Coding Electronic Medical Records, Anthony Rios Jan 2018

Theses and Dissertations--Computer Science

Coding Electronic Medical Records (EMRs) with diagnosis and procedure codes is an essential task for billing, secondary data analyses, and monitoring health trends. Both speed and accuracy of coding are critical. While coding errors could lead to more patient-side financial burden and misinterpretation of a patient’s well-being, timely coding is also needed to avoid backlogs and additional costs for the healthcare facility. Therefore, it is necessary to develop automated diagnosis and procedure code recommendation methods that can be used by professional medical coders.

The main difficulty with developing automated EMR coding methods is the nature of the label space. The …


Scalable Feature Selection And Extraction With Applications In Kinase Polypharmacology, Derek Jones Jan 2018

Theses and Dissertations--Computer Science

In order to reduce the time and costs associated with drug discovery, machine learning is being used to automate much of the work in this process. However, the size and complex nature of molecular data make the application of machine learning especially challenging. Much work must go into the process of engineering features that are then used to train machine learning models, costing considerable amounts of time and requiring the knowledge of domain experts to be most effective. The purpose of this work is to demonstrate data-driven approaches to perform the feature selection and extraction steps in …


Context-Aware Debugging For Concurrent Programs, Justin Chu Jan 2017

Theses and Dissertations--Computer Science

Concurrency faults are difficult to reproduce and localize because they usually occur under specific inputs and thread interleavings. Most existing fault localization techniques focus on sequential programs but fail to identify faulty memory access patterns across threads, which are usually the root causes of concurrency faults. Moreover, existing techniques for sequential programs cannot be adapted to identify faulty paths in concurrent programs. While concurrency fault localization techniques have been proposed to analyze passing and failing executions obtained from running a set of test cases to identify faulty access patterns, they primarily focus on using statistical analysis. We present a novel …


Soil Hydraulic Property Estimation Under Major Land-Uses In The Shawnee Hills, Trinity Joseph Baker Jan 2017

Theses and Dissertations--Plant and Soil Sciences

The ability to map soil moisture is becoming more important with changing climates, and modeling these effects depends on reliable estimations of hydrologic soil properties under different land managements. This study: 1) tests the application of existing soil hydraulic property estimation methods against in-situ values of six catenas under two covers (forest and grass); 2) validates Random Forest Algorithm (RF) estimates informed from the six catenas on two separate catenas; 3) identifies Rapid Carbon Assessment (RaCA) sites within the Shawnee Hills Region that represent different land-uses (Crop, Conservation Reserve Program (CRP), Forest, and Pasture); 4) applies RF learning tree informed …