Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 30 of 50

Full-Text Articles in Physical Sciences and Mathematics

Bayesian Strategies For Propensity Score Estimation In Causal Inference., Uthpala I. Wanigasekara Dec 2023

Bayesian Strategies For Propensity Score Estimation In Causal Inference., Uthpala I. Wanigasekara

Electronic Theses and Dissertations

Causal inference is a method used in various fields to draw causal conclusions based on data. It involves using assumptions, study designs, and estimation strategies to minimize the impact of confounding variables. Propensity scores are used to estimate outcome effects, through matching methods, stratification, weighting methods, and the Covariate Balancing Propensity Score method. However, they can be sensitive to estimation techniques and can lead to unstable findings. Researchers have proposed integrating weighing with regression adjustment in parametric models to improve causal inference validity. The first project focuses on Bayesian joint and two-stage methods for propensity score analysis. Propensity score modeling …


Causal Inference For The Effect Of Continuous Treatment On Time-To-Event Outcomes And Mediation Analysis On Health Disparities In Observational Studies., Triparna Poddar Dec 2023

Causal Inference For The Effect Of Continuous Treatment On Time-To-Event Outcomes And Mediation Analysis On Health Disparities In Observational Studies., Triparna Poddar

Electronic Theses and Dissertations

The dissertation comprises two projects related to causal inference based on observational data. In healthcare research, where abundant observational data such as claims data and electronic records are available, researchers often aim to study the treatment effect and the pathway of that effect. However, estimating treatment effects in observational data presents challenges due to confounding factors. The first project focuses on estimating continuous treatment effects for survival outcomes, while the second concentrates on mediation analysis, allowing the exploration of the pathway of the causal effect. Both projects involve addressing confounding variables. In the first project, I investigate estimation of the …


Statistical Inference On Lung Cancer Screening Using The National Lung Screening Trial Data., Farhin Rahman Aug 2023

Statistical Inference On Lung Cancer Screening Using The National Lung Screening Trial Data., Farhin Rahman

Electronic Theses and Dissertations

This dissertation consists of three research projects on cancer screening probability modeling. In these projects, the three key modeling parameters (sensitivity, sojourn time, transition density) for cancer screening were estimated, along with the long-term outcomes (including overdiagnosis as one outcome), the optimal screening time/age, the lead time distribution, and the probability of overdiagnosis at the future screening time were simulated to provide a statistical perspective on the effectiveness of cancer screening programs. In the first part of this dissertation, a statistical inference was conducted for male and female smokers using the National Lung Screening Trial (NLST) chest X-ray data. A …


Cannabidiol Tweet Miner: A Framework For Identifying Misinformation In Cbd Tweets., Jason Turner Aug 2023

Cannabidiol Tweet Miner: A Framework For Identifying Misinformation In Cbd Tweets., Jason Turner

Electronic Theses and Dissertations

As regulations surrounding cannabis continue to develop, the demand for cannabis-based products is on the rise. Despite not producing the psychoactive effects commonly associated with THC, products containing cannabidiol (CBD) have gained immense popularity in recent years as a potential treatment option for a range of conditions, particularly those associated with pain or sleep disorders. However, due to current federal policies, these products have yet to undergo comprehensive safety and efficacy testing. Fortunately, utilizing advanced natural language processing (NLP) techniques, data harvested from social networks have been employed to investigate various social trends within healthcare, such as disease tracking and …


A Data-Driven Multi-Regime Approach For Predicting Real-Time Energy Consumption Of Industrial Machines., Abdulgani Kahraman Aug 2023

A Data-Driven Multi-Regime Approach For Predicting Real-Time Energy Consumption Of Industrial Machines., Abdulgani Kahraman

Electronic Theses and Dissertations

This thesis focuses on methods for improving energy consumption prediction performance in complex industrial machines. Working with real-world industrial machines brings several challenges, including data access, algorithmic bias, data privacy, and the interpretation of machine learning algorithms. To effectively manage energy consumption in the industrial sector, it is essential to develop a framework that enhances prediction performance, reduces energy costs, and mitigates air pollution in heavy industrial machine operations. This study aims to assist managers in making informed decisions and driving the transition towards green manufacturing. The energy consumption of industrial machinery is substantial, and the recent increase in CO2 …


Penalized Bayesian Exponential Random Graph Models., Vicki Modisette Aug 2023

Penalized Bayesian Exponential Random Graph Models., Vicki Modisette

Electronic Theses and Dissertations

Networks have the critical ability to represent the complex interconnectedness of social relationships, biological processes, and the spread of diseases and information. Exponential random graph models (ERGM) are one of the popular statistical methods for analyzing network data. ERGM, however, struggle with computational challenges and degeneracy issues, further exacerbated by their inability to handle high-dimensional network data. Bayesian techniques provide a promising avenue to overcome these two problems. This paper considers penalized Bayesian exponential random graph models with adaptive lasso and adaptive ridge penalties to perform variable selection and reduce multicollinearity on a variety of networks. The experimental results demonstrate …


Bayesian Methods For Graphical Models With Neighborhood Selection., Sagnik Bhadury Dec 2022

Bayesian Methods For Graphical Models With Neighborhood Selection., Sagnik Bhadury

Electronic Theses and Dissertations

Graphical models determine associations between variables through the notion of conditional independence. Gaussian graphical models are a widely used class of such models, where the relationships are formalized by non-null entries of the precision matrix. However, in high-dimensional cases, covariance estimates are typically unstable. Moreover, it is natural to expect only a few significant associations to be present in many realistic applications. This necessitates the injection of sparsity techniques into the estimation method. Classical frequentist methods, like GLASSO, use penalization techniques for this purpose. Fully Bayesian methods, on the contrary, are slow because they require iteratively sampling over a quadratic …


Computer Aided Diagnosis System For Breast Cancer Using Deep Learning., Asma Baccouche Aug 2022

Computer Aided Diagnosis System For Breast Cancer Using Deep Learning., Asma Baccouche

Electronic Theses and Dissertations

The recent rise of big data technology surrounding the electronic systems and developed toolkits gave birth to new promises for Artificial Intelligence (AI). With the continuous use of data-centric systems and machines in our lives, such as social media, surveys, emails, reports, etc., there is no doubt that data has gained the center of attention by scientists and motivated them to provide more decision-making and operational support systems across multiple domains. With the recent breakthroughs in artificial intelligence, the use of machine learning and deep learning models have achieved remarkable advances in computer vision, ecommerce, cybersecurity, and healthcare. Particularly, numerous …


Statistical Methods For Personalized Treatment Selection And Survival Data Analysis Based On Observational Data With High-Dimensional Covariates., Don Ramesh Dinendra Sudaraka Tholkage Aug 2022

Statistical Methods For Personalized Treatment Selection And Survival Data Analysis Based On Observational Data With High-Dimensional Covariates., Don Ramesh Dinendra Sudaraka Tholkage

Electronic Theses and Dissertations

Due to the wide availability of functional data from multiple disciplines, the studies of functional data analysis have become popular in the recent literature. However, the related development in censored survival data has been relatively sparse. In Chapter 2, we consider the problem of analyzing time-to-event data in the presence of functional predictors. We develop a conditional generalized Kaplan Meier (KM) estimator that incorporates functional predictors using kernel weights and rigorously establishes its asymptotic properties. In addition, we propose to select the optimal bandwidth based on a time-dependent Brier score. We then carry out extensive numerical studies to examine the …


Statistical Methods For Assessing Drug Interactions And Identifying Effect Modifiers Using Observational Data., Qian Xu May 2022

Statistical Methods For Assessing Drug Interactions And Identifying Effect Modifiers Using Observational Data., Qian Xu

Electronic Theses and Dissertations

This dissertation consists of three projects related to causal inference based on observational data. In the first project, we propose a double robust to identify the effect modifiers and estimate optimal treatment. Observational studies differ from experimental studies in that assignment of subjects to treatments is not randomized but rather occurs due to natural mechanisms, which are usually hidden from the researchers. Many statistical methods to identify the treatment effect and select the optimal personalized treatment for experimental studies may not be suitable for observational studies any more. In this project, we propose a exible outcome model to select the …


Estimating Treatment Effect On Medical Cost And Examining Medical Cost Trajectory Using Splines And Change Point Techniques., Indranil Ghosh Dec 2021

Estimating Treatment Effect On Medical Cost And Examining Medical Cost Trajectory Using Splines And Change Point Techniques., Indranil Ghosh

Electronic Theses and Dissertations

In the world of growing medical needs, other than the clinical outcomes, the cost of healthcare is one of the important aspects to evaluate. The cost of treatment could act as a decisive factor on which one to choose from two equally likely effective treatment options. In literature, the most used quantity for the cost of treatment is cumulative lifetime cost since the diagnosis of a disease. While it provides a bird' eye view of the treatment cost, it fails to capture the underlying pattern of the treatment cost trajectory. We developed a marginal structural functional model (MSFM) using an …


Predictive Modeling Of Clinical Outcomes For Hospitalized Covid-19 Patients Utilizing Cytof And Clinical Data., Onajia Stubblefield Aug 2021

Predictive Modeling Of Clinical Outcomes For Hospitalized Covid-19 Patients Utilizing Cytof And Clinical Data., Onajia Stubblefield

Electronic Theses and Dissertations

In December 2019, an outbreak of a novel coronavirus initiated a global pandemic. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a virus that causes the disease coronavirus disease 2019 (COVID-19). Symptoms of infection with COVID-19 vary widely between individuals. While some infected individuals are asymptomatic, others need more extensive care and require hospitalization. Indeed, the COVID-19 pandemic was characterized by a shortage of hospital beds which presented additional complications in providing adequate care for patients. In this study, we used a combination of T cell population data collected from mass cytometry analysis and clinical markers to form a predictive …


Bayesian Variable Selection Strategies In Longitudinal Mixture Models And Categorical Regression Problems., Md Nazir Uddin Aug 2021

Bayesian Variable Selection Strategies In Longitudinal Mixture Models And Categorical Regression Problems., Md Nazir Uddin

Electronic Theses and Dissertations

In this work, we seek to develop a variable screening and selection method for Bayesian mixture models with longitudinal data. To develop this method, we consider data from the Health and Retirement Survey (HRS) conducted by University of Michigan. Considering yearly out-of-pocket expenditures as the longitudinal response variable, we consider a Bayesian mixture model with $K$ components. The data consist of a large collection of demographic, financial, and health-related baseline characteristics, and we wish to find a subset of these that impact cluster membership. An initial mixture model without any cluster-level predictors is fit to the data through an MCMC …


Estimating Cumulative Incidence Rate On Interval Censored Data In An Illness-Death Model., Chen Qian May 2021

Estimating Cumulative Incidence Rate On Interval Censored Data In An Illness-Death Model., Chen Qian

Electronic Theses and Dissertations

Phase IV clinical trials are designed to monitor long-term side effects caused overtime by the medical treatment. For instance, in advanced primary cancer treatment, childhood cancer survivors are often at risk of developing undesired events, such as cardiotoxicity, during their adulthood. Such problems could be due to their cancer or the treatment they received for their cancer such as radiation or intensive chemotherapy. Cardiotoxicity can be diagnosed with electrophysiology with measurements of fraction shortening, afterload, etc. Often the primary focus of a study could be on estimating the cumulative incidence of a particular outcome of interest such as cardiotoxicity. However, …


Observational Studies In Group Testing And Potential Applications., Alexander Christopher Noll May 2021

Observational Studies In Group Testing And Potential Applications., Alexander Christopher Noll

Electronic Theses and Dissertations

The use of group testing to identify individuals with targeted outcomes in a population can greatly improve the efficiency, speed, and cost effectiveness of testing a population for an outcome, or at least for identifying the prevalence of an outcome in a population. The implementation of causal inference techniques can provide the basis for an observational study that would allow an investigator to gather estimates for treatment effectiveness if group testing was conducted on the population in a certain way. This thesis examines a simulation of the above outlined principles in order to demonstrate a potential application for determining treatment …


Statistical Approaches Of Gene Set Analysis With Quantitative Trait Loci For High-Throughput Genomic Studies., Samarendra Das Dec 2020

Statistical Approaches Of Gene Set Analysis With Quantitative Trait Loci For High-Throughput Genomic Studies., Samarendra Das

Electronic Theses and Dissertations

Recently, gene set analysis has become the first choice for gaining insights into the underlying complex biology of diseases through high-throughput genomic studies, such as Microarrays, bulk RNA-Sequencing, single cell RNA-Sequencing, etc. It also reduces the complexity of statistical analysis and enhances the explanatory power of the obtained results. Further, the statistical structure and steps common to these approaches have not yet been comprehensively discussed, which limits their utility. Hence, a comprehensive overview of the available gene set analysis approaches used for different high-throughput genomic studies is provided. The analysis of gene sets is usually carried out based on …


Modified-Half-Normal Distribution And Different Methods To Estimate Average Treatment Effect., Jingchao Sun Dec 2020

Modified-Half-Normal Distribution And Different Methods To Estimate Average Treatment Effect., Jingchao Sun

Electronic Theses and Dissertations

This dissertation consists of three projects related to Modified-Half-Normal distribution and causal inference. In my first project, a new distribution called Modified-Half-Normal distribution was introduced. I explored a few of its distributional properties, the procedures for generating random samples based on Bayesian approaches, and the parameter estimation based on the method of moments. The second project deals with the problem of selection bias of average treatment effect (ATE) if we use the observational data. I combined the propensity score based inverse probability of treatment weighting (IPTW) method and the directed acyclic graph (DAG) to solve this problem. The third project …


Aspects Of Causal Inference., John A. Craycroft Dec 2020

Aspects Of Causal Inference., John A. Craycroft

Electronic Theses and Dissertations

Observational studies differ from experimental studies in that assignment of subjects to treatments is not randomized but rather occurs due to natural mechanisms, which are usually hidden from researchers. Yet objectives of the two studies are frequently the same: identify the causal – rather than merely associational – relationship between some treatment or exposure and an outcome. The statistical issues that arise in properly analyzing observational data for this goal are numerous and fascinating, and these issues are encompassed in the domain of causal inference. The research presented in this dissertation explores several distinct aspects of causal inference. This dissertation …


Marginal Methods And Software For Clustered Data With Cluster- And Group-Size Informativeness., Mary Elizabeth Gregg Aug 2020

Marginal Methods And Software For Clustered Data With Cluster- And Group-Size Informativeness., Mary Elizabeth Gregg

Electronic Theses and Dissertations

Clustered data result when observations have some natural organizational association. In such data, cluster size is defined as the number of observations belonging to a cluster. A phenomenon termed informative cluster size (ICS) occurs when observation outcomes vary in a systematic way related to the cluster size. An additional form of informativeness, termed informative within-cluster group size (IWCGS), arises when the distribution of group-defining categorical covariates within clusters similarly carries information related to outcomes. Standard methods for the marginal analysis of clustered data can produce biased estimates and inference when data have informativeness. A reweighting methodology has been developed that …


Linear Methods For Regression With Small Sample Sizes Relative To The Number Of Variables., Rajesh Sikder Aug 2020

Linear Methods For Regression With Small Sample Sizes Relative To The Number Of Variables., Rajesh Sikder

Electronic Theses and Dissertations

In data sets where there are a small number of observations but a large number of variables observed for each observation, ordinary least squares estimation cannot be used for regression models. There are many alternative including stepwise regression, penalized methods such as ridge regression and the LASSO, and methods based on derived inputs such as principal components regression and partial least squares regression. In this thesis, these five methods are described. K-fold cross validation is also discussed as a way for determining regularization parameters for each method. The performance of these methods in estimation and prediction is also examined through …


Novel Bayesian Methodology For The Analysis Of Single-Cell Rna Sequencing Data., Michael Sekula May 2020

Novel Bayesian Methodology For The Analysis Of Single-Cell Rna Sequencing Data., Michael Sekula

Electronic Theses and Dissertations

With single-cell RNA sequencing (scRNA-seq) technology, researchers are able to gain a better understanding of health and disease through the analysis of gene expression data at the cellular-level; however, scRNA-seq data tend to have high proportions of zero values, increased cell-to-cell variability, and overdispersion due to abnormally large expression counts, which create new statistical problems that need to be addressed. This dissertation includes three research projects that propose Bayesian methodology suitable for scRNA-seq analysis. In the first project, a hurdle model for identifying differentially expressed genes across cell types in scRNA-seq data is presented. This model incorporates a correlated random …


Statistical Methods For Estimating And Testing Treatment Effect For Multiple Treatment Groups In Observational Studies., Xiaofang Yan Dec 2019

Statistical Methods For Estimating And Testing Treatment Effect For Multiple Treatment Groups In Observational Studies., Xiaofang Yan

Electronic Theses and Dissertations

Note: Abstract would not save due to an issue with some of the characters.


Designing And Sample Size Calculation In Presence Of Heterogeneity In Biological Studies Involving High-Throughput Data., Sudhir Srivastava Aug 2019

Designing And Sample Size Calculation In Presence Of Heterogeneity In Biological Studies Involving High-Throughput Data., Sudhir Srivastava

Electronic Theses and Dissertations

The designing and determination of sample size are important for conducting high-throughput biological experiments such as proteomics experiments and RNA-Seq expression studies, thus leading to better understanding of complex mechanisms underlying various biological processes. The variations in the biological data or technical approaches to data collection lead to heterogeneity for the samples under study. We critically worked on the issues of technical and biological heterogeneity. The quantitative measurements based on liquid chromatography (LC) coupled with mass spectrometry (MS) often suffer from the problem of missing values (MVs) and data heterogeneity. We considered a proteomics data set generated from human kidney …


Novel Bayesian Methodology In Multivariate Problems., Debamita Kundu Aug 2019

Novel Bayesian Methodology In Multivariate Problems., Debamita Kundu

Electronic Theses and Dissertations

This dissertation involves developing novel Bayesian methodology for multivariate problems. In particular, it focuses on two contexts: shrinkage based variable selection in multivariate regression and simultaneous covariance estimation of multiple groups. Both these projects are centered around fully Bayesian inference schemes based on hierarchical modeling to capture context-specific features of the data and the development of computationally efficient estimation algorithm. Variable selection over a potentially large set of covariates in a linear model is quite popular. In the Bayesian context, common prior choices can lead to a posterior expectation of the regression coefficients that is a sparse (or nearly sparse) …


Innate Immunity, The Hepatic Extracellular Matrix, And Liver Injury: Mathematical Modeling Of Metastatic Potential And Tumor Development In Alcoholic Liver Disease., Shanice V. Hudson Dec 2018

Innate Immunity, The Hepatic Extracellular Matrix, And Liver Injury: Mathematical Modeling Of Metastatic Potential And Tumor Development In Alcoholic Liver Disease., Shanice V. Hudson

Electronic Theses and Dissertations

The overarching goals of the current work are to fill key gaps in the current understanding of alcohol consumption and the risk of metastasis to the liver. Considering the evidence this research group has compiled confirming that the hepatic matrisome responds dynamically to injury, an altered extracellular matrix (ECM) profile appears to be a key feature of pre-fibrotic inflammatory injury in the liver. This group has demonstrated that the hepatic ECM responds dynamically to alcohol exposure, in particular, sensitizing the liver to LPS-induced inflammatory damage. Although the study of alcohol in its role as a contributing factor to oncogenesis and …


Bayesian Analytical Approaches For Metabolomics : A Novel Method For Molecular Structure-Informed Metabolite Interaction Modeling, A Novel Diagnostic Model For Differentiating Myocardial Infarction Type, And Approaches For Compound Identification Given Mass Spectrometry Data., Patrick J. Trainor Aug 2018

Bayesian Analytical Approaches For Metabolomics : A Novel Method For Molecular Structure-Informed Metabolite Interaction Modeling, A Novel Diagnostic Model For Differentiating Myocardial Infarction Type, And Approaches For Compound Identification Given Mass Spectrometry Data., Patrick J. Trainor

Electronic Theses and Dissertations

Metabolomics, the study of small molecules in biological systems, has enjoyed great success in enabling researchers to examine disease-associated metabolic dysregulation and has been utilized for the discovery biomarkers of disease and phenotypic states. In spite of recent technological advances in the analytical platforms utilized in metabolomics and the proliferation of tools for the analysis of metabolomics data, significant challenges in metabolomics data analyses remain. In this dissertation, we present three of these challenges and Bayesian methodological solutions for each. In the first part we develop a new methodology to serve a basis for making higher order inferences in metabolomics, …


Generalized Spatiotemporal Modeling And Causal Inference For Assessing Treatment Effects For Multiple Groups For Ordinal Outcome., Soutik Ghosal Aug 2018

Generalized Spatiotemporal Modeling And Causal Inference For Assessing Treatment Effects For Multiple Groups For Ordinal Outcome., Soutik Ghosal

Electronic Theses and Dissertations

This dissertation consists of three projects and can be categorized in two broad research areas: generalized spatiotemporal modeling and causal inference based on observational data. In the first project, I introduce a Bayesian hierarchical mixed effect hurdle model with a nested random effect structure to model the count for primary care providers and understand their spatial and temporal variation. This study further enables us to identify the health professional shortage areas and the possible impacting factors. In the second project, I have unified popular parametric and nonparametric propensity score-based methods to assess the treatment effect of multiple groups for ordinal …


Multi Self-Adapting Particle Swarm Optimization Algorithm (Msapso)., Gerhard Koch May 2018

Multi Self-Adapting Particle Swarm Optimization Algorithm (Msapso)., Gerhard Koch

Electronic Theses and Dissertations

The performance and stability of the Particle Swarm Optimization algorithm depends on parameters that are typically tuned manually or adapted based on knowledge from empirical parameter studies. Such parameter selection is ineffectual when faced with a broad range of problem types, which often hinders the adoption of PSO to real world problems. This dissertation develops a dynamic self-optimization approach for the respective parameters (inertia weight, social and cognition). The effects of self-adaption for the optimal balance between superior performance (convergence) and the robustness (divergence) of the algorithm with regard to both simple and complex benchmark functions is investigated. This work …


Longitudinal Tracking Of Physiological State With Electromyographic Signals., Robert Warren Stallard May 2018

Longitudinal Tracking Of Physiological State With Electromyographic Signals., Robert Warren Stallard

Electronic Theses and Dissertations

Electrophysiological measurements have been used in recent history to classify instantaneous physiological configurations, e.g., hand gestures. This work investigates the feasibility of working with changes in physiological configurations over time (i.e., longitudinally) using a variety of algorithms from the machine learning domain. We demonstrate a high degree of classification accuracy for a binary classification problem derived from electromyography measurements before and after a 35-day bedrest. The problem difficulty is increased with a more dynamic experiment testing for changes in astronaut sensorimotor performance by taking electromyography and force plate measurements before, during, and after a jump from a small platform. A …


Sample Size Calculations And Normalization Methods For Rna-Seq Data., Xiaohong Li Dec 2017

Sample Size Calculations And Normalization Methods For Rna-Seq Data., Xiaohong Li

Electronic Theses and Dissertations

High-throughput RNA sequencing (RNA-seq) has become the preferred choice for transcriptomics and gene expression studies. With the rapid growth of RNA-seq applications, sample size calculation methods for RNA-seq experiment design and data normalization methods for DEG analysis are important issues to be explored and discussed. The underlying theme of this dissertation is to develop novel sample size calculation methods in RNA-seq experiment design using test statistics. I have also proposed two novel normalization methods for analysis of RNA-seq data. In chapter one, I present the test statistical methods including Wald’s test, log-transformed Wald’s test and likelihood ratio test statistics for …