Open Access. Powered by Scholars. Published by Universities.®

Biostatistics Commons

Theses and Dissertations

Articles 1 - 30 of 137

Full-Text Articles in Biostatistics

Approaches To Detecting And Modeling Over-And Underdispersion In Alternative Count Data Distributions And An Application Of Logistic Regression And Random Forest Modeling To Improve Screening Tools For Tic Disorders In Children, Rebecca C. Wardrop Jul 2023

This dissertation focuses on theory and application of discrete data methods, particularly approaches to over- and underdispersion relative to the Poisson distribution and an application of random forest and logistic regression modeling. The first chapter derives a score test for over- and underdispersion in the heaped generalized Poisson distribution. Equi-, over-, and underdispersed heaped generalized Poisson and heaped negative binomial data are simulated to evaluate the performance of the score test by comparing the power it achieves to that of Wald and likelihood ratio tests. We find that the score test we derive performs comparably to both the Wald and …


A Bayesian Spatial Scan Statistic For Normal Data, Laasya Velamakanni Jul 2023

Scan statistics are useful methods for detecting spatial clustering. While they were initially developed to detect regions with an excess of binomial or Poisson events, spatial scan statistics have been extended to detect hotspots in other types of data including continuous data. They have many applications in different fields such as epidemiology (e.g. detecting disease outbreaks), sociology (e.g. detecting crime hotspots), and environmental health (e.g. detecting high-pollution areas). Spatial scan statistics identify a ‘most likely cluster’ and then use a likelihood ratio test to determine if this cluster is statistically significant. Spatial scan statistics have been extended to the Bayesian …


Statistical Methods For Single Cell Sequencing Data Analysis, Fei Qin Jul 2023

The recent emergence of single cell sequencing (SCS) technology has provided us with single-cell DNA or RNA sequencing (scDNA/RNA-seq) information to investigate cellular evolutionary relationships. Although many analysis methods have been developed to infer intra-tumor genetic heterogeneity, cluster cellular subclones, detect genetic mutations, and investigate spatially variable (SV) genes, exploring SCS data remains statistically challenging due to its noisy nature.

To identify subclones with scDNA-seq data, many existing studies use an independent statistical model to detect copy number profiles in the first step, followed by classical clustering methods for subclone identification in downstream analyses. However, spurious results might be generated …


Sparse Partitioned Empirical Bayes Ecm Algorithms For High-Dimensional Linear Mixed Effects And Heteroscedastic Regression, Anja Zgodic Apr 2023

Variable selection methods in both the frequentist and Bayesian frameworks are powerful techniques that provide prediction and inference in high-dimensional linear regression models. These methods often assume independence between observations and normally distributed errors with the same variance. In practice, these two assumptions are often violated. To mitigate this, we develop efficient and powerful Bayesian approaches for linear mixed modeling and heteroscedastic linear regression. These methods offer increased flexibility through the development of empirical Bayes estimators for hyperparameters, with computationally efficient estimation through the Expectation Conditional-Minimization (ECM) algorithm. The novelty of these approaches lies in the partitioning and parameter expansion, …


Model-Based Imputation Of Below Detection Limit Missing Data And Group Selection In Bayesian Group Index Regression, Matthew Carli Jan 2023

Investigations into the association between chemical exposure and health outcomes are increasingly focused on the role of chemical mixtures, as opposed to individual chemicals. The analysis of chemical mixture data required the development of novel statistical methods, one of these being Bayesian group index regression. A statistical challenge common to all chemical mixture analyses is the ubiquitous presence of below detection limit (BDL) data. We propose an extension of Bayesian group index regression that treats both regression effects and missing BDL observations as parameters in a model estimated through a Markov Chain Monte Carlo algorithm that we refer to as …


Variability In Causal Effects On A Binary Outcome And Noncompliance In A Multisite Randomized Trial, Xinxin Sun Jan 2023

Noncompliance with treatment assignment is widespread in randomized trials and presents challenges in causal inference. In the presence of noncompliance, the most commonly estimated effect of treatment assignment, also known as the intent-to-treat (ITT) effect, is biased. Of interest in this setting is the complier average causal effect (CACE), the ITT effect among compliers. Further complications arise when the outcome variable is partially observed.

My research focuses on estimating the distribution of a site-specific CACE in a multisite randomized controlled trial (MRCT) by maximum likelihood (ML). Assuming compliance is missing at random (MAR), we express the likelihood as an integral with respect …


Dynamics Of Redox-Driven Molecular Processes In Local And Systemic Plant Immunity, Philip Berg Dec 2022

This work has two main parts. Chapters 1 – 3 focus on dynamical systems modeling in plant immunity, whereas chapters 4 – 6 describe contributions to computational modeling and analysis of proteomics and genomics data. Chapter 1 investigates dynamical and biochemical patterns of reversibly oxidized cysteines (RevOxCys) during effector-triggered immunity (ETI) in Arabidopsis, examines the regulatory patterns associated with the roles of Arabidopsis thimet oligopeptidases 1 and 2 (TOP1 and TOP2) in the RevOxCys events during ETI, and analyzes the redox phenotype of the top1top2 mutant. The second chapter investigates the peptidome dynamics during ETI …


Towards Structured Planning And Learning At The State Fisheries Agency Scale, Caleb A. Aldridge Dec 2022

Inland recreational fisheries have grown philosophically and scientifically to consider economic and sociopolitical aspects (non-biological) in addition to the biological. However, integrating biological and non-biological aspects of inland fisheries has been challenging. Thus, an opportunity exists to develop approaches and tools that operationalize planning and decision-making processes that include biological and non-biological aspects of a fishery. This dissertation expands the idea that a core set of goals and objectives is shared among and within inland fisheries agencies; that many routine operations of inland fisheries managers can be regimented or standardized; and the novel concept that current information and operations can …


Modified Em Algorithm In Smcure Package Based On Proportional Hazards Mixture Cure Model With Offset Terms, Jiaying Yi Jul 2022

The mixture cure model is a useful method of survival analysis for populations that include both cured and uncured proportions. The R package SMCURE applies the EM algorithm to estimate the coefficients of covariates in the mixture cure model. Although an offset term is specified in the SMCURE statement, the offset term is not appropriately handled in the algorithm. This thesis aims to adjust the EM algorithm for the proportional hazards mixture cure model in the SMCURE package. In addition, the offset term can be specified separately in the incidence part or the latency part. The numerical experiments include a simulation study and real …


Estimating Weighted Panel Sizes For Primary Care Providers: An Assessment Of Clustering And Novel Methods Of Panel Size Estimation On Electronic Medical Records, Martin A. Lavallee Jan 2022

Primary care is on the front lines of healthcare and thus sees the most diverse set of patients. In order to achieve high-functioning primary care, a practice must establish empanelment, the pairing of patients to providers. Enumeration of empanelment, or estimating panel sizes, helps ensure that the demands of the patients match the supply of providers and optimizes the balance of primary care resources to improve quality of care. Further, we can adjust panel sizes by using patient-level data on healthcare utilization and complexity extracted from the electronic medical record to determine the amount of care or burden of work …


Marginally Interpretable Models And Multilevel Models For Quantile Regression With Random-Effects, Nahid Sultana Sumi Oct 2021

The quantile regression model is an active area of statistical research that has received a lot of attention. It complements the most widely used statistical tool, that is, mean regression analysis. Quantile regression analysis has become more flexible because of its properties, which include no assumption on the distribution of the response variable, equivariance to monotone transformations, and robustness to outliers. However, regression analysis poses methodological challenges if the observations are not independent. Cluster, multilevel, and repeated measures (longitudinal data) designs introduce such dependence. The correlation between observations on the same units or clusters should be accounted for to …


Bayesian Calibration Of The Icrp Zirconium Biokinetic Model And Use Of Canned Priors For The Evaluation Of Bioassay, Thomas Raymond Labone Oct 2021

The International Commission on Radiological Protection (ICRP) publishes biokinetic models that relate measurements of radioactive material in the body and excreta (bioassay) to the amount of the material taken into the body (intake). Given the intake and the biokinetic model, radiation dose to organs and tissues can be calculated. The ICRP approximates the biokinetics of radioactive materials in the body with compartmental models expressed mathematically as a system of ordinary differential equations, for which they provide point estimates of the rate constants. Inaccurate estimates of intake and radiation dose can result in cases where the biokinetics of an individual differ …


Association Between The Beta Band Neural Response And The Behavioral Performance In Aphasic And Neurologically Intact Individuals, Yilun Zhang Oct 2021

The complex motor act of speech requires integrating linguistic and sensorimotor processes. Sensorimotor interaction mainly supports speech production in the form of state feedback control architecture. While speaking, subjects react to perturbations in the pitch of voice auditory feedback by changing their tone in the opposite direction to pitch-shift stimuli to compensate for the perceived pitch shift. Aphasia is a communication impairment affecting patients’ speaking, understanding, reading, and writing. The present study aims to examine the association between brain neural activity and the ability for speech auditory feedback error correction in both post-stroke aphasia and neurologically intact individuals. There are …


Correcting For Measurement Error In The Outcome When Estimating The Distribution Of Time To Pregnancy With The Current Duration Approach, Nicole Nasrallah Oct 2021

The current duration approach to modeling time-to-pregnancy (TTP) models the length of pregnancy attempt for women who are currently attempting pregnancy. There is a scarcity of studies, let alone TTP studies, that account for measurement error in the outcome. Previously, the benefits of a piecewise constant model with regard to bias in estimates of the survival function with measurement error and the parametric modelling of TTP were shown. In this thesis, correcting for measurement error in the outcome with the current duration approach is explored through piecewise constant models with log-normal measurement error. Five different methods are compared to determine …


Multiple Frailty Model For Spatially Correlated Interval-Censored, Wanfang Zhang Oct 2021

In this paper, we consider the problem of multiple frailty selection for general interval-censored spatial survival data, which often occur in clinical trials and epidemiological studies. General interval-censored data are a mixture of left-, right- and interval-censored data. We propose a Bayesian semiparametric approach based on the Cox proportional hazards model, where monotone splines are used for nonparametric modeling of the cumulative baseline hazards and variable selection priors are used for frailty selection. A two-stage data augmentation with Poisson latent variables is developed for efficient computation. The approach is evaluated based on a simulation study and illustrated using a …


A Comparison Of Spatial Clustering Assessment Methods, Nadeesha Dilhani Vidanapathirana Jul 2021

Spatial clustering detection methods are widely used in many fields of research including sociology, epidemiology, ecology, and criminology. The objective of this study is to assess the performance of four spatial clustering detection methods: the average nearest neighbor ratio, Ripley’s K function, local Moran’s I and Getis-Ord Gi* statistics. We conduct a simulation study to evaluate the performance of each method for areal data under different types of spatial dependence and three different areal structures: a 20x20 regular grid, United States counties in six states, and Canadian forward sortation areas (FSAs) in three provinces. The results show that the empirical …


Accurate And Integrative Detection Of Copy Number Variants With High-Throughput Data, Xizhi Luo Jul 2021

Copy number variations (CNVs), a major source of genetic variation in the human genome, are gains or losses of DNA segments. Copy number variation has gained considerable interest as it plays important roles in human complex diseases. Therefore, accurate detection of CNVs with data generated by modern genotyping technologies, such as SNP array and whole-exome sequencing (WES), comprises a critical step toward a better understanding of disease etiology. However, current statistical methodologies for CNV detection still face analytical challenges due to numerous genetic and technological factors that may lead to spurious findings. First, existing methods assume the independent observations …


A Simulation-Based Study Of Location-Shift Models Under Non-Normal Conditions, Ummay Khayrunnesa Anika Apr 2021

In this study, we compare ordinary least squares (OLS), generalized least squares (GLS), M- and quantile regression (QR) estimators for a continuous response variable under different scenarios by conducting a simulation study. We assess the performance of the estimators in terms of bias, average distance, mean squared error, coverage probability, and ratio of estimated standard error to empirical standard deviation. The OLS estimator performs best when the errors are homoscedastic normal or homoscedastic but skewed (exponential) with no outliers. The GLS estimator shows results comparable to QR when the errors are heteroscedastic normal or heteroscedastic heavy-tailed (t-distributed). The most satisfactory …


Methods For Developing A Machine Learning Framework For Precise 3d Domain Boundary Prediction At Base-Level Resolution, Spiro C. Stilianoudakis Jan 2021

High-throughput chromosome conformation capture technology (Hi-C) has revealed extensive DNA looping and folding into discrete 3D domains. These include Topologically Associating Domains (TADs) and chromatin loops, the 3D domains critical for cellular processes like gene regulation and cell differentiation. The relatively low resolution of Hi-C data (regions of several kilobases in size) prevents precise mapping of domain boundaries by conventional TAD/loop-callers. However, high-resolution genomic annotations associated with boundaries, such as CTCF and members of the cohesin complex, suggest a computational approach for precise location of domain boundaries.

We developed preciseTAD, an optimized machine learning framework that leverages a random …


Statistical Approaches For Estimation And Comparison Of Brain Functional Connectivity, Jifang Zhao Jan 2021

Drug addiction can lead to many health-related problems and social concerns. Functional connectivity obtained from functional magnetic resonance imaging (fMRI) data promotes a variety of fundamental understandings in such association. Due to its complex correlation structure and large dimensionality, the modeling and analysis of the functional connectivity from neuroimage are challenging. By proposing a spatio-temporal model for multi-subject neuroimage data, we incorporate voxel-level spatio-temporal dependencies of whole-brain measurements to improve the accuracy of statistical inference. To tackle large-scale spatio-temporal neuroimage data, we develop a computationally efficient algorithm to estimate the parameters. Our method is used to identify functional connectivity and …


Bayesian Techniques For Relating Genetic Polymorphisms To Diffusion Tensor Images Of Cocaine Users, Tmader Alballa Jan 2021

Past investigations utilizing Diffusion Tensor Imaging (DTI) have demonstrated that cocaine use disorder (CUD) yields white matter changes. We proposed three Bayesian techniques in order to explore the relationship between Fractional Anisotropy (FA), genetic data, and years of cocaine use (YCU). CUD participants exhibit abnormality in different areas of the brain versus non-drug using controls, which is measured by DTI. This dissertation is motivated by a neuroimaging genetic study in cocaine dependence, which found that there were relationships between several genes such as GAD and 5-HT2R and CUD subjects.

In the first chapter, there is background on the …


Incorporation And Measurement Of Uncertainty In Clustered And Spatial Data, Yuan Hong Oct 2020

Analyzing population-representative datasets for local estimation and predictions over time is important for monitoring related public health issues; however, there are many statistical challenges associated with such analyses. Mixed effect models are one of the common options that can incorporate time and spatial effects in the model, and related inference is well established.

In the first part of this dissertation, to estimate area-level prevalence using individual-level data, small area estimation (SAE) with post-stratified mixed effect models was used, with sampling weights also incorporated. However, whether poststratification, which requires more computational effort, can improve estimation accuracy is …


Machine-Learning-Based Prediction Of Sepsis Events From Vertical Clinical Trial Data: A Naïve Approach, Tyler Michael Gaddis Aug 2020

Sepsis is a potentially life-threatening condition characterized by a dysregulated, disproportionate immune response to infection by which the afflicted body attacks its own tissues, sometimes to the point of organ failure, and in the worst cases, death. According to the Centers for Disease Control and Prevention (CDC), sepsis is reported to kill upwards of 270,000 Americans annually, though this figure may be greater given certain ambiguities in the current accepted diagnostic framework of the disease.

This study attempted to first establish an understanding of past definitions of sepsis, and to then recommend use of machine learning as integral in an …


A Study Of The Efficacy Of Machine Learning For Diagnosing Obstructive Coronary Artery Disease In Non-Diabetic Patients, Demond Larae Handley Jul 2020

According to the Centers for Disease Control and Prevention, about 18.2 million adults age 20 and older have Coronary Artery Disease in the United States. Early diagnosis is therefore of crucial importance to help prevent debilitating consequences, and principally death for many patients. In this study we use data containing gene expression values from peripheral blood samples in 198 non-diabetic patients, with the goal of developing an age and sex gene expression model for diagnosis of Coronary Artery Disease. We employ machine learning methods to obtain a classification based on genetic information, age and sex. Our implementation uses feed forward …


Network-Based Statistical Analysis Of Functional Magnetic Resonance Imaging Data From Aphasia Patients, Xingpei Zhao Jul 2020

Functional magnetic resonance imaging (fMRI) is a neuroimaging technique that provides insight into brain function and activity. Network models of fMRI signals can reveal functional connectivity related to certain brain disorders, such as post-stroke aphasia. This thesis aims to identify the functional connections that distinguish anomic and Broca’s aphasia by comparing the resting-state fMRI from the patients with these two types of aphasia. The network-based statistic (NBS) approach is used to detect such connections. After the analytic pipeline is applied to the fMRI data, the NBS approach identifies a distinct subnetwork between the two types of aphasia, which involves the …


The Practical Advantages And Disadvantages Of Laplace Regression As An Alternative To Cox Proportional Hazards Model: A Comparison Via Simulation, Sydney Smith Jul 2020

The Cox proportional hazards model is the most common regression technique for survival analysis. However, the proportional hazards assumption restricts its use to a limited group of multiplicative models. Laplace regression is a flexible quantile regression technique for censored observations that is appropriate in a wider variety of applications as compared to the Cox proportional hazards model. Instead of estimating a hazard ratio, Laplace regression, which is free from a proportionality assumption, can be used to estimate many adjusted percentiles of survival time, allowing for a more complete description of the association of interest. This paper compares the performance of …


Bayesian Zero-Inflated Model For Ordinal Data, Huizhong Yang Jul 2020

Datasets with a relatively large number of zeros are commonly seen in medical applications. Although models such as the zero-inflated Poisson (ZIP) model have been proposed for count data, there are still some issues with ordinal data that have excess zeros. In this paper, we developed a Bayesian approach to accommodate the excess zeros in ordinal data. Intellectual disability (ID), also known as mental retardation (MR), is a disability characterized by below-average intelligence or mental ability and a lack of the skills necessary for daily life. A person with intellectual disability has limitations in intellectual functioning and adaptive behaviors. Intellectual disability is a …


Biomarker Development For Use In Regression Calibration, Yiwen Zhang May 2020

It is challenging to alleviate systematic measurement error in self-reported data when studying the associations between dietary intakes and chronic disease risk. The regression calibration method has been used for this purpose when an objectively measured biomarker that satisfies a classical measurement error assumption is available. The requirements for such biomarkers are quite strong, and very few such dietary intake biomarkers have been developed. Feeding studies provide opportunities to develop such potential biomarkers using regression methods with a much larger variety of dietary variables. However, the measurement error for the resulting biomarkers will be of Berkson type …


Infant Mortality In The United States: Socioeconomic Factors Predicting Infant Survival In Late Neo-Natal And Post Neo-Natal Infants From Birth Certificate Data, Mark Brunk-Grady May 2020

According to the Centers for Disease Control and Prevention, the infant mortality rate in the United States in 2018 was 5.6 deaths per 1000 live births. Infant mortality is defined as a child being born alive but dying before their first birthday. This study aimed to determine if adding socioeconomic factors to traditional predictive survival models improved the predictive power in terms of survival for late and post neonatal infants. Secondly, this study looked to develop a risk score to predict which mothers would be classified as “High” or “Low” risk for infant death.

Data were analyzed from a …


Multivariate Joint Models And Dynamic Predictions, Md Akhtar Hossain Apr 2020

The joint modeling of longitudinal and time-to-event data is an active area of statistical research that has received a lot of attention. The standard joint models, referred to as univariate joint models, allow simultaneous modeling of a single longitudinal outcome and a single time-to-event under an assumption of independent censoring. The majority of the joint modeling research in the last two decades has focused on extending and improving the univariate joint models. While many of the practical applications involve data on multivariate longitudinal outcomes and multiple time-to-events possibly informatively censored by some other terminal time-to-event, the developments of joint …