Open Access. Powered by Scholars. Published by Universities.®

Biostatistics Commons

Full-Text Articles in Biostatistics

The Performance Of Marginal Modeling Methods For Rare Events With Application To Opioid Overdose Mortality And Morbidity, Shawn Nigam Jan 2024

Theses and Dissertations--Epidemiology and Biostatistics

Opioid misuse is a nationwide epidemic, with Kentucky having one of the highest opioid overdose-related fatality rates across all US states. These rates have increased significantly over the past decade, with particularly large increases during the COVID-19 pandemic. This dissertation aims to study the behavior of these increases and the methods for the marginal modeling of count outcomes related to opioid overdose.

Opioid overdose-related fatality rates in Kentucky increased significantly during the COVID-19 pandemic. In this chapter, we characterize the changes in opioid overdose fatality rates in Kentucky and identify associations between potential factors and fatality rates. County-level opioid overdose …


High Dimensional Data Analysis: Variable Screening And Inference, Lei Fang Jan 2023

Theses and Dissertations--Statistics

This dissertation focuses on the problem of high dimensional data analysis, which arises in many fields including genomics, finance, and social sciences. In such settings, the number of features or variables is much larger than the number of observations, posing significant challenges to traditional statistical methods.

To address these challenges, this dissertation proposes novel methods for variable screening and inference. The first part of the dissertation focuses on variable screening, which aims to identify a subset of important variables that are strongly associated with the response variable. Specifically, we propose a robust nonparametric screening method to effectively select the predictors …


Statistical Theory For Specialized Linear Regression Adjustment Methods Compared To Multiple Linear Regression In The Presence And Absence Of Interaction Effects, Leon Su Jan 2022

Theses and Dissertations--Statistics

When building models to investigate outcomes and variables of interest, researchers often want to adjust for other variables. These adjustments can be performed in a variety of ways. In this work, we consider four approaches to adjustment used by researchers in various fields. We compare the efficacy of these methods to what we call the “true model method”: fitting a multiple linear regression model in which adjustment variables are model covariates. Our goal is to show that these adjustment methods have inferior performance to the true model method by comparing model parameter estimates, power, type I error, …


Investigations Into The Genetics Of Mixed Pathologies In Dementia, Adam Dugan Jan 2021

Theses and Dissertations--Epidemiology and Biostatistics

Alzheimer’s disease (AD) is an irreversible, progressive brain disorder that leads to a loss of memory and thinking skills. While tremendous progress has been made in our understanding of the genetics underlying AD, currently known genetic variants explain only approximately 30% of the heritable risk of developing AD. One hurdle to AD research is that it can only be definitively diagnosed at autopsy, making cruder, clinic-based diagnoses more common. In recent years, several brain pathologies that mimic AD’s clinical presentation have been identified including brain arteriolosclerosis, hippocampal sclerosis (HS), and, most recently, limbic-predominant age-related TDP-43 encephalopathy (LATE). It has become …


Innovative Statistical Models In Cancer Immunotherapy Trial Design, Jing Wei Jan 2021

Theses and Dissertations--Statistics

A challenge arising in cancer immunotherapy trial design is the presence of non-proportional hazards (NPH) patterns in survival curves. In this dissertation, we consider three NPH patterns, caused by a delayed treatment effect, a cure rate, and a responder rate in the treatment group. Each of these patterns violates the proportional hazards model assumption, and ignoring any of them in an immunotherapy trial design results in a substantial loss of statistical power.

In this dissertation, four models to deal with NPH patterns are discussed. First, a piecewise proportional hazards model is proposed to incorporate delayed treatment effect into the trial design consideration. …


Semiparametric And Nonparametric Methods For Comparing Biomarker Levels Between Groups, Yuntong Li Jan 2020

Theses and Dissertations--Statistics

Comparing the distribution of biomarker measurements between two groups under either an unpaired or paired design is a common goal in many biomarker studies. However, analyzing biomarker data is sometimes challenging because the data may not be normally distributed and may contain a large fraction of zero values or missing values. Although several statistical methods have been proposed, they either require a normality assumption or are inefficient. We propose a novel two-part semiparametric method for data under an unpaired setting and a nonparametric method for data under a paired setting. The semiparametric method considers a two-part model, a logistic regression for …


Estimation Of The Treatment Effect With Bayesian Adjustment For Covariates, Li Xu Jan 2020

Theses and Dissertations--Statistics

The Bayesian adjustment for confounding (BAC) is a Bayesian model averaging method to select and adjust for confounding factors when evaluating the average causal effect of an exposure on a certain outcome. We extend the BAC method to time-to-event outcomes. Specifically, the posterior distribution of the exposure effect on a time-to-event outcome is calculated as a weighted average of posterior distributions from a number of candidate proportional hazards models, weighting each model by its ability to adjust for confounding factors. The Bayesian Information Criterion based on the partial likelihood is used to compare different models and approximate the Bayes factor. …


Unsupervised Learning In Phylogenomic Analysis Over The Space Of Phylogenetic Trees, Qiwen Kang Jan 2019

Theses and Dissertations--Statistics

A phylogenetic tree is a tree that represents the evolutionary history among species or other entities. Phylogenomics is a new field at the intersection of phylogenetics and genomics, and it is well known that statistical learning methods are needed to handle and analyze the large amounts of data that new technologies can generate relatively cheaply. Building on existing Markov models, we introduce a new method, CURatio, to identify outliers in a given gene data set. This method, intrinsically an unsupervised method, can find outliers among thousands of genes or more. This ability to analyze large numbers of genes (even with missing …


Informational Index And Its Applications In High Dimensional Data, Qingcong Yuan Jan 2017

Theses and Dissertations--Statistics

We introduce a new class of measures for testing independence between two random vectors, which uses the expected difference of conditional and marginal characteristic functions. By choosing a particular weight function in the class, we propose a new index for measuring independence and study its properties. Two empirical versions are developed; their properties, asymptotics, connections with existing measures, and applications are discussed. Implementation and Monte Carlo results are also presented.

We propose a two-stage sufficient variable selection method based on the new index to deal with large-p, small-n data. The method does not require model specification and especially focuses …


Improved Models For Differential Analysis For Genomic Data, Hong Wang Jan 2016

Theses and Dissertations--Statistics

This paper intends to develop novel statistical methods to improve genomic data analysis, especially for differential analysis. We consider two different data types: NanoString nCounter data and somatic mutation data. For NanoString nCounter data, we develop a novel differential expression detection method. The method uses a generalized linear model of the negative binomial family to characterize count data and allows for multi-factor designs. Data normalization is incorporated in the model framework through normalization parameters, which are estimated from control genes embedded in the nCounter system. For somatic mutation data, we develop beta-binomial model-based approaches to identify highly or lowly …


Genetic Association Testing Of Copy Number Variation, Yinglei Li Jan 2014

Theses and Dissertations--Statistics

Copy-number variation (CNV) has been implicated in many complex diseases. It is of great interest to detect and locate such regions through genetic association testing. However, association testing is complicated by the fact that CNVs usually span multiple markers, so these markers are correlated with each other. To overcome this difficulty, it is desirable to pool information across the markers. In this thesis, we propose a kernel-based method for aggregation of marker-level tests, in which we first obtain a set of p-values through association tests for every marker and then the association test involving CNV is based on …