Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons

Open Access. Powered by Scholars. Published by Universities.®

University of Kentucky

Discipline
Keyword
Publication Year
Publication
Publication Type

Articles 1 - 30 of 50

Full-Text Articles in Statistical Models

Finite Mixtures Of Mean-Parameterized Conway-Maxwell-Poisson Models, Dongying Zhan Jan 2023

Finite Mixtures Of Mean-Parameterized Conway-Maxwell-Poisson Models, Dongying Zhan

Theses and Dissertations--Statistics

For modeling count data, the Conway-Maxwell-Poisson (CMP) distribution is a popular generalization of the Poisson distribution due to its ability to characterize data over- or under-dispersion. While the classic parameterization of the CMP has been well-studied, its main drawback is that it is does not directly model the mean of the counts. This is mitigated by using a mean-parameterized version of the CMP distribution. In this work, we are concerned with the setting where count data may be comprised of subpopulations, each possibly having varying degrees of data dispersion. Thus, we propose a finite mixture of mean-parameterized CMP distributions. An …


Potential Alzheimer's Disease Plasma Biomarkers, Taylor Estepp Jan 2023

Potential Alzheimer's Disease Plasma Biomarkers, Taylor Estepp

Theses and Dissertations--Epidemiology and Biostatistics

In this series of studies, we examined the potential of a variety of blood-based plasma biomarkers for the identification of Alzheimer's disease (AD) progression and cognitive decline. With the end goal of studying these biomarkers via mixture modeling, we began with a literature review of the methodology. An examination of the biomarkers with demographics and other health factors found evidence of minimal risk of confounding along the causal pathway from biomarkers to cognitive performance. Further study examined the usefulness of linear combinations of biomarkers, achieved via partial least squares (PLS) analysis, as predictors of various cognitive assessment scores and clinical …


Statistical Intervals For Neural Network And Its Relationship With Generalized Linear Model, Sheng Yuan Jan 2023

Statistical Intervals For Neural Network And Its Relationship With Generalized Linear Model, Sheng Yuan

Theses and Dissertations--Statistics

Neural networks have experienced widespread adoption and have become integral in cutting-edge domains like computer vision, natural language processing, and various contemporary fields. However, addressing the statistical aspects of neural networks has been a persistent challenge, with limited satisfactory results. In my research, I focused on exploring statistical intervals applied to neural networks, specifically confidence intervals and tolerance intervals. I employed variance estimation methods, such as direct estimation and resampling, to assess neural networks and their performance under outlier scenarios. Remarkably, when outliers were present, the resampling method with infinitesimal jackknife estimation yielded confidence intervals that closely aligned with nominal …


High Dimensional Data Analysis: Variable Screening And Inference, Lei Fang Jan 2023

High Dimensional Data Analysis: Variable Screening And Inference, Lei Fang

Theses and Dissertations--Statistics

This dissertation focuses on the problem of high dimensional data analysis, which arises in many fields including genomics, finance, and social sciences. In such settings, the number of features or variables is much larger than the number of observations, posing significant challenges to traditional statistical methods.

To address these challenges, this dissertation proposes novel methods for variable screening and inference. The first part of the dissertation focuses on variable screening, which aims to identify a subset of important variables that are strongly associated with the response variable. Specifically, we propose a robust nonparametric screening method to effectively select the predictors …


Deriving The Distributions And Developing Methods Of Inference For R2-Type Measures, With Applications To Big Data Analysis, Gregory S. Hawk Jan 2022

Deriving The Distributions And Developing Methods Of Inference For R2-Type Measures, With Applications To Big Data Analysis, Gregory S. Hawk

Theses and Dissertations--Statistics

As computing capabilities and cloud-enhanced data sharing has accelerated exponentially in the 21st century, our access to Big Data has revolutionized the way we see data around the world, from healthcare to investments to manufacturing to retail and supply-chain. In many areas of research, however, the cost of obtaining each data point makes more than just a few observations impossible. While machine learning and artificial intelligence (AI) are improving our ability to make predictions from datasets, we need better statistical methods to improve our ability to understand and translate models into meaningful and actionable insights.

A central goal in the …


Beta Mixture And Contaminated Model With Constraints And Application With Micro-Array Data, Ya Qi Jan 2022

Beta Mixture And Contaminated Model With Constraints And Application With Micro-Array Data, Ya Qi

Theses and Dissertations--Statistics

This dissertation research is concentrated on the Contaminated Beta(CB) model and its application in micro-array data analysis. Modified Likelihood Ratio Test (MLRT) introduced by [Chen et al., 2001] is used for testing the omnibus null hypothesis of no contamination of Beta(1,1)([Dai and Charnigo, 2008]). We design constraints for two-component CB model, which put the mode toward the left end of the distribution to reflect the abundance of small p-values of micro-array data, to increase the test power. A three-component CB model might be useful when distinguishing high differentially expressed genes and moderate differentially expressed genes. If the null hypothesis above …


Novel Methods For Characterizing Conditional Quantiles In Zero-Inflated Count Regression Models, Xuan Shi Jan 2021

Novel Methods For Characterizing Conditional Quantiles In Zero-Inflated Count Regression Models, Xuan Shi

Theses and Dissertations--Statistics

Despite its popularity in diverse disciplines, quantile regression methods are primarily designed for the continuous response setting and cannot be directly applied to the discrete (or count) response setting. There can also be challenges when modeling count responses, such as the presence of excess zero counts, formally known as zero-inflation. To address the aforementioned challenges, we propose a comprehensive model-aware strategy that synthesizes quantile regression methods with estimation of zero-inflated count regression models. Various competing computational routines are examined, while residual analysis and model selection procedures are included to validate our method. The performance of these methods is characterized through …


Sexual Behaviors Associated With Online Partner-Seeking Among Men Who Have Sex With Men From Small/Midsized Towns Or Rural Areas In Kentucky, Vira Pravosud Jan 2021

Sexual Behaviors Associated With Online Partner-Seeking Among Men Who Have Sex With Men From Small/Midsized Towns Or Rural Areas In Kentucky, Vira Pravosud

Theses and Dissertations--Epidemiology and Biostatistics

The HIV epidemic remains one of the most significant public health issues in the United States, particularly among men who have sex with men (MSM). New avenues for partner-seeking have emerged over the past three decades, including through the Internet, social media, and geosocial networking applications. Consisting of three cross-sectional studies, this dissertation research aimed to determine associations between the use of various online tools for partner-seeking (hereafter collectively referred to as “apps”) and HIV-related sexual behaviors among 252 young adult MSM residing in small/midsized towns or rural areas in Central Kentucky, a group that has been under-represented in the …


Estimating And Testing Treatment Effects With Misclassified Multivariate Data, Zi Ye Jan 2021

Estimating And Testing Treatment Effects With Misclassified Multivariate Data, Zi Ye

Theses and Dissertations--Statistics

Clinical trials are often used to assess drug efficacy and safety. Participants are sometimes pre-stratified into different groups by diagnostic tools. However, these diagnostic tools are fallible. The traditional method ignores this problem and assumes the diagnostic devices are perfect. This assumption will lead to inefficient and biased estimators. In this era of personalized medicine and measurement-based care, the issues of bias and efficiency are of paramount importance. Despite the prominence, only few researches evaluated the treatment effect in the presence of misclassifications in some special cases and most others focus on assessing the accuracy of the diagnostic devices. In …


Dimension Reduction Techniques In Regression, Pei Wang Jan 2021

Dimension Reduction Techniques In Regression, Pei Wang

Theses and Dissertations--Statistics

Because of the advances of modern technology, the size of the collected data nowadays is larger and the structure is more complex. To deal with such kinds of data, sufficient dimension reduction (SDR) and reduced rank (RR) regression are two powerful tools. This dissertation focuses on these two tools and it is composed of three projects. In the first project, we introduce a new SDR method through a novel approach of feature filter to recover the central mean subspace exhaustively along with a method to determine the dimension, two variable selection methods, and extensions to multivariate response and large p …


Nonparametric Tests Of Lack Of Fit For Multivariate Data, Yan Xu Jan 2020

Nonparametric Tests Of Lack Of Fit For Multivariate Data, Yan Xu

Theses and Dissertations--Statistics

A common problem in regression analysis (linear or nonlinear) is assessing the lack-of-fit. Existing methods make parametric or semi-parametric assumptions to model the conditional mean or covariance matrices. In this dissertation, we propose fully nonparametric methods that make only additive error assumptions. Our nonparametric approach relies on ideas from nonparametric smoothing to reduce the test of association (lack-of-fit) problem into a nonparametric multivariate analysis of variance. A major problem that arises in this approach is that the key assumptions of independence and constant covariance matrix among the groups will be violated. As a result, the standard asymptotic theory is not …


Semiparametric And Nonparametric Methods For Comparing Biomarker Levels Between Groups, Yuntong Li Jan 2020

Semiparametric And Nonparametric Methods For Comparing Biomarker Levels Between Groups, Yuntong Li

Theses and Dissertations--Statistics

Comparing the distribution of biomarker measurements between two groups under either an unpaired or paired design is a common goal in many biomarker studies. However, analyzing biomarker data is sometimes challenging because the data may not be normally distributed and contain a large fraction of zero values or missing values. Although several statistical methods have been proposed, they either require data normality assumption, or are inefficient. We proposed a novel two-part semiparametric method for data under an unpaired setting and a nonparametric method for data under a paired setting. The semiparametric method considers a two-part model, a logistic regression for …


Bayesian Kinetic Modeling For Tracer-Based Metabolomic Data, Xu Zhang Jan 2020

Bayesian Kinetic Modeling For Tracer-Based Metabolomic Data, Xu Zhang

Theses and Dissertations--Statistics

Kinetic modeling of the time dependence of metabolite concentrations including the unstable isotope labeled species is an important approach to simulate metabolic pathway dynamics. It is also essential for quantitative metabolic flux analysis using tracer data. However, as the metabolic networks are complex including extensive compartmentation and interconnections, the parameter estimation for enzymes that catalyze individual reactions needed for kinetic modeling is challenging. As the pa- rameter space is large and multi-dimensional while kinetic data are comparatively sparse, the estimation procedure (especially the point estimation methods) often en- counters multiple local maximum such that standard maximum likelihood methods may yield …


Estimation Of The Treatment Effect With Bayesian Adjustment For Covariates, Li Xu Jan 2020

Estimation Of The Treatment Effect With Bayesian Adjustment For Covariates, Li Xu

Theses and Dissertations--Statistics

The Bayesian adjustment for confounding (BAC) is a Bayesian model averaging method to select and adjust for confounding factors when evaluating the average causal effect of an exposure on a certain outcome. We extend the BAC method to time-to-event outcomes. Specifically, the posterior distribution of the exposure effect on a time-to-event outcome is calculated as a weighted average of posterior distributions from a number of candidate proportional hazards models, weighing each model by its ability to adjust for confounding factors. The Bayesian Information Criterion based on the partial likelihood is used to compare different models and approximate the Bayes factor. …


Statistical Intervals For Various Distributions Based On Different Inference Methods, Yixuan Zou Jan 2020

Statistical Intervals For Various Distributions Based On Different Inference Methods, Yixuan Zou

Theses and Dissertations--Statistics

Statistical intervals (e.g., confidence, prediction, or tolerance) are widely used to quantify uncertainty, but complex settings can create challenges to obtain such intervals that possess the desired properties. My thesis will address diverse data settings and approaches that are shown empirically to have good performance. We first introduce a focused treatment on using a single-layer bootstrap calibration to improve the coverage probabilities of two-sided parametric tolerance intervals for non-normal distributions. We then turn to zero-inflated data, which are commonly found in, among other areas, pharmaceutical and quality control applications. However, the inference problem often becomes difficult in the presence of …


The Use Of 3-D Highway Differential Geometry In Crash Prediction Modeling, Kiriakos Amiridis Jan 2019

The Use Of 3-D Highway Differential Geometry In Crash Prediction Modeling, Kiriakos Amiridis

Theses and Dissertations--Civil Engineering

The objective of this research is to evaluate and introduce a new methodology regarding rural highway safety. Current practices rely on crash prediction models that utilize specific explanatory variables, whereas the depository of knowledge for past research is the Highway Safety Manual (HSM). Most of the prediction models in the HSM identify the effect of individual geometric elements on crash occurrence and consider their combination in a multiplicative manner, where each effect is multiplied with others to determine their combined influence. The concepts of 3-dimesnional (3-D) representation of the roadway surface have also been explored in the past aiming to …


Automatic 13C Chemical Shift Reference Correction Of Protein Nmr Spectral Data Using Data Mining And Bayesian Statistical Modeling, Xi Chen Jan 2019

Automatic 13C Chemical Shift Reference Correction Of Protein Nmr Spectral Data Using Data Mining And Bayesian Statistical Modeling, Xi Chen

Theses and Dissertations--Molecular and Cellular Biochemistry

Nuclear magnetic resonance (NMR) is a highly versatile analytical technique for studying molecular configuration, conformation, and dynamics, especially of biomacromolecules such as proteins. However, due to the intrinsic properties of NMR experiments, results from the NMR instruments require a refencing step before the down-the-line analysis. Poor chemical shift referencing, especially for 13C in protein Nuclear Magnetic Resonance (NMR) experiments, fundamentally limits and even prevents effective study of biomacromolecules via NMR. There is no available method that can rereference carbon chemical shifts from protein NMR without secondary experimental information such as structure or resonance assignment.

To solve this problem, we …


A Flexible Zero-Inflated Poisson Regression Model, Eric S. Roemmele Jan 2019

A Flexible Zero-Inflated Poisson Regression Model, Eric S. Roemmele

Theses and Dissertations--Statistics

A practical problem often encountered with observed count data is the presence of excess zeros. Zero-inflation in count data can easily be handled by zero-inflated models, which is a two-component mixture of a point mass at zero and a discrete distribution for the count data. In the presence of predictors, zero-inflated Poisson (ZIP) regression models are, perhaps, the most commonly used. However, the fully parametric ZIP regression model could sometimes be restrictive, especially with respect to the mixing proportions. Taking inspiration from some of the recent literature on semiparametric mixtures of regressions models for flexible mixture modeling, we propose a …


Transforms In Sufficient Dimension Reduction And Their Applications In High Dimensional Data, Jiaying Weng Jan 2019

Transforms In Sufficient Dimension Reduction And Their Applications In High Dimensional Data, Jiaying Weng

Theses and Dissertations--Statistics

The big data era poses great challenges as well as opportunities for researchers to develop efficient statistical approaches to analyze massive data. Sufficient dimension reduction is such an important tool in modern data analysis and has received extensive attention in both academia and industry.

In this dissertation, we introduce inverse regression estimators using Fourier transforms, which is superior to the existing SDR methods in two folds, (1) it avoids the slicing of the response variable, (2) it can be readily extended to solve the high dimensional data problem. For the ultra-high dimensional problem, we investigate both eigenvalue decomposition and minimum …


Composite Nonparametric Tests In High Dimension, Alejandro G. Villasante Tezanos Jan 2019

Composite Nonparametric Tests In High Dimension, Alejandro G. Villasante Tezanos

Theses and Dissertations--Statistics

This dissertation focuses on the problem of making high-dimensional inference for two or more groups. High-dimensional means both the sample size (n) and dimension (p) tend to infinity, possibly at different rates. Classical approaches for group comparisons fail in the high-dimensional situation, in the sense that they have incorrect sizes and low powers. Much has been done in recent years to overcome these problems. However, these recent works make restrictive assumptions in terms of the number of treatments to be compared and/or the distribution of the data. This research aims to (1) propose and investigate refined …


Improved Methods And Selecting Classification Types For Time-Dependent Covariates In The Marginal Analysis Of Longitudinal Data, I-Chen Chen Jan 2018

Improved Methods And Selecting Classification Types For Time-Dependent Covariates In The Marginal Analysis Of Longitudinal Data, I-Chen Chen

Theses and Dissertations--Epidemiology and Biostatistics

Generalized estimating equations (GEE) are popularly utilized for the marginal analysis of longitudinal data. In order to obtain consistent regression parameter estimates, these estimating equations must be unbiased. However, when certain types of time-dependent covariates are presented, these equations can be biased unless an independence working correlation structure is employed. Moreover, in this case regression parameter estimation can be very inefficient because not all valid moment conditions are incorporated within the corresponding estimating equations. Therefore, approaches using the generalized method of moments or quadratic inference functions have been proposed for utilizing all valid moment conditions. However, we have found that …


Accounting For Spatial Autocorrelation In Modeling The Distribution Of Water Quality Variables, Lorrayne Miralha Jan 2018

Accounting For Spatial Autocorrelation In Modeling The Distribution Of Water Quality Variables, Lorrayne Miralha

Theses and Dissertations--Geography

Several studies in hydrology have reported differences in outcomes between models in which spatial autocorrelation (SAC) is accounted for and those in which SAC is not. However, the capacity to predict the magnitude of such differences is still ambiguous. In this thesis, I hypothesized that SAC, inherently possessed by a response variable, influences spatial modeling outcomes. I selected ten watersheds in the USA and analyzed them to determine whether water quality variables with higher Moran’s I values undergo greater increases in the coefficient of determination (R²) and greater decreases in residual SAC (rSAC) after spatial modeling. I compared non-spatial ordinary …


Effect Of Socioeconomic And Demographic Factors On Kentucky Crashes, Aaron Berry Cambron Jan 2018

Effect Of Socioeconomic And Demographic Factors On Kentucky Crashes, Aaron Berry Cambron

Theses and Dissertations--Civil Engineering

The goal of this research was to examine the potential predictive ability of socioeconomic and demographic data for drivers on Kentucky crash occurrence. Identifying unique background characteristics of at-fault drivers that contribute to crash rates and crash severity may lead to improved and more specific interventions to reduce the negative impacts of motor vehicle crashes. The driver-residence zip code was used as a spatial unit to connect five years of Kentucky crash data with socioeconomic factors from the U.S. Census, such as income, employment, education, age, and others, along with terrain and vehicle age. At-fault driver crash counts, normalized over …


Accounting For Matching Uncertainty In Photographic Identification Studies Of Wild Animals, Amanda R. Ellis Jan 2018

Accounting For Matching Uncertainty In Photographic Identification Studies Of Wild Animals, Amanda R. Ellis

Theses and Dissertations--Statistics

I consider statistical modelling of data gathered by photographic identification in mark-recapture studies and propose a new method that incorporates the inherent uncertainty of photographic identification in the estimation of abundance, survival and recruitment. A hierarchical model is proposed which accepts scores assigned to pairs of photographs by pattern recognition algorithms as data and allows for uncertainty in matching photographs based on these scores. The new models incorporate latent capture histories that are treated as unknown random variables informed by the data, contrasting past models having the capture histories being fixed. The methods properly account for uncertainty in the matching …


The Family Of Conditional Penalized Methods With Their Application In Sufficient Variable Selection, Jin Xie Jan 2018

The Family Of Conditional Penalized Methods With Their Application In Sufficient Variable Selection, Jin Xie

Theses and Dissertations--Statistics

When scientists know in advance that some features (variables) are important in modeling a data, then these important features should be kept in the model. How can we utilize this prior information to effectively find other important features? This dissertation is to provide a solution, using such prior information. We propose the Conditional Adaptive Lasso (CAL) estimates to exploit this knowledge. By choosing a meaningful conditioning set, namely the prior information, CAL shows better performance in both variable selection and model estimation. We also propose Sufficient Conditional Adaptive Lasso Variable Screening (SCAL-VS) and Conditioning Set Sufficient Conditional Adaptive Lasso Variable …


Mixtures-Of-Regressions With Measurement Error, Xiaoqiong Fang Jan 2018

Mixtures-Of-Regressions With Measurement Error, Xiaoqiong Fang

Theses and Dissertations--Statistics

Finite Mixture model has been studied for a long time, however, traditional methods assume that the variables are measured without error. Mixtures-of-regression model with measurement error imposes challenges to the statisticians, since both the mixture structure and the existence of measurement error can lead to inconsistent estimate for the regression coefficients. In order to solve the inconsistency, We propose series of methods to estimate the mixture likelihood of the mixtures-of-regressions model when there is measurement error, both in the responses and predictors. Different estimators of the parameters are derived and compared with respect to their relative efficiencies. The simulation results …


Estimation In Partially Linear Models With Correlated Observations And Change-Point Models, Liangdong Fan Jan 2018

Estimation In Partially Linear Models With Correlated Observations And Change-Point Models, Liangdong Fan

Theses and Dissertations--Statistics

Methods of estimating parametric and nonparametric components, as well as properties of the corresponding estimators, have been examined in partially linear models by Wahba [1987], Green et al. [1985], Engle et al. [1986], Speckman [1988], Hu et al. [2004], Charnigo et al. [2015] among others. These models are appealing due to their flexibility and wide range of practical applications including the electricity usage study by Engle et al. [1986], gum disease study by Speckman [1988], etc., wherea parametric component explains linear trends and a nonparametric part captures nonlinear relationships.

The compound estimator (Charnigo et al. [2015]) has been used to …


Some Dimension Reduction Strategies For The Analysis Of Survey Data, Jiaying Weng, Derek S. Young Dec 2017

Some Dimension Reduction Strategies For The Analysis Of Survey Data, Jiaying Weng, Derek S. Young

Statistics Faculty Publications

In the era of big data, researchers interested in developing statistical models are challenged with how to achieve parsimony. Usually, some sort of dimension reduction strategy is employed. Classic strategies are often in the form of traditional inference procedures, such as hypothesis testing; however, the increase in computing capabilities has led to the development of more sophisticated methods. In particular, sufficient dimension reduction has emerged as an area of broad and current interest. While these types of dimension reduction strategies have been employed for numerous data problems, they are scantly discussed in the context of analyzing survey data. This …


An Exploratory Statistical Method For Finding Interactions In A Large Dataset With An Application Toward Periodontal Diseases, Joshua Lambert Jan 2017

An Exploratory Statistical Method For Finding Interactions In A Large Dataset With An Application Toward Periodontal Diseases, Joshua Lambert

Theses and Dissertations--Epidemiology and Biostatistics

It is estimated that Periodontal Diseases effects up to 90% of the adult population. Given the complexity of the host environment, many factors contribute to expression of the disease. Age, Gender, Socioeconomic Status, Smoking Status, and Race/Ethnicity are all known risk factors, as well as a handful of known comorbidities. Certain vitamins and minerals have been shown to be protective for the disease, while some toxins and chemicals have been associated with an increased prevalence. The role of toxins, chemicals, vitamins, and minerals in relation to disease is believed to be complex and potentially modified by known risk factors. A …


Improving The Computational Efficiency In Bayesian Fitting Of Cormack-Jolly-Seber Models With Individual, Continuous, Time-Varying Covariates, Woodrow Burchett Jan 2017

Improving The Computational Efficiency In Bayesian Fitting Of Cormack-Jolly-Seber Models With Individual, Continuous, Time-Varying Covariates, Woodrow Burchett

Theses and Dissertations--Statistics

The extension of the CJS model to include individual, continuous, time-varying covariates relies on the estimation of covariate values on occasions on which individuals were not captured. Fitting this model in a Bayesian framework typically involves the implementation of a Markov chain Monte Carlo (MCMC) algorithm, such as a Gibbs sampler, to sample from the posterior distribution. For large data sets with many missing covariate values that must be estimated, this creates a computational issue, as each iteration of the MCMC algorithm requires sampling from the full conditional distributions of each missing covariate value. This dissertation examines two solutions to …