Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons

Articles 1 - 30 of 36

Full-Text Articles in Statistical Models

Finite Mixtures Of Mean-Parameterized Conway-Maxwell-Poisson Models, Dongying Zhan Jan 2023

Theses and Dissertations--Statistics

For modeling count data, the Conway-Maxwell-Poisson (CMP) distribution is a popular generalization of the Poisson distribution due to its ability to characterize data over- or under-dispersion. While the classic parameterization of the CMP has been well studied, its main drawback is that it does not directly model the mean of the counts. This is mitigated by using a mean-parameterized version of the CMP distribution. In this work, we are concerned with the setting where count data may be composed of subpopulations, each possibly having varying degrees of data dispersion. Thus, we propose a finite mixture of mean-parameterized CMP distributions. An …
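A minimal sketch of the mixture described above, assuming the mean parameterization is obtained by numerically solving for the classic rate λ that gives a target mean μ (one common route; the thesis's exact parameterization is not stated in this abstract). The support is truncated at a hypothetical `ymax` to approximate the CMP normalizing constant, and all function names are illustrative.

```python
import math

def cmp_pmf_classic(lam, nu, ymax=200):
    """Classic CMP pmf, P(Y = y) proportional to lam**y / (y!)**nu, truncated to y = 0..ymax."""
    logw = [y * math.log(lam) - nu * math.lgamma(y + 1) for y in range(ymax + 1)]
    top = max(logw)  # subtract the largest log-weight before exponentiating (avoids overflow)
    w = [math.exp(v - top) for v in logw]
    z = sum(w)  # truncated normalizing constant (up to the factor exp(top))
    return [v / z for v in w]

def mean_of(pmf):
    return sum(y * p for y, p in enumerate(pmf))

def rate_for_mean(mu, nu, ymax=200, tol=1e-12):
    """Solve E[Y] = mu for the classic rate lam by bisection on log(lam).

    The mean is strictly increasing in log(lam), so bisection is safe for 0 < mu < ymax."""
    lo, hi = -30.0, 30.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_of(cmp_pmf_classic(math.exp(mid), nu, ymax)) < mu:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))

def cmp_pmf_mean(mu, nu, ymax=200):
    """Mean-parameterized CMP: the same family, indexed by (mu, nu) instead of (lam, nu)."""
    return cmp_pmf_classic(rate_for_mean(mu, nu, ymax), nu, ymax)

def cmp_mixture_pmf(weights, mus, nus, ymax=200):
    """Finite mixture sum_k pi_k * CMP(y; mu_k, nu_k) on y = 0..ymax."""
    comps = [cmp_pmf_mean(m, n, ymax) for m, n in zip(mus, nus)]
    return [sum(w * c[y] for w, c in zip(weights, comps)) for y in range(ymax + 1)]

# Two subpopulations: an over-dispersed one (nu < 1) and an under-dispersed one (nu > 1).
pmf = cmp_mixture_pmf([0.3, 0.7], [2.0, 10.0], [0.5, 2.0])
print(round(mean_of(pmf), 6))  # 7.6, i.e. 0.3 * 2 + 0.7 * 10
```

With ν = 1 a component reduces to a Poisson with λ = μ; ν > 1 gives under-dispersion and ν < 1 over-dispersion, which is what lets each subpopulation carry its own dispersion level.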


Statistical Intervals For Neural Network And Its Relationship With Generalized Linear Model, Sheng Yuan Jan 2023

Theses and Dissertations--Statistics

Neural networks have been widely adopted and have become integral to cutting-edge domains such as computer vision, natural language processing, and other contemporary fields. However, characterizing the statistical properties of neural networks has been a persistent challenge, with few satisfactory results. In my research, I focused on statistical intervals for neural networks, specifically confidence intervals and tolerance intervals. I employed variance estimation methods, such as direct estimation and resampling, to assess neural networks and their performance under outlier scenarios. Remarkably, when outliers were present, the resampling method with infinitesimal jackknife estimation yielded confidence intervals that closely aligned with nominal …
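To illustrate the infinitesimal jackknife (IJ) idea mentioned above, the sketch below computes the IJ variance of a bootstrap-aggregated (bagged) estimate from the bootstrap resample counts: the sum over observations of the squared covariance between each observation's resample count and the replicate statistic. A simple sample mean stands in for a neural-network prediction at a fixed input; the function names, `n_boot`, and `seed` are illustrative assumptions, not the thesis's procedure.

```python
import math
import random

def ij_variance_bagged(data, statistic, n_boot=2000, seed=0):
    """Bagged estimate and its infinitesimal-jackknife (IJ) variance.

    V_IJ = sum_i Cov_b(N_bi, t_b)**2, where N_bi counts how often observation i
    appears in bootstrap resample b and t_b is the statistic on that resample.
    No Monte Carlo bias correction is applied (it matters when n_boot is small)."""
    rng = random.Random(seed)
    n = len(data)
    counts, stats = [], []
    for _ in range(n_boot):
        idx = rng.choices(range(n), k=n)  # resample indices with replacement
        c = [0] * n
        for i in idx:
            c[i] += 1
        counts.append(c)
        stats.append(statistic([data[i] for i in idx]))
    t_bar = sum(stats) / n_boot  # bagged (bootstrap-averaged) estimate
    var = 0.0
    for i in range(n):
        n_bar = sum(c[i] for c in counts) / n_boot
        cov = sum((c[i] - n_bar) * (t - t_bar) for c, t in zip(counts, stats)) / n_boot
        var += cov ** 2
    return t_bar, var

def ij_confidence_interval(data, statistic, z=1.959964, **kwargs):
    """Normal-approximation interval: bagged estimate +/- z * sqrt(V_IJ)."""
    est, var = ij_variance_bagged(data, statistic, **kwargs)
    half = z * math.sqrt(var)
    return est - half, est + half

def sample_mean(xs):
    return sum(xs) / len(xs)

data = [float(i) for i in range(30)]
print(ij_confidence_interval(data, sample_mean))
```

For the bagged sample mean, the ideal IJ variance works out to the sum of (x_i - x̄)² divided by n², so the example can be checked by hand.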


High Dimensional Data Analysis: Variable Screening And Inference, Lei Fang Jan 2023

Theses and Dissertations--Statistics

This dissertation focuses on the problem of high dimensional data analysis, which arises in many fields including genomics, finance, and social sciences. In such settings, the number of features or variables is much larger than the number of observations, posing significant challenges to traditional statistical methods.

To address these challenges, this dissertation proposes novel methods for variable screening and inference. The first part of the dissertation focuses on variable screening, which aims to identify a subset of important variables that are strongly associated with the response variable. Specifically, we propose a robust nonparametric screening method to effectively select the predictors …
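As a generic illustration of marginal variable screening, the sketch below ranks each predictor by the absolute Spearman rank correlation with the response and keeps the top `d`. Rank correlation is used here only as a simple robust, nonparametric utility; it is an assumption for illustration, since the dissertation's own screening statistic is not named in this abstract. The data are simulated and hypothetical.

```python
import math
import random

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)  # assumes no constant columns

def rank_screen(X_cols, y, d):
    """Indices of the d predictors with the largest |Spearman correlation| with y."""
    ry = ranks(y)
    scores = [abs(pearson(ranks(col), ry)) for col in X_cols]
    return sorted(range(len(X_cols)), key=lambda j: scores[j], reverse=True)[:d]

# Simulated data with p = 50 > relevant predictors: only columns 0 and 3 drive y,
# column 0 through a nonlinear (monotone) link.
rng = random.Random(1)
n, p = 200, 50
X = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [2 * math.exp(X[0][i]) + 4 * X[3][i] + rng.gauss(0, 0.5) for i in range(n)]
print(sorted(rank_screen(X, y, 2)))
```

Because ranks are invariant to monotone transformations, the nonlinear effect of column 0 is picked up without specifying its functional form.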


Deriving The Distributions And Developing Methods Of Inference For R2-Type Measures, With Applications To Big Data Analysis, Gregory S. Hawk Jan 2022

Theses and Dissertations--Statistics

As computing capabilities and cloud-enhanced data sharing have accelerated exponentially in the 21st century, our access to Big Data has revolutionized the way we see data around the world, from healthcare to investments to manufacturing to retail and supply chains. In many areas of research, however, the cost of obtaining each data point makes collecting more than a few observations impossible. While machine learning and artificial intelligence (AI) are improving our ability to make predictions from datasets, we need better statistical methods to improve our ability to understand models and translate them into meaningful, actionable insights.

A central goal in the …


Beta Mixture And Contaminated Model With Constraints And Application With Micro-Array Data, Ya Qi Jan 2022

Theses and Dissertations--Statistics

This dissertation research concentrates on the Contaminated Beta (CB) model and its application to micro-array data analysis. The Modified Likelihood Ratio Test (MLRT) introduced by [Chen et al., 2001] is used to test the omnibus null hypothesis of no contamination of Beta(1,1) [Dai and Charnigo, 2008]. To increase test power, we design constraints for the two-component CB model that put the mode toward the left end of the distribution, reflecting the abundance of small p-values in micro-array data. A three-component CB model might be useful for distinguishing highly differentially expressed genes from moderately differentially expressed genes. If the null hypothesis above …


Novel Methods For Characterizing Conditional Quantiles In Zero-Inflated Count Regression Models, Xuan Shi Jan 2021

Theses and Dissertations--Statistics

Despite their popularity in diverse disciplines, quantile regression methods are primarily designed for the continuous response setting and cannot be directly applied to the discrete (or count) response setting. Modeling count responses can also pose challenges, such as the presence of excess zero counts, formally known as zero-inflation. To address these challenges, we propose a comprehensive model-aware strategy that synthesizes quantile regression methods with estimation of zero-inflated count regression models. Various competing computational routines are examined, and residual analysis and model selection procedures are included to validate our method. The performance of these methods is characterized through …
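Because a count response is discrete, its τ-th conditional quantile is the smallest count y with F(y) ≥ τ. For a zero-inflated Poisson (ZIP) model, F(y) = π + (1 − π)·F_Pois(y; λ). The sketch below assumes a ZIP with log(λ) = xᵀβ and logit(π) = zᵀγ, with the coefficients already estimated; the function names are hypothetical and the thesis's own estimation routines are not reproduced.

```python
import math

def logistic(t):
    """Numerically stable inverse logit."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def zip_quantile(tau, pi, lam, max_count=100_000):
    """tau-th quantile of a ZIP(pi, lam) count: the smallest y with F(y) >= tau,
    where F(y) = pi + (1 - pi) * PoissonCDF(y; lam)."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    term = math.exp(-lam)            # Poisson P(Y = 0)
    cdf = pi + (1.0 - pi) * term     # ZIP F(0): structural zeros plus Poisson zeros
    y = 0
    while cdf < tau:
        y += 1
        term *= lam / y              # Poisson P(Y = y) from P(Y = y - 1)
        cdf += (1.0 - pi) * term
        if y > max_count:
            raise RuntimeError("quantile search did not converge")
    return y

def conditional_zip_quantile(tau, x, beta, z, gamma):
    """Quantile at covariates (x, z), assuming log(lam) = x'beta and logit(pi) = z'gamma."""
    lam = math.exp(sum(b * v for b, v in zip(beta, x)))
    pi = logistic(sum(g * v for g, v in zip(gamma, z)))
    return zip_quantile(tau, pi, lam)

print([zip_quantile(t, 0.6, 3.0) for t in (0.5, 0.75, 0.9)])  # [0, 2, 4]
```

The heavy point mass at zero pulls the lower and middle quantiles to 0, which is why quantiles of zero-inflated counts cannot be read off a continuous-response quantile regression.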


Estimating And Testing Treatment Effects With Misclassified Multivariate Data, Zi Ye Jan 2021

Theses and Dissertations--Statistics

Clinical trials are often used to assess drug efficacy and safety. Participants are sometimes pre-stratified into different groups by diagnostic tools. However, these diagnostic tools are fallible. The traditional method ignores this problem and assumes the diagnostic devices are perfect, an assumption that leads to inefficient and biased estimators. In this era of personalized medicine and measurement-based care, the issues of bias and efficiency are of paramount importance. Despite their prominence, only a few studies have evaluated the treatment effect in the presence of misclassification, and only in some special cases; most others focus on assessing the accuracy of the diagnostic devices. In …
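A small sketch of why ignoring misclassification biases stratum-specific effects: if the diagnostic has sensitivity Se and specificity Sp and the true prevalence of stratum A is known, each observed group mean is a known mixture of the two true stratum means, and a 2×2 linear solve recovers them. This is a simple method-of-moments illustration assuming nondifferential misclassification, known Se, Sp, and prevalence, and equal prevalence in both arms; it is not the estimator developed in the dissertation, and all names are hypothetical.

```python
def classification_mix(prev, se, sp):
    """P(truly A | classified A) and P(truly A | classified B) for a test with
    sensitivity se and specificity sp when the true prevalence of stratum A is prev."""
    p_class_a = se * prev + (1 - sp) * (1 - prev)
    return se * prev / p_class_a, (1 - se) * prev / (1 - p_class_a)

def corrected_stratum_means(m_a, m_b, prev, se, sp):
    """Recover true stratum means (mu_A, mu_B) from observed group means (m_a, m_b).

    Observed means are mixtures: m_a = w_a*mu_A + (1 - w_a)*mu_B and
    m_b = w_b*mu_A + (1 - w_b)*mu_B, so we solve the 2x2 linear system."""
    w_a, w_b = classification_mix(prev, se, sp)
    det = w_a - w_b  # determinant of [[w_a, 1 - w_a], [w_b, 1 - w_b]]
    if abs(det) < 1e-12:
        raise ValueError("classification carries no information about the strata")
    mu_a = (m_a * (1 - w_b) - (1 - w_a) * m_b) / det
    mu_b = (w_a * m_b - w_b * m_a) / det
    return mu_a, mu_b

def stratum_treatment_effects(treated, control, prev, se, sp):
    """Corrected within-stratum effects (A, B) from observed (m_a, m_b) in each arm."""
    t_a, t_b = corrected_stratum_means(*treated, prev, se, sp)
    c_a, c_b = corrected_stratum_means(*control, prev, se, sp)
    return t_a - c_a, t_b - c_b

def observed_means(mu_a, mu_b, prev, se, sp):
    """Group means an analyst would see if the fallible classification were taken at face value."""
    w_a, w_b = classification_mix(prev, se, sp)
    return w_a * mu_a + (1 - w_a) * mu_b, w_b * mu_a + (1 - w_b) * mu_b

# Illustrative truth: effect 4.0 in stratum A and 1.0 in stratum B.
prev, se, sp = 0.3, 0.9, 0.8
treated = observed_means(10.0, 4.0, prev, se, sp)
control = observed_means(6.0, 3.0, prev, se, sp)
naive = (treated[0] - control[0], treated[1] - control[1])
print(naive, stratum_treatment_effects(treated, control, prev, se, sp))
```

The naive stratum-A effect is attenuated toward the stratum-B effect, while the corrected version recovers both true effects.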


Dimension Reduction Techniques In Regression, Pei Wang Jan 2021

Theses and Dissertations--Statistics

Because of advances in modern technology, the data collected today are larger and more complex in structure. To deal with such data, sufficient dimension reduction (SDR) and reduced rank (RR) regression are two powerful tools. This dissertation focuses on these two tools and is composed of three projects. In the first project, we introduce a new SDR method that uses a novel feature filter approach to exhaustively recover the central mean subspace, along with a method to determine the dimension, two variable selection methods, and extensions to multivariate response and large p …


Nonparametric Tests Of Lack Of Fit For Multivariate Data, Yan Xu Jan 2020

Theses and Dissertations--Statistics

A common problem in regression analysis (linear or nonlinear) is assessing lack of fit. Existing methods make parametric or semi-parametric assumptions to model the conditional mean or covariance matrices. In this dissertation, we propose fully nonparametric methods that make only additive error assumptions. Our nonparametric approach relies on ideas from nonparametric smoothing to reduce the test of association (lack-of-fit) problem to a nonparametric multivariate analysis of variance. A major problem that arises in this approach is that the key assumptions of independence and a constant covariance matrix across groups are violated. As a result, the standard asymptotic theory is not …


Semiparametric And Nonparametric Methods For Comparing Biomarker Levels Between Groups, Yuntong Li Jan 2020

Theses and Dissertations--Statistics

Comparing the distribution of biomarker measurements between two groups under either an unpaired or paired design is a common goal in many biomarker studies. However, analyzing biomarker data is sometimes challenging because the data may not be normally distributed and may contain a large fraction of zero or missing values. Although several statistical methods have been proposed, they either require a normality assumption or are inefficient. We propose a novel two-part semiparametric method for data under an unpaired setting and a nonparametric method for data under a paired setting. The semiparametric method considers a two-part model, a logistic regression for …
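To make the two-part idea concrete, the sketch below implements a classical two-part test in the spirit of Lachenbruch's: a two-proportion z-statistic for the fraction of zeros plus a Wilcoxon rank-sum z-statistic for the positive values, combined as z₁² + z₂² and referred to a chi-square distribution with 2 degrees of freedom. It is a simpler, well-known baseline (normal approximation, no tie correction in the rank-sum variance), not the semiparametric method proposed in the dissertation; the data are made up.

```python
import math

def zero_proportion_z(g1, g2):
    """Two-proportion z-statistic comparing the fraction of zeros in g1 and g2."""
    n1, n2 = len(g1), len(g2)
    z1 = sum(1 for v in g1 if v == 0)
    z2 = sum(1 for v in g2 if v == 0)
    pooled = (z1 + z2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return 0.0 if se == 0 else (z1 / n1 - z2 / n2) / se

def average_ranks(values):
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def rank_sum_z(a, b):
    """Wilcoxon rank-sum z for a vs b (average ranks for ties, no tie correction)."""
    n1, n2 = len(a), len(b)
    if n1 == 0 or n2 == 0:
        return 0.0
    r = average_ranks(list(a) + list(b))
    w = sum(r[:n1])
    mean = n1 * (n1 + n2 + 1) / 2
    var = n1 * n2 * (n1 + n2 + 1) / 12
    return (w - mean) / math.sqrt(var)

def two_part_test(g1, g2):
    """Combined statistic and p-value; assumes nonnegative data with a point mass at zero."""
    z_zero = zero_proportion_z(g1, g2)
    z_pos = rank_sum_z([v for v in g1 if v > 0], [v for v in g2 if v > 0])
    stat = z_zero ** 2 + z_pos ** 2
    p_value = math.exp(-stat / 2)  # chi-square(2) upper tail: P(X > x) = exp(-x / 2)
    return stat, p_value

controls = [0, 0, 0, 1.2, 3.4, 2.2, 0, 5.1]
cases = [0, 2.8, 4.5, 6.1, 3.9, 0, 7.2, 5.5]
print(two_part_test(controls, cases))
```

Separating the zero/nonzero comparison from the comparison of positive values lets a shift in either part contribute to the test without assuming normality.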


Bayesian Kinetic Modeling For Tracer-Based Metabolomic Data, Xu Zhang Jan 2020

Theses and Dissertations--Statistics

Kinetic modeling of the time dependence of metabolite concentrations, including the unstable isotope labeled species, is an important approach to simulating metabolic pathway dynamics. It is also essential for quantitative metabolic flux analysis using tracer data. However, because metabolic networks are complex, with extensive compartmentation and interconnections, estimating the parameters of the enzymes that catalyze the individual reactions needed for kinetic modeling is challenging. Because the parameter space is large and multi-dimensional while kinetic data are comparatively sparse, the estimation procedure (especially point estimation methods) often encounters multiple local maxima, such that standard maximum likelihood methods may yield …


Estimation Of The Treatment Effect With Bayesian Adjustment For Covariates, Li Xu Jan 2020

Theses and Dissertations--Statistics

The Bayesian adjustment for confounding (BAC) is a Bayesian model averaging method to select and adjust for confounding factors when evaluating the average causal effect of an exposure on a certain outcome. We extend the BAC method to time-to-event outcomes. Specifically, the posterior distribution of the exposure effect on a time-to-event outcome is calculated as a weighted average of posterior distributions from a number of candidate proportional hazards models, weighting each model by its ability to adjust for confounding factors. The Bayesian Information Criterion based on the partial likelihood is used to compare different models and approximate the Bayes factor. …
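The weighting step can be sketched with the usual BIC approximation to posterior model probabilities, P(M_k | data) ∝ P(M_k)·exp(−BIC_k / 2). The sketch below averages exposure-effect estimates (for example, log hazard ratios from several candidate Cox models, with BICs computed from the partial likelihood) and combines their variances by the law of total variance. It assumes equal prior model probabilities by default and is plain BIC-based model averaging; the BAC prior that favors models adjusting for confounders is not reproduced, and the numbers are made up.

```python
import math

def bic_weights(bics, priors=None):
    """Approximate posterior model probabilities: w_k proportional to prior_k * exp(-BIC_k / 2)."""
    if priors is None:
        priors = [1.0] * len(bics)
    best = min(bics)  # shift for numerical stability; the shift cancels on normalization
    raw = [p * math.exp(-(b - best) / 2) for p, b in zip(priors, bics)]
    total = sum(raw)
    return [r / total for r in raw]

def model_average(estimates, variances, bics, priors=None):
    """Model-averaged effect and variance (within-model plus between-model spread)."""
    w = bic_weights(bics, priors)
    mean = sum(wk * e for wk, e in zip(w, estimates))
    var = sum(wk * (v + (e - mean) ** 2) for wk, e, v in zip(w, estimates, variances))
    return mean, var

# Hypothetical log hazard ratios, their variances, and partial-likelihood BICs.
print(model_average([0.42, 0.31, 0.55], [0.010, 0.012, 0.015], [512.3, 514.0, 516.8]))
```

The between-model term (e − mean)² is what makes the averaged interval honest about uncertainty in which confounders belong in the model.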


Statistical Intervals For Various Distributions Based On Different Inference Methods, Yixuan Zou Jan 2020

Theses and Dissertations--Statistics

Statistical intervals (e.g., confidence, prediction, or tolerance intervals) are widely used to quantify uncertainty, but complex settings can make it challenging to obtain intervals that possess the desired properties. My thesis addresses diverse data settings with approaches that are shown empirically to perform well. We first introduce a focused treatment of using a single-layer bootstrap calibration to improve the coverage probabilities of two-sided parametric tolerance intervals for non-normal distributions. We then turn to zero-inflated data, which are commonly found in, among other areas, pharmaceutical and quality control applications. However, the inference problem often becomes difficult in the presence of …


A Flexible Zero-Inflated Poisson Regression Model, Eric S. Roemmele Jan 2019

Theses and Dissertations--Statistics

A practical problem often encountered with observed count data is the presence of excess zeros. Zero-inflation in count data can easily be handled by zero-inflated models, which are two-component mixtures of a point mass at zero and a discrete distribution for the count data. In the presence of predictors, zero-inflated Poisson (ZIP) regression models are perhaps the most commonly used. However, the fully parametric ZIP regression model can sometimes be restrictive, especially with respect to the mixing proportions. Taking inspiration from recent literature on semiparametric mixtures of regressions models for flexible mixture modeling, we propose a …
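For reference, the fully parametric ZIP regression described above can be written down directly: with mixing proportion π_i = logistic(z_iᵀγ) and Poisson mean λ_i = exp(x_iᵀβ), P(Y_i = 0) = π_i + (1 − π_i)·e^(−λ_i) and P(Y_i = y) = (1 − π_i)·Poisson(y; λ_i) for y > 0. A minimal sketch of that log-likelihood follows; the parameter names and data are illustrative, and a semiparametric version would replace the logistic form of π_i.

```python
import math

def logistic(t):
    """Numerically stable inverse logit."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def zip_logpmf(y, pi, lam):
    """log P(Y = y) for a ZIP with zero-inflation probability pi and Poisson mean lam."""
    if y == 0:
        return math.log(pi + (1.0 - pi) * math.exp(-lam))
    return math.log(1.0 - pi) + y * math.log(lam) - lam - math.lgamma(y + 1)

def zip_regression_loglik(ys, X, Z, beta, gamma):
    """ZIP regression log-likelihood with log(lam_i) = x_i'beta and logit(pi_i) = z_i'gamma."""
    total = 0.0
    for y, x, z in zip(ys, X, Z):
        lam = math.exp(sum(b * v for b, v in zip(beta, x)))
        pi = logistic(sum(g * v for g, v in zip(gamma, z)))
        total += zip_logpmf(y, pi, lam)
    return total

# Illustrative data: an intercept plus one covariate in both the count and zero parts.
ys = [0, 0, 3, 1, 0, 5]
X = [[1.0, 0.2], [1.0, -1.0], [1.0, 0.8], [1.0, 0.0], [1.0, 1.5], [1.0, 1.1]]
Z = [row[:] for row in X]
print(zip_regression_loglik(ys, X, Z, beta=[0.5, 0.7], gamma=[-0.5, -0.3]))
```

Maximizing this function over (β, γ) gives the standard ZIP fit; the restriction criticized in the abstract is that π_i is forced to follow the logistic form in z_i.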


Transforms In Sufficient Dimension Reduction And Their Applications In High Dimensional Data, Jiaying Weng Jan 2019

Theses and Dissertations--Statistics

The big data era poses great challenges as well as opportunities for researchers to develop efficient statistical approaches to analyze massive data. Sufficient dimension reduction (SDR) is one such important tool in modern data analysis and has received extensive attention in both academia and industry.

In this dissertation, we introduce inverse regression estimators using Fourier transforms, which improve on existing SDR methods in two ways: (1) they avoid slicing the response variable, and (2) they can be readily extended to high dimensional data problems. For the ultra-high dimensional problem, we investigate both eigenvalue decomposition and minimum …


Composite Nonparametric Tests In High Dimension, Alejandro G. Villasante Tezanos Jan 2019

Theses and Dissertations--Statistics

This dissertation focuses on the problem of making high-dimensional inference for two or more groups. High-dimensional means that both the sample size (n) and the dimension (p) tend to infinity, possibly at different rates. Classical approaches for group comparisons fail in the high-dimensional setting, in the sense that they have incorrect size and low power. Much has been done in recent years to overcome these problems. However, these recent works make restrictive assumptions about the number of treatments to be compared and/or the distribution of the data. This research aims to (1) propose and investigate refined …


Accounting For Matching Uncertainty In Photographic Identification Studies Of Wild Animals, Amanda R. Ellis Jan 2018

Theses and Dissertations--Statistics

I consider statistical modelling of data gathered by photographic identification in mark-recapture studies and propose a new method that incorporates the inherent uncertainty of photographic identification into the estimation of abundance, survival, and recruitment. A hierarchical model is proposed that accepts as data the scores assigned to pairs of photographs by pattern recognition algorithms and allows for uncertainty in matching photographs based on these scores. The new models incorporate latent capture histories that are treated as unknown random variables informed by the data, in contrast to past models, in which the capture histories are fixed. The methods properly account for uncertainty in the matching …


The Family Of Conditional Penalized Methods With Their Application In Sufficient Variable Selection, Jin Xie Jan 2018

Theses and Dissertations--Statistics

When scientists know in advance that some features (variables) are important in modeling the data, these important features should be kept in the model. How can we use this prior information to effectively find other important features? This dissertation provides a solution that uses such prior information. We propose the Conditional Adaptive Lasso (CAL) estimates to exploit this knowledge. By choosing a meaningful conditioning set, namely the prior information, CAL shows better performance in both variable selection and model estimation. We also propose Sufficient Conditional Adaptive Lasso Variable Screening (SCAL-VS) and Conditioning Set Sufficient Conditional Adaptive Lasso Variable …
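One way to encode "keep the known-important features" in a penalized fit is to give those coefficients zero penalty weight while the remaining coefficients get adaptive-lasso weights. The sketch below is a plain cyclic coordinate-descent solver for the weighted lasso objective (1/2n)‖y − Xb‖² + λ Σ_j w_j |b_j|, with w_j = 0 for an assumed conditioning set. It illustrates the idea only; it is not the exact CAL definition or its tuning, and `cal_style_weights` is a hypothetical helper.

```python
def soft_threshold(z, t):
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def weighted_lasso(X, y, lam, weights, n_iter=500, tol=1e-10):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * sum_j weights[j] * |b_j|.

    X is a list of n rows with p columns; weights[j] = 0 leaves b_j unpenalized.
    Assumes no intercept (center y and the columns of X beforehand) and no all-zero column."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # y - X b with b = 0
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # (1/n) x_j' (partial residual with b_j added back)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * b[j]
            new = soft_threshold(rho, lam * weights[j]) / col_ss[j]
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b

def cal_style_weights(b_init, conditioning_set, gamma=1.0, eps=1e-8):
    """Adaptive-lasso weights 1/|b_init_j|**gamma, set to 0 for the conditioning set."""
    return [0.0 if j in conditioning_set else 1.0 / (abs(bj) + eps) ** gamma
            for j, bj in enumerate(b_init)]

X = [[1.0, 2.0], [2.0, -1.0], [3.0, 1.0], [4.0, 0.0]]
y = [2.0, 3.0, 7.0, 8.0]
# Column 0 plays the role of the conditioning set: it stays in the model even
# with a penalty large enough to remove column 1.
print(weighted_lasso(X, y, lam=100.0, weights=[0.0, 1.0]))
```

Setting a weight to zero is the simplest way to make prior knowledge binding: the soft-threshold step never shrinks that coefficient, so the search for other features happens conditional on it.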


Mixtures-Of-Regressions With Measurement Error, Xiaoqiong Fang Jan 2018

Theses and Dissertations--Statistics

Finite mixture models have been studied for a long time; however, traditional methods assume that the variables are measured without error. The mixtures-of-regressions model with measurement error poses challenges for statisticians, since both the mixture structure and the presence of measurement error can lead to inconsistent estimates of the regression coefficients. To address this inconsistency, we propose a series of methods to estimate the mixture likelihood of the mixtures-of-regressions model when there is measurement error in both the responses and the predictors. Different estimators of the parameters are derived and compared with respect to their relative efficiencies. The simulation results …


Estimation In Partially Linear Models With Correlated Observations And Change-Point Models, Liangdong Fan Jan 2018

Theses and Dissertations--Statistics

Methods of estimating parametric and nonparametric components, as well as properties of the corresponding estimators, have been examined in partially linear models by Wahba [1987], Green et al. [1985], Engle et al. [1986], Speckman [1988], Hu et al. [2004], and Charnigo et al. [2015], among others. These models are appealing due to their flexibility and wide range of practical applications, including the electricity usage study by Engle et al. [1986] and the gum disease study by Speckman [1988], where a parametric component explains linear trends and a nonparametric part captures nonlinear relationships.

The compound estimator (Charnigo et al. [2015]) has been used to …


Improving The Computational Efficiency In Bayesian Fitting Of Cormack-Jolly-Seber Models With Individual, Continuous, Time-Varying Covariates, Woodrow Burchett Jan 2017

Theses and Dissertations--Statistics

The extension of the CJS model to include individual, continuous, time-varying covariates relies on the estimation of covariate values on occasions on which individuals were not captured. Fitting this model in a Bayesian framework typically involves the implementation of a Markov chain Monte Carlo (MCMC) algorithm, such as a Gibbs sampler, to sample from the posterior distribution. For large data sets with many missing covariate values that must be estimated, this creates a computational issue, as each iteration of the MCMC algorithm requires sampling from the full conditional distributions of each missing covariate value. This dissertation examines two solutions to …


Nonparametric Compound Estimation, Derivative Estimation, And Change Point Detection, Sisheng Liu Jan 2017

Theses and Dissertations--Statistics

Firstly, we reviewed some popular nonparametric regression methods from the past several decades. Then we extended compound estimation (Charnigo and Srinivasan [2011]) to accommodate random design points and heteroskedasticity, and proposed a modified Cp criterion for tuning parameter selection. Moreover, we developed a DCp criterion for the tuning parameter selection problem in general nonparametric derivative estimation. This extends the GCp criterion of Charnigo, Hall and Srinivasan [2011] to random design points and heteroskedasticity. Next, we proposed a change point detection method via compound estimation for both the fixed design and random design cases, with adaptation to heteroskedasticity. …
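For a linear smoother ŷ = Sy, a Mallows-type Cp criterion has the form Cp(h) = (1/n)‖y − ŷ‖² + 2σ²·tr(S)/n, where tr(S) acts as the effective degrees of freedom. The sketch below uses it to choose the bandwidth of a Gaussian-kernel Nadaraya-Watson smoother, assuming a known (or separately estimated) homoskedastic noise variance σ². It only shows the generic Cp mechanics; the kernel smoother is a stand-in, and the modified Cp and DCp criteria for compound and derivative estimation are not reproduced.

```python
import math
import random

def smoother_matrix(x, h):
    """Nadaraya-Watson (Gaussian kernel) smoother matrix S, so that fitted values are S y."""
    S = []
    for xi in x:
        w = [math.exp(-0.5 * ((xi - xj) / h) ** 2) for xj in x]
        total = sum(w)  # at least 1, because the diagonal weight is exp(0) = 1
        S.append([v / total for v in w])
    return S

def effective_df(x, h):
    """Trace of the smoother matrix, used as the effective degrees of freedom."""
    S = smoother_matrix(x, h)
    return sum(S[i][i] for i in range(len(x)))

def mallows_cp(x, y, h, sigma2):
    """Cp(h) = RSS(h)/n + 2 * sigma2 * tr(S_h) / n for a known noise variance sigma2."""
    S = smoother_matrix(x, h)
    n = len(y)
    fit = [sum(s * yj for s, yj in zip(row, y)) for row in S]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fit))
    df = sum(S[i][i] for i in range(n))
    return rss / n + 2.0 * sigma2 * df / n

def select_bandwidth(x, y, grid, sigma2):
    return min(grid, key=lambda h: mallows_cp(x, y, h, sigma2))

rng = random.Random(2)
x = [i / 49 for i in range(50)]
y = [math.sin(2 * math.pi * xi) + rng.gauss(0.0, 0.3) for xi in x]
print(select_bandwidth(x, y, [0.005, 0.05, 0.5], sigma2=0.09))
```

A tiny bandwidth nearly interpolates (tr(S) close to n, so the penalty dominates), while a huge one averages everything (tr(S) close to 1, but the residual sum of squares explodes); Cp balances the two.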


Topics In Logistic Regression Analysis, Zhiheng Xie Jan 2016

Theses and Dissertations--Statistics

Discrete-time Markov chains have been used to analyze the transition of subjects from intact cognition to dementia, with mild cognitive impairment and global impairment as intervening transient states and death as a competing risk. A multinomial logistic regression model is used to estimate the probability distribution in each row of the one-step transition matrix that corresponds to a transient state. We investigate some goodness-of-fit tests for a multinomial distribution with covariates to assess the fit of this model to the data. We propose a modified chi-square test statistic and a score test statistic for the multinomial assumption in each …
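The setup described above can be sketched as follows: each transient state's row of the one-step transition matrix is a multinomial-logit (softmax) function of covariates, with "stay in the same state" as the reference category, while the absorbing states (dementia, death) keep rows that put all mass on themselves. The state order, covariate vector, and coefficients are hypothetical, and multi-step probabilities via matrix powers assume the covariates are held fixed.

```python
import math

STATES = ["intact", "MCI", "GI", "dementia", "death"]  # illustrative state order
ABSORBING = {3, 4}

def softmax_row(coefs, x, ref):
    """Multinomial-logit row: coefs[k] is the coefficient vector for destination k;
    the reference destination ref has its linear predictor fixed at 0."""
    eta = [0.0 if k == ref else sum(c * v for c, v in zip(coefs[k], x))
           for k in range(len(coefs))]
    top = max(eta)
    e = [math.exp(v - top) for v in eta]
    s = sum(e)
    return [v / s for v in e]

def transition_matrix(coefs_by_state, x):
    """One-step matrix: logit rows for transient states, identity rows for absorbing ones."""
    k = len(STATES)
    return [[1.0 if t == s else 0.0 for t in range(k)] if s in ABSORBING
            else softmax_row(coefs_by_state[s], x, ref=s)
            for s in range(k)]

def matmul(A, B):
    return [[sum(A[i][m] * B[m][j] for m in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def n_step(P, n):
    """n-step transition probabilities P**n (covariates held fixed)."""
    R = [[1.0 if i == j else 0.0 for j in range(len(P))] for i in range(len(P))]
    for _ in range(n):
        R = matmul(R, P)
    return R

# Hypothetical coefficients for x = [1, standardized age]; None marks the reference.
COEFS = {
    0: [None, [-2.0, 0.5], [-4.0, 0.3], [-5.0, 0.4], [-4.0, 0.8]],
    1: [[-1.5, 0.0], None, [-2.5, 0.2], [-2.0, 0.5], [-3.5, 0.8]],
    2: [[-4.0, 0.0], [-2.0, 0.0], None, [-1.0, 0.5], [-2.5, 0.8]],
}
P = transition_matrix(COEFS, x=[1.0, 0.5])
print([round(p, 3) for p in n_step(P, 5)[0]])  # 5-step distribution starting from 'intact'
```

Backward transitions (for example MCI back to intact) are allowed simply by giving those destinations their own coefficient vectors in the row's logit model.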


Multi-State Models With Missing Covariates, Wenjie Lou Jan 2016

Theses and Dissertations--Statistics

Multi-state models have been widely used to analyze longitudinal event history data obtained in medical studies. The tools and methods developed recently in this area require completely observed datasets. However, in many applications, measurements on certain components of the covariate vector are missing for some study subjects. In this dissertation, several likelihood-based methodologies were proposed to deal efficiently with datasets that have different types of missing covariates when applying multi-state models.

Firstly, a maximum observed data likelihood method was proposed for the case where the data have a univariate missing pattern and the missing covariate is a categorical variable. The construction of the …


Developing An Alternative Way To Analyze Nanostring Data, Shu Shen Jan 2016

Theses and Dissertations--Statistics

Nanostring technology provides a new method to measure gene expression. It is more sensitive than microarrays and can measure more genes than RT-PCR with similar sensitivity. The system produces counts for each target gene and tabulates them. Counts can be normalized with an Excel macro or with nSolver before analysis; both methods rely on data normalization prior to statistical analysis to identify differentially expressed genes. Alternatively, we propose to model gene expression as a function of positive control and reference gene measurements. Simulations and examples are used to compare this model with Nanostring normalization methods. The results show that …


Development In Normal Mixture And Mixture Of Experts Modeling, Meng Qi Jan 2016

Theses and Dissertations--Statistics

In this dissertation, we first consider the problem of testing homogeneity and order in a contaminated normal model when the data are correlated under some known covariance structure. To address this problem, we develop a moment-based homogeneity and order test and design weights for the test statistics to increase the power of the homogeneity test. We apply our test to microarray data on Down's syndrome. This dissertation also studies a singular Bayesian information criterion (sBIC) for a bivariate hierarchical mixture model with varying weights, and develops a new data-dependent information criterion (sFLIC). We apply our model and criteria to birth-weight and gestational …


Continuous Time Multi-State Models For Interval Censored Data, Lijie Wan Jan 2016

Theses and Dissertations--Statistics

Continuous-time multi-state models are widely used in modeling longitudinal data of disease processes with multiple transient states, yet the analysis is complex when subjects are observed periodically, resulting in interval-censored data. Recently, most studies have focused on modeling the true disease progression as a discrete-time stationary Markov chain, and only a few studies have addressed non-homogeneous multi-state models in the presence of interval-censored data. In this dissertation, several likelihood-based methodologies were proposed to deal with interval-censored data in multi-state models.

Firstly, a continuous-time version of a homogeneous Markov multi-state model with backward transitions was …
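In a time-homogeneous continuous-time Markov multi-state model with generator (intensity) matrix Q, the transition probabilities over a gap of length t are P(t) = exp(Qt). When states are observed only at visit times, a subject's interval-censored likelihood is the product of the entries P_{s(j−1), s(j)}(t_j − t_(j−1)) over consecutive visits. A minimal stdlib sketch using scaling and squaring with a truncated Taylor series for the matrix exponential; the rates are made up, and the non-homogeneous models studied in the dissertation would replace the constant Q.

```python
import math

def matmul(A, B):
    return [[sum(A[i][m] * B[m][j] for m in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def expm(Q, t, terms=20):
    """exp(Q * t) by scaling and squaring with a truncated Taylor series."""
    n = len(Q)
    norm = max(sum(abs(v) for v in row) for row in Q) * abs(t)
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    A = [[v * t / 2 ** s for v in row] for row in Q]  # scaled so that ||A|| <= 1/2
    E = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in E]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in matmul(term, A)]  # A**k / k!
        E = [[e + d for e, d in zip(re, rd)] for re, rd in zip(E, term)]
    for _ in range(s):
        E = matmul(E, E)  # undo the scaling: exp(A)**(2**s) = exp(Q t)
    return E

def panel_loglik(Q, times, states):
    """Interval-censored log-likelihood for one subject observed only at visit times."""
    ll = 0.0
    for j in range(1, len(times)):
        P = expm(Q, times[j] - times[j - 1])
        ll += math.log(P[states[j - 1]][states[j]])
    return ll

# Illustrative 3-state model with a backward transition (state 1 -> 0); state 2 absorbing.
Q = [[-0.30, 0.20, 0.10],
     [0.15, -0.40, 0.25],
     [0.00, 0.00, 0.00]]
print(panel_loglik(Q, times=[0.0, 1.0, 2.5, 4.0], states=[0, 1, 1, 2]))
```

Using P(t) rather than the one-step matrix is what lets unequal gaps between visits enter the likelihood naturally, since the subject may have passed through unobserved intermediate states.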


Statistical Inference On Dynamical Systems, Hongyuan Wang Jan 2016

Theses and Dissertations--Statistics

Ordinary differential equations (ODEs) are a representative and popular tool for modeling dynamical systems, which arise widely in physics, biology, economics, chemistry, and the biomedical sciences. Because of the importance of dynamical systems in scientific studies, they are the main focus of my dissertation.

The first chapter of the dissertation is an introduction and literature review, which mainly focuses on numerical integration algorithms for ODEs that are difficult to solve analytically, as well as derivative-free optimization algorithms for the so-called inverse problem.

The second chapter is on the estimation method based on numerical solvers of differential equations. We start …


Statistical Methods For Environmental Exposure Data Subject To Detection Limits, Yuchen Yang Jan 2016

Theses and Dissertations--Statistics

In this dissertation, we develop unified and efficient nonparametric statistical methods for estimating and comparing environmental exposure distributions in the presence of detection limits. In the first part, we propose a kernel-smoothed nonparametric estimator for the exposure distribution without imposing any independence assumption between the exposure level and detection limit. We show that the proposed estimator is consistent and asymptotically normal. Simulation studies demonstrate that the proposed estimator performs well in practical situations. A colon cancer study is provided for illustration. In the second part, we develop a class of test statistics to compare exposure distributions between two groups by using …


Improved Models For Differential Analysis For Genomic Data, Hong Wang Jan 2016

Theses and Dissertations--Statistics

This paper intends to develop novel statistical methods to improve genomic data analysis, especially differential analysis. We considered two different data types: NanoString nCounter data and somatic mutation data. For NanoString nCounter data, we develop a novel differential expression detection method. The method considers a generalized linear model of the negative binomial family to characterize count data and allows for multi-factor design. Data normalization is incorporated in the model framework through data normalization parameters, which are estimated from control genes embedded in the nCounter system. For somatic mutation data, we develop beta-binomial model-based approaches to identify highly or lowly …
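A minimal sketch of the kind of count model described here: a negative binomial log-likelihood with log link in which a per-sample normalization factor enters as an offset, log μ_j = log s_j + x_jᵀβ. The NB2 parameterization Var(Y) = μ + φμ², the variable names, and treating the normalization factors s_j as already estimated (for example from control genes) are assumptions for illustration; in the described method the normalization parameters are estimated within the model framework.

```python
import math

def nb_logpmf(y, mu, phi):
    """NB2 log-pmf with mean mu and variance mu + phi * mu**2 (phi > 0)."""
    r = 1.0 / phi  # NB 'size' parameter
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def gene_loglik(counts, design, norm_factors, beta, phi):
    """One gene's NB GLM log-likelihood across samples j, with
    log(mu_j) = log(norm_factors[j]) + design[j]' beta (normalization as an offset)."""
    ll = 0.0
    for y, x, s in zip(counts, design, norm_factors):
        mu = s * math.exp(sum(b * v for b, v in zip(beta, x)))
        ll += nb_logpmf(y, mu, phi)
    return ll

# Hypothetical two-group design: x = [1, group]; beta[1] is the log fold change.
counts = [120, 95, 140, 260, 310, 240]
design = [[1, 0], [1, 0], [1, 0], [1, 1], [1, 1], [1, 1]]
norm_factors = [1.0, 0.8, 1.1, 0.9, 1.2, 1.0]
print(gene_loglik(counts, design, norm_factors, beta=[math.log(115), math.log(2.3)], phi=0.05))
```

Putting normalization into the offset, rather than dividing counts beforehand, keeps the response as raw counts so the negative binomial variance structure still applies; a multi-factor design only changes the columns of `design`.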