Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

Articles 1 - 9 of 9

Full-Text Articles in Statistics and Probability

High Dimensional Data Analysis: Variable Screening And Inference, Lei Fang Jan 2023

Theses and Dissertations--Statistics

This dissertation focuses on the problem of high dimensional data analysis, which arises in many fields including genomics, finance, and social sciences. In such settings, the number of features or variables is much larger than the number of observations, posing significant challenges to traditional statistical methods.

To address these challenges, this dissertation proposes novel methods for variable screening and inference. The first part of the dissertation focuses on variable screening, which aims to identify a subset of important variables that are strongly associated with the response variable. Specifically, we propose a robust nonparametric screening method to effectively select the predictors …
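The screening method itself is elided above. As a generic illustration of rank-based (and therefore outlier-resistant) marginal screening, and not the dissertation's procedure, the sketch below ranks predictors by the absolute value of Kendall's tau with the response and keeps the top `d`. The function names `kendall_tau` and `rank_screen` are hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a between two equal-length sequences (tied pairs count as 0)."""
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        if prod > 0:
            s += 1  # concordant pair
        elif prod < 0:
            s -= 1  # discordant pair
    return s / (n * (n - 1) / 2)


def rank_screen(x_cols, y, d):
    """Return the indices of the d predictors with the largest |Kendall tau| with y.

    x_cols: list of predictor columns, each a list of n observations.
    """
    scores = [abs(kendall_tau(col, y)) for col in x_cols]
    order = sorted(range(len(x_cols)), key=lambda k: scores[k], reverse=True)
    return order[:d]
```

Because Kendall's tau depends only on ranks, a monotone transformation of a predictor leaves its score unchanged, and a single outlying observation can move it only by a bounded amount; this is the usual motivation for rank-based screening over Pearson correlation when p is much larger than n.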


A Flexible Zero-Inflated Poisson Regression Model, Eric S. Roemmele Jan 2019

Theses and Dissertations--Statistics

A practical problem often encountered with observed count data is the presence of excess zeros. Zero-inflation in count data can easily be handled by zero-inflated models, which are two-component mixtures of a point mass at zero and a discrete distribution for the counts. In the presence of predictors, zero-inflated Poisson (ZIP) regression models are perhaps the most commonly used. However, the fully parametric ZIP regression model can sometimes be restrictive, especially with respect to the mixing proportions. Taking inspiration from recent literature on semiparametric mixtures of regression models for flexible mixture modeling, we propose a …
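For reference, the fully parametric ZIP regression that the abstract describes as restrictive can be written down in a few lines. The sketch below assumes the common specification with a log link for the Poisson mean, log(lam_i) = x_i'beta, and a logit link for the structural-zero probability, logit(pi_i) = z_i'gamma. The function names are hypothetical, and the dissertation's model relaxes precisely this parametric form of pi_i.

```python
import math


def zip_pmf(y, lam, pi):
    """P(Y = y) for a zero-inflated Poisson: a point mass at 0 with weight pi
    mixed with a Poisson(lam) distribution with weight 1 - pi."""
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_regression_loglik(beta, gamma, X, Z, ys):
    """Log-likelihood of a parametric ZIP regression (hypothetical helper).

    Assumes log(lam_i) = x_i . beta and logit(pi_i) = z_i . gamma.
    """
    ll = 0.0
    for x, z, y in zip(X, Z, ys):
        lam = math.exp(sum(b * v for b, v in zip(beta, x)))
        pi = 1.0 / (1.0 + math.exp(-sum(g * v for g, v in zip(gamma, z))))
        ll += math.log(zip_pmf(y, lam, pi))
    return ll
```

For example, `zip_pmf(0, 2.0, 0.3)` equals 0.3 + 0.7 * exp(-2), which is larger than the plain Poisson probability exp(-2); that extra mass at zero is what the mixture contributes, while the mean drops to (1 - pi) * lam.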


The Family Of Conditional Penalized Methods With Their Application In Sufficient Variable Selection, Jin Xie Jan 2018

Theses and Dissertations--Statistics

When scientists know in advance that some features (variables) are important in modeling the data, those features should be kept in the model. How can this prior information be used to find other important features effectively? This dissertation provides a solution that uses such prior information. We propose the Conditional Adaptive Lasso (CAL) estimates to exploit this knowledge. By choosing a meaningful conditioning set, namely the prior information, CAL shows better performance in both variable selection and model estimation. We also propose Sufficient Conditional Adaptive Lasso Variable Screening (SCAL-VS) and Conditioning Set Sufficient Conditional Adaptive Lasso Variable …
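The abstract does not spell out the CAL estimator, so the sketch below only illustrates the idea it describes, under stated assumptions: an adaptive-lasso-style objective (1/2n)||y - Xb||^2 + lam * sum_j w_j |b_j| in which coefficients in the conditioning set (the features known in advance to matter) get penalty weight w_j = 0, solved by plain cyclic coordinate descent. The weight rule 1/(|b_init_j| + eps)^gamma, the solver, and all function names are assumptions for illustration, not the dissertation's algorithm.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def conditional_adaptive_weights(b_init, cond_set, gamma=1.0, eps=1e-8):
    """Hypothetical weights: 0 for the conditioning set, adaptive-lasso weights otherwise."""
    return [0.0 if j in cond_set else 1.0 / (abs(bj) + eps) ** gamma
            for j, bj in enumerate(b_init)]


def weighted_lasso_cd(X, y, lam, weights, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j weights[j] * |b_j|.

    X is a list of n rows with p entries each (no all-zero columns assumed).
    weights[j] = 0 leaves b_j unpenalized, i.e. feature j is always kept.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - Xb, starting from b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # (1/n) x_j' (partial residual with b_j added back in)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b
```

For instance, with orthogonal columns x0 = (1, -1, 1, -1), x1 = (1, 1, -1, -1), response y = 3*x0 + 0.5*x1 and lam = 1, the weights (0, 1) give coefficients (3, 0), whereas penalizing both with weights (1, 1) shrinks the first coefficient to 2. Leaving the conditioning set unpenalized avoids shrinking effects that are already known to be real.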


Informational Index And Its Applications In High Dimensional Data, Qingcong Yuan Jan 2017

Theses and Dissertations--Statistics

We introduce a new class of measures for testing independence between two random vectors, based on the expected difference between the conditional and marginal characteristic functions. By choosing a particular weight function in this class, we propose a new index for measuring independence and study its properties. Two empirical versions are developed, and their properties, asymptotics, connections with existing measures, and applications are discussed. Implementation details and Monte Carlo results are also presented.

We propose a two-stage sufficient variable selection method based on the new index to deal with large-p, small-n data. The method does not require model specification and especially focuses …
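The informational index itself is not given here. One well-known existing measure from the same characteristic-function family, and a natural point of comparison, is the distance covariance of Székely, Rizzo and Bakirov, a weighted L2 distance between the joint characteristic function and the product of the marginal ones. The sketch below computes its empirical V-statistic and the associated distance correlation for scalar samples; it is not the dissertation's index, and the function names are hypothetical.

```python
import math


def _double_centered(v):
    """Double-centered matrix of pairwise distances |v_i - v_j|."""
    n = len(v)
    d = [[abs(v[i] - v[j]) for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in d]
    grand = sum(row) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def dcov2(x, y):
    """Squared sample distance covariance (V-statistic version)."""
    n = len(x)
    A, B = _double_centered(x), _double_centered(y)
    return sum(A[i][j] * B[i][j] for i in range(n) for j in range(n)) / n ** 2


def dcor(x, y):
    """Sample distance correlation in [0, 1]; defined as 0 for a constant sample."""
    vxy, vx, vy = dcov2(x, y), dcov2(x, x), dcov2(y, y)
    if vx <= 0.0 or vy <= 0.0:
        return 0.0
    # Clip tiny negative round-off before taking square roots.
    return math.sqrt(max(vxy, 0.0) / math.sqrt(vx * vy))
```

For x = (-2, -1, 0, 1, 2), the exact linear relation y = 2x + 1 gives a distance correlation of 1, while y = x^2 has Pearson correlation 0 yet a clearly positive distance correlation (about 0.52); detecting such nonlinear, non-monotone dependence without a model is the property that makes this family attractive for model-free variable selection.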


Nonparametric Compound Estimation, Derivative Estimation, And Change Point Detection, Sisheng Liu Jan 2017

Theses and Dissertations--Statistics

First, we reviewed some popular nonparametric regression methods from the past several decades. Then we extended compound estimation (Charnigo and Srinivasan [2011]) to accommodate random design points and heteroskedasticity, and proposed a modified Cp criterion for tuning parameter selection. Moreover, we developed a DCp criterion for the tuning parameter selection problem in general nonparametric derivative estimation. This extends the GCp criterion in Charnigo, Hall and Srinivasan [2011] to random design points and heteroskedasticity. Next, we proposed a change point detection method via compound estimation for both the fixed design and random design cases, with adaptation to heteroskedasticity. …
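Compound estimation and the modified Cp and DCp criteria are not spelled out in the abstract. As a simpler, hedged illustration of Cp-type tuning parameter selection, the sketch below applies the classical Mallows-type criterion Cp(h) = RSS(h)/n + 2 * sigma2 * tr(S_h)/n to a Gaussian-kernel Nadaraya-Watson smoother with bandwidth h. It assumes homoskedastic errors with known variance sigma2, which is exactly the restriction the dissertation's criteria relax; the function names are hypothetical.

```python
import math


def nw_smoother_matrix(x, h):
    """Rows of the Nadaraya-Watson smoother matrix S_h with a Gaussian kernel."""
    S = []
    for xi in x:
        w = [math.exp(-0.5 * ((xi - xj) / h) ** 2) for xj in x]
        total = sum(w)
        S.append([wj / total for wj in w])
    return S


def mallows_cp(x, y, h, sigma2):
    """Cp(h) = RSS(h)/n + 2 * sigma2 * trace(S_h)/n for the linear smoother S_h."""
    n = len(x)
    S = nw_smoother_matrix(x, h)
    fitted = [sum(S[i][j] * y[j] for j in range(n)) for i in range(n)]
    rss = sum((y[i] - fitted[i]) ** 2 for i in range(n))
    trace = sum(S[i][i] for i in range(n))  # effective degrees of freedom
    return rss / n + 2.0 * sigma2 * trace / n


def select_bandwidth(x, y, grid, sigma2):
    """Pick the bandwidth in `grid` that minimizes Cp."""
    return min(grid, key=lambda h: mallows_cp(x, y, h, sigma2))
```

The trace term acts as effective degrees of freedom: a tiny bandwidth interpolates the data (RSS near 0, trace near n, Cp near 2 * sigma2), while a very large one over-smooths and inflates the RSS, so minimizing Cp trades the two off.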


Empirical Likelihood And Differentiable Functionals, Zhiyuan Shen Jan 2016

Theses and Dissertations--Statistics

Empirical likelihood (EL) is a relatively recent nonparametric method of statistical inference. Owen (1988, 1990) and many others have shown that the empirical likelihood ratio (ELR) method can be used to produce well-behaved confidence intervals or regions. Owen (1988) shows that -2 log ELR converges to a chi-square distribution with one degree of freedom under a constraint given by a linear statistical functional of the distribution function. However, generalizing Owen's result to right-censored data is difficult, since no explicit maximization is available under a constraint stated in terms of distribution functions. Pan and Zhou (2002), instead, study the …
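Owen's result for the simplest linear functional, the mean, can be computed directly. The sketch below assumes i.i.d. uncensored scalar data and the hypothesis E[X] = mu. With z_i = x_i - mu, the optimal weights are w_i = 1 / (n * (1 + lam * z_i)), where the Lagrange multiplier lam solves sum_i z_i / (1 + lam * z_i) = 0; since the left side is decreasing in lam, bisection finds it, and then -2 log ELR = 2 * sum_i log(1 + lam * z_i). The function name is hypothetical, and the censored-data setting studied in the dissertation is not covered.

```python
import math


def el_minus2_log_ratio(x, mu, tol=1e-12):
    """-2 log empirical likelihood ratio for H0: E[X] = mu (uncensored data)."""
    z = [xi - mu for xi in x]
    if min(z) >= 0 or max(z) <= 0:
        return math.inf  # mu lies outside the convex hull of the data
    # Every 1 + lam * z_i must stay positive: lam in (-1/max z, -1/min z).
    lo, hi = -1.0 / max(z), -1.0 / min(z)
    width = hi - lo
    lo, hi = lo + 1e-10 * width, hi - 1e-10 * width

    def score(lam):
        # Decreasing in lam; its root is the Lagrange multiplier.
        return sum(zi / (1.0 + lam * zi) for zi in z)

    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log(1.0 + lam * zi) for zi in z)
```

An approximate 95% confidence interval for the mean is then the set of mu with `el_minus2_log_ratio(x, mu)` at most 3.841, the 0.95 quantile of the chi-square distribution with one degree of freedom; the statistic is 0 at the sample mean and grows as mu moves toward the edge of the data.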


Aggregated Quantitative Multifactor Dimensionality Reduction, Rebecca E. Crouch Jan 2016

Theses and Dissertations--Statistics

We consider the problem of making predictions for quantitative phenotypes based on gene-to-gene interactions among selected Single Nucleotide Polymorphisms (SNPs). Previously, Quantitative Multifactor Dimensionality Reduction (QMDR) has been applied to detect gene-to-gene interactions associated with elevated quantitative phenotypes, by creating a dichotomous predictor from one interaction which has been deemed optimal. We propose an Aggregated Quantitative Multifactor Dimensionality Reduction (AQMDR), which exhaustively considers all k-way interactions among a set of SNPs and replaces the dichotomous predictor from QMDR with a continuous aggregated score. We evaluate this new AQMDR method in a series of simulations for two-way and three-way interactions, …
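The dichotomous QMDR predictor that AQMDR replaces can be sketched as follows. Assuming the usual QMDR rule, each multilocus genotype cell is labeled high when its mean phenotype exceeds the overall mean and low otherwise, and every subject inherits the label of its cell; the exhaustive loop mirrors "all k-way interactions". The aggregated continuous score of AQMDR is not reproduced here, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def qmdr_labels(genotypes, y):
    """High (1) / low (0) label per subject for one multilocus interaction.

    genotypes: list of per-subject tuples, e.g. (0, 2) for a two-way interaction.
    A cell is 'high' if its mean phenotype exceeds the overall mean.
    """
    overall = sum(y) / len(y)
    sums, counts = defaultdict(float), defaultdict(int)
    for g, v in zip(genotypes, y):
        sums[g] += v
        counts[g] += 1
    high = {g for g in sums if sums[g] / counts[g] > overall}
    return [1 if g in high else 0 for g in genotypes]


def all_k_way_labels(snps, y, k):
    """QMDR labels for every k-way combination of SNP columns.

    snps: list of SNP columns, each a list of genotype codes (e.g. 0/1/2).
    Returns a dict mapping the tuple of SNP indices to the subjects' labels.
    """
    n = len(y)
    result = {}
    for combo in combinations(range(len(snps)), k):
        genotypes = [tuple(snps[c][i] for c in combo) for i in range(n)]
        result[combo] = qmdr_labels(genotypes, y)
    return result
```

QMDR keeps only the single best combination's 0/1 label as the predictor, which discards how far each cell sits above or below the overall mean; AQMDR's continuous aggregated score is designed to retain information that this dichotomization throws away.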


Analysis Of Spatial Data, Xiang Zhang Jan 2013

Theses and Dissertations--Statistics

In many areas of the agricultural, biological, physical and social sciences, spatial lattice data are becoming increasingly common. In addition, much lattice data shows not only spatial patterns but also temporal patterns (see Zhu et al. 2005). An interesting problem is to develop a model that systematically describes the relationship between the response variable and possible explanatory variables while accounting for spatial and temporal effects simultaneously.

Spatial-temporal linear models and the corresponding likelihood-based statistical inference are important tools for the analysis of spatial-temporal lattice data. We propose a general asymptotic framework for spatial-temporal linear models and …


Parametric Estimation In Competing Risks And Multi-State Models, Yushun Lin Jan 2011

Theses and Dissertations--Statistics

Typical research on Alzheimer's disease involves a series of cognitive states. Multi-state models are often used to describe the history of disease progression. Competing risks models are a subcategory of multi-state models with one starting state and several absorbing states.

Analyses of competing risks data in medical papers frequently assume independent risks and evaluate covariate effects on these events by fitting a separate proportional hazards regression model for each event. Jeong and Fine (2007) proposed a parametric proportional subdistribution hazard (SH) model for cumulative incidence functions (CIF) without assumptions about the dependence among the risks. We modified their model to …
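The Jeong and Fine model is parametric, but the quantity it targets, the cumulative incidence function, has a standard nonparametric estimate that serves as a useful point of reference: CIF_k(t) = sum over event times t_j <= t of S(t_j-) * d_kj / n_j, where S is the Kaplan-Meier estimate of remaining free of any event, d_kj counts cause-k events at t_j, and n_j is the number at risk. The sketch below is that nonparametric estimate, not the SH model; the function name and the coding "0 = right-censored" are assumptions.

```python
def cumulative_incidence(times, causes, cause, t):
    """Nonparametric cumulative incidence of `cause` at time t.

    times:  observed times (event or censoring).
    causes: observed event type per subject; 0 means right-censored.
    """
    data = list(zip(times, causes))
    event_times = sorted({ti for ti, ci in data if ci != 0 and ti <= t})
    surv = 1.0  # overall Kaplan-Meier S(t_j-): no event of any cause yet
    cif = 0.0
    for tj in event_times:
        at_risk = sum(1 for ti, _ in data if ti >= tj)
        d_all = sum(1 for ti, ci in data if ti == tj and ci != 0)
        d_k = sum(1 for ti, ci in data if ti == tj and ci == cause)
        cif += surv * d_k / at_risk
        surv *= 1.0 - d_all / at_risk
    return cif
```

Unlike one minus a cause-specific Kaplan-Meier curve, which treats the other causes as censoring and therefore overstates the incidence, these estimates add up across causes to one minus the overall Kaplan-Meier survival, without any assumption about dependence among the risks. With no censoring and causes (1, 2, 1, 1) at times (1, 2, 3, 4), the estimated CIF of cause 1 at t = 10 is 3/4, matching the raw proportion.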