Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 15 of 15

Full-Text Articles in Physical Sciences and Mathematics

Sparse Model Selection Using Information Complexity, Yaojin Sun May 2022

Doctoral Dissertations

This dissertation studies the application of information complexity to statistical model selection through three projects. Specifically, we design statistical models that incorporate sparsity to improve their explanatory power and computational efficiency.

In the first project, we propose a Sparse Bridge Regression model for variable selection when the number of variables is much greater than the number of observations and the model may be misspecified. Numerical simulations and real-world data analysis demonstrate the model's excellent explanatory power in high-dimensional data analysis.

The second project proposes a novel hybrid modeling method that utilizes a mixture …
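The abstract does not define the bridge penalty. As a rough illustration only, the sketch below assumes the standard bridge form λ Σ|β_j|^γ. It is not the dissertation's sparse variant and does not include its information-complexity criterion. The names `bridge_penalty` and `bridge_objective` are hypothetical.

```python
def bridge_penalty(beta, lam, gamma):
    """Standard bridge penalty lam * sum(|b|**gamma); gamma=1 is the lasso, gamma=2 ridge."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return lam * sum(abs(b) ** gamma for b in beta)


def bridge_objective(X, y, beta, lam, gamma):
    """Residual sum of squares plus the bridge penalty; X is a list of rows."""
    rss = 0.0
    for row, yi in zip(X, y):
        fitted = sum(xij * bj for xij, bj in zip(row, beta))
        rss += (yi - fitted) ** 2
    return rss + bridge_penalty(beta, lam, gamma)
```

When gamma <= 1, the penalty is not differentiable at zero, which lets minimizers set coefficients exactly to zero. When gamma < 1, the penalty is also nonconvex.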


Beta Mixture And Contaminated Model With Constraints And Application With Micro-Array Data, Ya Qi Jan 2022

Theses and Dissertations--Statistics

This dissertation research concentrates on the Contaminated Beta (CB) model and its application in micro-array data analysis. The Modified Likelihood Ratio Test (MLRT) introduced by [Chen et al., 2001] is used to test the omnibus null hypothesis of no contamination of Beta(1,1) [Dai and Charnigo, 2008]. We design constraints for the two-component CB model that put the mode toward the left end of the distribution, reflecting the abundance of small p-values in micro-array data, in order to increase the test power. A three-component CB model might be useful for distinguishing highly differentially expressed genes from moderately differentially expressed genes. If the null hypothesis above …
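The following is a minimal sketch of a two-component contaminated Beta density for p-values. It makes two assumptions that the abstract does not confirm:

- The density has the form (1 − π)·Beta(1,1) + π·Beta(a, b).
- The left-mode constraint is a < 1 ≤ b. This constraint makes the Beta(a, b) density decreasing, so its mode is at 0.

The dissertation's exact parameterization and constraints may differ, and the MLRT itself is not implemented. All function names are hypothetical.

```python
import math


def beta_pdf(x, a, b):
    # Beta(a, b) density on (0, 1), using log-gamma for numerical stability
    log_beta_fn = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_beta_fn)


def contaminated_beta_pdf(x, pi, a, b):
    # Assumed form: uniform Beta(1, 1) null with weight 1 - pi, Beta(a, b) contaminant with weight pi
    return (1 - pi) * 1.0 + pi * beta_pdf(x, a, b)


def log_likelihood(pvalues, pi, a, b):
    return sum(math.log(contaminated_beta_pdf(p, pi, a, b)) for p in pvalues)


def satisfies_left_mode_constraint(a, b):
    # Assumed constraint: a < 1 <= b gives a decreasing Beta(a, b) density with mode at 0
    return a < 1 <= b
```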


Serial Testing For Detection Of Multilocus Genetic Interactions, Zaid T. Al-Khaledi Jan 2019

Theses and Dissertations--Statistics

A method to detect relationships between disease susceptibility and multilocus genetic interactions is the Multifactor-Dimensionality Reduction (MDR) technique pioneered by Ritchie et al. (2001). Since its introduction, many extensions have been pursued to deal with non-binary outcomes and/or account for multiple interactions simultaneously. Studying the effects of multilocus genetic interactions on continuous traits (blood pressure, weight, etc.) is one case that MDR does not handle. Culverhouse et al. (2004) and Gui et al. (2013) proposed two different methods to analyze such a case. In their research, Gui et al. (2013) introduced the Quantitative Multifactor-Dimensionality Reduction (QMDR) that uses the overall …
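For the binary case-control setting that MDR was originally designed for, the pooling step can be sketched as follows. The sketch assumes the commonly described labeling rule: a genotype cell is "high" risk when its case:control ratio meets a threshold, with a default of 1.0 here. Cross-validation and the QMDR rule for continuous traits are not reproduced. `mdr_label_cells` is a hypothetical name.

```python
from collections import defaultdict


def mdr_label_cells(genotypes, status, threshold=1.0):
    """Label each multilocus genotype cell 'high' or 'low' risk.

    genotypes: list of tuples such as (g1, g2), one entry per subject
    status: list of 1 (case) or 0 (control), aligned with genotypes
    threshold: case:control ratio at or above which a cell is labeled high risk
    """
    cases = defaultdict(int)
    controls = defaultdict(int)
    for g, s in zip(genotypes, status):
        if s == 1:
            cases[g] += 1
        else:
            controls[g] += 1
    labels = {}
    for cell in set(cases) | set(controls):
        n_ctrl = controls[cell]
        ratio = float("inf") if n_ctrl == 0 else cases[cell] / n_ctrl
        labels[cell] = "high" if ratio >= threshold else "low"
    return labels
```

The labels collapse the multilocus genotype into a single binary factor. That factor can then be evaluated, for example by its classification accuracy.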


Examining The Confirmatory Tetrad Analysis (Cta) As A Solution Of The Inadequacy Of Traditional Structural Equation Modeling (Sem) Fit Indices, Hangcheng Liu Jan 2018

Theses and Dissertations

Structural Equation Modeling (SEM) is a framework of statistical methods that allows us to represent complex relationships between variables. SEM is widely used in economics, genetics, and the behavioral sciences (e.g., psychology, psychobiology, sociology, and medicine). Model complexity is defined as a model's ability to fit different data patterns, and it plays an important role in model selection when applying SEM. As in linear regression, traditional SEM fit indices typically use the number of free model parameters as the measure of model complexity. However, using only the number of free model parameters to indicate SEM model complexity …
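Confirmatory tetrad analysis works with tetrad differences of a covariance matrix, τ = σ_ij σ_kl − σ_ik σ_jl. A given SEM may imply that some of these differences vanish. The sketch below computes tetrads and checks that a one-factor model implies all of them are zero. It is not the dissertation's procedure, and it omits the statistical test that CTA applies to sample tetrads. The function names are hypothetical.

```python
from itertools import combinations


def tetrads(cov, idx):
    """Three tetrad differences for the quadruple idx = (i, j, k, m).

    cov is a covariance matrix given as a nested list. Only off-diagonal
    entries are used. The third tetrad equals the second minus the first,
    so only two are independent.
    """
    i, j, k, m = idx

    def s(a, b):
        return cov[a][b]

    return (
        s(i, j) * s(k, m) - s(i, k) * s(j, m),
        s(i, j) * s(k, m) - s(i, m) * s(j, k),
        s(i, k) * s(j, m) - s(i, m) * s(j, k),
    )


def one_factor_cov(loadings, uniquenesses):
    """Model-implied covariance lambda lambda^T + diag(psi) for a one-factor model."""
    p = len(loadings)
    return [[loadings[r] * loadings[c] + (uniquenesses[r] if r == c else 0.0)
             for c in range(p)] for r in range(p)]


def all_tetrads(cov):
    # Tetrads for every set of four distinct observed variables
    return {q: tetrads(cov, q) for q in combinations(range(len(cov)), 4)}
```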


Information Metrics For Predictive Modeling And Machine Learning, Kostantinos Gourgoulias Jul 2017

Doctoral Dissertations

The ever-increasing complexity of the models used in predictive modeling and data science, together with their use for prediction and inference, has made the development of tools for uncertainty quantification and model selection especially important. In this work, we seek to understand the various trade-offs associated with the simulation of stochastic systems. Some trade-offs are computational, e.g., the execution time of an algorithm versus the accuracy of the simulation. Others are analytical: whether or not we can find tractable substitutes for quantities of interest, e.g., distributions, ergodic averages, etc. The first two chapters of this thesis deal with the study of the …


Approximate Statistical Solutions To The Forensic Identification Of Source Problem, Danica M. Ommen Jan 2017

Electronic Theses and Dissertations

Currently in forensic science, the statistical methods for solving identification of source problems are inherently subjective and generally ad hoc. The formal Bayesian decision framework provides the most statistically rigorous foundation for these problems to date. However, computing a solution under this framework, which relies on a Bayes Factor, tends to be computationally intensive and highly sensitive to the subjective choice of prior distributions for the parameters. Therefore, this dissertation aims to develop statistical solutions to forensic identification of source problems that are less subjective but retain the statistical rigor of the Bayesian solution. First, this dissertation focuses …


Inference Using Bhattacharyya Distance To Model Interaction Effects When The Number Of Predictors Far Exceeds The Sample Size, Sarah A. Janse Jan 2017

Theses and Dissertations--Statistics

In recent years, statistical analyses, algorithms, and modeling of big data have been constrained by computational complexity. Further, the added complexity of relationships among response and explanatory variables, such as higher-order interaction effects, makes identifying predictors with standard statistical techniques difficult. These difficulties are only exacerbated by the small sample sizes of some studies. Recent analyses have targeted the identification of interaction effects in big data, but computational concerns have limited the development of methods to identify higher-order interaction effects. One recently studied method is the Feasible Solutions Algorithm (FSA), a fast, flexible method that …
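The truncated abstract does not show how the dissertation uses the Bhattacharyya distance. As background only, the sketch below computes the distance in two cases: between two discrete distributions on the same support, and in closed form between two univariate normal distributions. The function names are hypothetical.

```python
import math


def bhattacharyya_discrete(p, q):
    """D_B = -ln(sum_i sqrt(p_i * q_i)) for two discrete distributions on the same support."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return math.inf if bc == 0 else -math.log(bc)


def bhattacharyya_normal(mu1, sd1, mu2, sd2):
    """Closed form for N(mu1, sd1^2) versus N(mu2, sd2^2)."""
    v1, v2 = sd1 ** 2, sd2 ** 2
    return (mu1 - mu2) ** 2 / (4 * (v1 + v2)) + 0.5 * math.log((v1 + v2) / (2 * sd1 * sd2))
```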


What Affects Parents’ Choice Of Milk? An Application Of Bayesian Model Averaging, Yingzhe Cheng Dec 2016

Mathematics & Statistics ETDs

This study identifies the factors that influence parents' choice of milk for their children, using data from a unique survey administered in 2013 in Hunan province, China. In this survey, we identified two brands of milk that differ in price and in the safety claims made by their producers. Data were collected on parents' choice between the two brands, demographics, attitudes toward food safety, and behaviors related to food. Stepwise model selection and Bayesian model averaging (BMA) are used to search for influential factors. The two approaches consistently select the same factors suggested by an economic theoretical model, including price …
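BMA is often implemented with the BIC approximation to posterior model probabilities, assuming equal prior probabilities for all models. The abstract does not say whether this study used that approximation. The sketch below is a generic illustration: the variable names `x1` and `x2` and the BIC values are hypothetical, and so are the function names.

```python
import math


def bic_weights(bics):
    """Approximate posterior model probabilities exp(-BIC/2), normalized; equal priors assumed."""
    best = min(bics)
    # Subtract the smallest BIC before exponentiating to avoid underflow
    raw = [math.exp(-(b - best) / 2.0) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]


def inclusion_probabilities(models, bics):
    """Posterior inclusion probability of each variable: total weight of the models containing it."""
    probs = {}
    for variables, w in zip(models, bic_weights(bics)):
        for v in variables:
            probs[v] = probs.get(v, 0.0) + w
    return probs
```

For example, take two models, {"x1"} and {"x1", "x2"}, whose BIC values differ by 2·ln 3. They receive weights of 0.75 and 0.25, so x1 has inclusion probability 1 and x2 has 0.25.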


Selecting Spatial Scale Of Area-Level Covariates In Regression Models, Lauren Grant Jan 2016

Theses and Dissertations

Studies have found that the level of association between an area-level covariate and an outcome can vary depending on the spatial scale (SS) of a particular covariate. However, covariates used in regression models are customarily modeled at the same spatial unit. In this dissertation, we developed four SS model selection algorithms that select the best spatial scale for each area-level covariate. The SS forward stepwise, SS incremental forward stagewise, SS least angle regression (LARS), and SS lasso algorithms allow for the selection of different area-level covariates at different spatial scales, while constraining each covariate to enter at most one spatial …


Variable Selection In Single Index Varying Coefficient Models With Lasso, Peng Wang Nov 2015

Doctoral Dissertations

The single index varying coefficient model is a very attractive statistical model because of its ability to reduce dimensions and its ease of interpretation. It has been the subject of many theoretical studies and practical applications, but typically without variable selection, and no public software is available for fitting it. Here we propose a new algorithm to fit the single index varying coefficient model and to carry out variable selection in the index part with LASSO. The core idea is a two-step scheme that alternates between estimating the coefficient functions and selecting and estimating the single index. Both in simulation and in application to a Geoscience dataset, we …
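The abstract does not say how the LASSO step is solved. As a generic illustration only, the sketch below shows a standard cyclic coordinate-descent lasso with soft-thresholding. It minimizes (1/(2n))‖y − Xβ‖² + λ‖β‖₁ and is not the dissertation's two-step SIVC algorithm. The function names are hypothetical.

```python
def soft_threshold(z, t):
    # S(z, t) = sign(z) * max(|z| - t, 0)
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/(2n))||y - X b||^2 + lam * ||b||_1.

    X is a list of n rows with p columns; no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            norm_sq = sum(c * c for c in col) / n
            if norm_sq == 0.0:
                continue
            # Correlation of column j with the partial residual (coordinate j added back)
            rho = sum(c * r for c, r in zip(col, resid)) / n + norm_sq * beta[j]
            new = soft_threshold(rho, lam) / norm_sq
            if new != beta[j]:
                delta = new - beta[j]
                resid = [r - c * delta for r, c in zip(resid, col)]
                beta[j] = new
    return beta
```

Soft-thresholding sets a coefficient exactly to zero whenever its partial-residual correlation falls below λ. This is what produces variable selection.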


Seasonal Decomposition For Geographical Time Series Using Nonparametric Regression, Hyukjun Gweon Apr 2013

Electronic Thesis and Dissertation Repository

A time series often contains various systematic effects such as trends and seasonality. These components can be identified and separated by decomposition methods. In this thesis, we discuss a time series decomposition process using nonparametric regression. We suggest a method based on both loess and harmonic regression and discuss an optimal model selection method. We then compare the process with seasonal-trend decomposition by loess (STL; Cleveland, 1979). While STL works well when proper parameters are used, the method we introduce is also competitive: it makes the choice of parameters more automatic and less complex. The decomposition process often requires that …
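The following is a minimal sketch of the harmonic-regression part only. The loess trend step and the thesis's model selection method are not shown. The sketch assumes evenly spaced observations that cover a whole number of periods, with fewer than period/2 harmonics. Under those conditions the sine and cosine regressors are orthogonal, and each least-squares coefficient has a closed form. The function names are hypothetical.

```python
import math


def harmonic_fit(y, period, n_harmonics):
    """Least-squares fit of a mean plus n_harmonics sine/cosine pairs at the given period.

    Requires len(y) to be a multiple of period and 0 < n_harmonics < period / 2,
    so that the regressors are orthogonal over the sample.
    """
    n = len(y)
    if n % period != 0 or not 0 < 2 * n_harmonics < period:
        raise ValueError("need whole periods and 0 < n_harmonics < period / 2")
    mean = sum(y) / n
    coefs = []
    for k in range(1, n_harmonics + 1):
        w = 2 * math.pi * k / period
        a = 2.0 / n * sum(yt * math.cos(w * t) for t, yt in enumerate(y))
        b = 2.0 / n * sum(yt * math.sin(w * t) for t, yt in enumerate(y))
        coefs.append((a, b))
    return mean, coefs


def seasonal_component(n, period, coefs):
    # Evaluate the fitted harmonics (excluding the mean) at t = 0, ..., n - 1
    return [sum(a * math.cos(2 * math.pi * k * t / period)
                + b * math.sin(2 * math.pi * k * t / period)
                for k, (a, b) in enumerate(coefs, start=1))
            for t in range(n)]
```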


Model Selection With Information Criteria, Changjiang Xu Oct 2010

Electronic Thesis and Dissertation Repository

This thesis is on model selection using information criteria. The information criteria include the generalized information criterion and a family of Bayesian information criteria. We investigate the properties of these criteria and ways to improve them.

We analyze the nonasymptotic and asymptotic properties of the information criteria for linear models, probabilistic models, and high-dimensional models. We give the probability of selecting a model and compute it by Monte Carlo methods. We derive the conditions under which the criteria are consistent, underfit, or overfit.

We further propose new model selection procedures to improve the information criteria. The procedures combine the information criteria with …
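The sketch below assumes a common form of the generalized information criterion for a Gaussian linear model with residual sum of squares RSS and k parameters: n·log(RSS/n) + λ_n·k. Setting λ_n = 2 gives AIC, and λ_n = log n gives BIC. The thesis's own definitions, including its family of Bayesian criteria, may differ. The (RSS, k) values and the function names are hypothetical.

```python
import math


def gic(rss, n, k, penalty):
    """Generalized information criterion for a Gaussian linear model: n*log(RSS/n) + penalty*k."""
    return n * math.log(rss / n) + penalty * k


def select(models, n, penalty):
    """Index of the (rss, k) pair with the smallest criterion value."""
    scores = [gic(rss, n, k, penalty) for rss, k in models]
    return min(range(len(models)), key=scores.__getitem__)


models = [(50.0, 2), (48.0, 3), (47.9, 6)]  # hypothetical (RSS, number of parameters)
n = 100
aic_choice = select(models, n, penalty=2.0)          # AIC: penalty 2
bic_choice = select(models, n, penalty=math.log(n))  # BIC: penalty log(n)
```

With these hypothetical numbers, AIC picks the three-parameter model. BIC's heavier penalty picks the two-parameter model, which shows how the choice of penalty governs whether a criterion tends to underfit or overfit.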


Selecting The Best Linear Mixed Model Using Predictive Approaches, Jun Wang Jan 2007

Theses and Dissertations

The linear mixed model is widely used in the analysis of longitudinal data. Inference techniques and information criteria for goodness-of-fit are available and well studied in the linear mixed model setting. Predictive approaches such as R-squared, PRESS, and CCC are available for the linear mixed model but require further research (Edward, 2005). This project used simulation to investigate the performance of R-squared, PRESS, CCC, the pseudo F-test, and information criteria for goodness-of-fit within the linear mixed model framework. Marginal and conditional versions of these predictive statistics were studied under different variance-covariance structures. For the compound symmetry structure, the success rates for all 17 …
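For reference, the sketch below computes two of the predictive statistics named above in their ordinary forms:

- Lin's concordance correlation coefficient (CCC), using 1/n moments.
- PRESS, via the leave-one-out identity for linear least squares, e_i/(1 − h_ii).

The marginal and conditional versions used for the linear mixed model in this project differ in which predictions and leverages enter, and they are not reproduced here. The function names are hypothetical.

```python
def ccc(x, y):
    """Lin's concordance correlation coefficient, using 1/n (population) moments."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


def press(residuals, leverages):
    """PRESS from ordinary residuals e_i and hat-matrix diagonals h_ii of a least-squares fit."""
    return sum((e / (1 - h)) ** 2 for e, h in zip(residuals, leverages))
```

Unlike Pearson's correlation, CCC penalizes a constant shift. For example, y = x + 1 on x = 1, 2, 3 gives CCC = 4/7 rather than 1.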


A Logistic Regression Analysis Of Utah Colleges Exit Poll Response Rates Using Sas Software, Clint W. Stevenson Oct 2006

Theses and Dissertations

In this study, I examine voter response at the interview level using a dataset of 7562 voter contacts (including responses and nonresponses) in the 2004 Utah Colleges Exit Poll. In 2004, 4908 of the 7562 voters approached responded to the exit poll, for an overall response rate of 65 percent. Logistic regression is used to estimate the factors that contribute to the success or failure of each interview attempt. This logistic regression model uses interviewer characteristics, voter characteristics (of both respondents and nonrespondents), and exogenous factors as independent variables. Voter characteristics such as race, gender, and age are strongly associated with response. …
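The reported figures are consistent: 4908/7562 ≈ 0.649, or about 65 percent. In an intercept-only logistic regression, the maximum-likelihood intercept equals the logit of this overall rate. That intercept is the baseline from which the interviewer, voter, and exogenous covariates shift the predicted response probability. The sketch below checks this arithmetic only; the study's covariate model is not reproduced, and the function names are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def expit(z):
    return 1 / (1 + math.exp(-z))


responded, approached = 4908, 7562  # counts reported in the abstract
rate = responded / approached
# MLE intercept of an intercept-only logistic regression on these responses
intercept = logit(rate)
```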


On The Model Selection In A Frailty Setting, Jill F. Lundell May 1998

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

When analyzing data in a survival setting, whether of people or objects, one of the assumptions made is that the population is homogeneous. This is not true in reality, and certain adjustments can be made to the model to account for heterogeneity. Frailty is one method of dealing with some of this heterogeneity. Frailty cannot be measured directly, so it can be very difficult to determine which frailty model is appropriate for the data of interest. This thesis investigates the effectiveness of three model selection methods at determining which frailty distribution best describes a given set …
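The abstract does not name the three frailty distributions compared. As one standard example only, consider a gamma frailty with mean 1 and variance θ. The population (marginal) survival function is then (1 + θ·H₀(t))^(−1/θ), where H₀ is the baseline cumulative hazard, and it reduces to exp(−H₀(t)) as θ → 0. The Weibull baseline and all function names below are hypothetical, chosen for illustration.

```python
import math


def weibull_cum_hazard(t, shape, scale):
    # Hypothetical baseline for illustration: H0(t) = (t / scale) ** shape
    return (t / scale) ** shape


def gamma_frailty_survival(cum_hazard, theta):
    """Marginal survival under a gamma frailty with mean 1 and variance theta."""
    if theta < 0:
        raise ValueError("theta must be non-negative")
    if theta == 0:
        # No heterogeneity: the ordinary survival function exp(-H0)
        return math.exp(-cum_hazard)
    return (1.0 + theta * cum_hazard) ** (-1.0 / theta)
```

For a fixed baseline cumulative hazard, a larger θ (more heterogeneity) raises the marginal survival curve at later times. This is one reason the choice of frailty distribution matters.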