Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons

Articles 1 - 13 of 13

Full-Text Articles in Statistical Models

Allocative Poisson Factorization For Computational Social Science, Aaron Schein Jul 2019

Doctoral Dissertations

Social science data often come in the form of high-dimensional discrete data, such as categorical survey responses, social interaction records, or text. These data sets exhibit high degrees of sparsity, missingness, overdispersion, and burstiness, all of which challenge traditional statistical modeling techniques. The framework of Poisson factorization (PF) has emerged in recent years as a natural way to model high-dimensional discrete data sets. This framework assumes that each observed count in a data set is a Poisson random variable $y \sim \mathrm{Pois}(\mu)$ whose rate parameter $\mu$ is a function of shared model parameters. This thesis examines a specific …
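To make the PF assumption concrete, here is a minimal sketch of one common PF form, assuming gamma priors and a rate $\mu_{ij} = \sum_k \theta_{ik}\phi_{kj}$ built from shared row and column factors. It also shows the latent allocation that Poisson additivity permits: given a count $y_{ij}$, its split across the $K$ components is multinomial. The function names, priors, and hyperparameters are illustrative assumptions, not the thesis's models.

```python
import numpy as np

def simulate_pf(n_rows, n_cols, n_components, shape=0.3, rate=0.3, seed=0):
    """Draw counts from an assumed gamma-Poisson factorization model.

    theta[i, k], phi[k, j] ~ Gamma(shape, rate)
    y[i, j] ~ Pois(mu[i, j]),  mu[i, j] = sum_k theta[i, k] * phi[k, j]
    """
    rng = np.random.default_rng(seed)
    # numpy parameterizes the gamma by shape and scale = 1 / rate
    theta = rng.gamma(shape, 1.0 / rate, size=(n_rows, n_components))
    phi = rng.gamma(shape, 1.0 / rate, size=(n_components, n_cols))
    mu = theta @ phi                  # every rate is a function of shared factors
    y = rng.poisson(mu)
    return y, theta, phi

def allocate(y, theta, phi, seed=0):
    """Split each count y[i, j] across the K latent components.

    If y = y_1 + ... + y_K with independent y_k ~ Pois(mu_k), then
    (y_1, ..., y_K) given y is Multinomial(y, mu_k / sum_k mu_k).
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = y.shape
    k = theta.shape[1]
    latent = np.zeros((n_rows, n_cols, k), dtype=np.int64)
    for i in range(n_rows):
        for j in range(n_cols):
            if y[i, j] == 0:
                continue              # nothing to allocate
            rates = theta[i] * phi[:, j]
            latent[i, j] = rng.multinomial(y[i, j], rates / rates.sum())
    return latent
```

Because the allocated counts always sum back to the observed count, inference can work with the latent per-component counts directly, which is one reason this construction is convenient for sparse count data.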


Advances In Measurement Error Modeling, Linh Nghiem May 2019

Statistical Science Theses and Dissertations

Measurement error in observations is widely known to cause bias and a loss of power when fitting statistical models, particularly when studying distribution shape or the relationship between an outcome and a variable of interest. Most existing correction methods require strong assumptions about the distribution of the measurement error or rely on ancillary data that are not always available, which limits their applicability in many situations. New correction approaches are also needed in high-dimensional settings, where measurement error in the covariates adds another level of complexity to the desirable structure …
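The bias mentioned above can be seen in the simplest case. The sketch below assumes classical additive error $w = x + u$, with $u$ independent of $x$ and $y$, and a known error variance $\sigma_u^2$. Under these assumptions the naive slope is attenuated by the reliability ratio $\lambda = \operatorname{var}(x)/\operatorname{var}(w)$, and dividing by an estimate of $\lambda$ undoes the bias. This textbook correction is shown only for illustration; it relies on exactly the kind of known-variance assumption that the abstract says is often unavailable, and it is not the dissertation's method.

```python
import numpy as np

def naive_and_corrected_slope(w, y, sigma_u2):
    """Naive OLS slope of y on error-prone w, and a moment-based correction.

    Assumes classical error w = x + u with known var(u) = sigma_u2.
    """
    w_c = w - w.mean()
    naive = np.dot(w_c, y - y.mean()) / np.dot(w_c, w_c)
    var_w = w.var()
    reliability = (var_w - sigma_u2) / var_w   # estimates var(x) / var(w)
    return naive, naive / reliability

# Hypothetical simulation: true slope 2, var(x) = var(u) = 1, so lambda = 0.5
rng = np.random.default_rng(1)
n = 100_000
x = rng.normal(0.0, 1.0, n)
w = x + rng.normal(0.0, 1.0, n)
y = 2.0 * x + rng.normal(0.0, 0.5, n)
naive, corrected = naive_and_corrected_slope(w, y, sigma_u2=1.0)
```

With equal signal and error variances, the naive slope lands near 1 (half the true value), while the corrected slope lands near 2.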


Estimation And Variable Selection In High-Dimensional Settings With Mismeasured Observations, Michael Byrd Jan 2019

Statistical Science Theses and Dissertations

Understanding high-dimensional data has become essential for practitioners across many disciplines. The growing ability to collect large amounts of data has prompted statistical methods to adapt to the rising number of possible relationships to be uncovered. The key to this adaptation has been the notion of sparse models, that is, models in which most relationships between variables are assumed to be negligible. These sparse models are driven by constraints on the solution set, which take the form of regularization penalties imposed on the optimization procedure. While these penalties have found great success, they are typically formulated with strong assumptions on the …
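The lasso is the standard example of a regularization penalty that produces sparse solutions. The following minimal sketch, a generic textbook algorithm rather than the dissertation's estimator, minimizes $\frac{1}{2n}\lVert y - Xb\rVert_2^2 + \lambda\lVert b\rVert_1$ by cyclic coordinate descent. The soft-thresholding step is what sets small coefficients exactly to zero. The function names, the fixed iteration count, and the simulated data are assumptions for illustration.

```python
import numpy as np

def soft_threshold(z, gamma):
    # sign(z) * max(|z| - gamma, 0): the proximal map of gamma * |b|
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n     # (1/n) * x_j^T x_j
    r = y - X @ b                         # current residual
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]           # partial residual without feature j
            z = X[:, j] @ r / n
            b[j] = soft_threshold(z, lam) / col_sq[j]
            r -= X[:, j] * b[j]           # restore full residual with new b[j]
    return b

# Hypothetical data: only the first two of ten coefficients are nonzero
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[0], beta[1] = 3.0, -2.0
y = X @ beta + 0.5 * rng.normal(size=n)
b_hat = lasso_cd(X, y, lam=0.1)
```

The estimates of the two true effects are shrunk slightly toward zero, while most of the noise coefficients are set exactly to zero, which is the sparsity the paragraph describes.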


Modeling Stochastically Intransitive Relationships In Paired Comparison Data, Ryan Patrick Alexander Mcshane Jan 2019

Statistical Science Theses and Dissertations

If the Warriors beat the Rockets and the Rockets beat the Spurs, does that mean that the Warriors are better than the Spurs? Sophisticated fans would argue that the Warriors are better by the transitive property, but could Spurs fans make a legitimate argument that their team is better despite this chain of evidence?

We first explore the nature of intransitive (rock-scissors-paper) relationships using a graph-theoretic approach to the method of paired comparisons popularized by Kendall and Smith (1940). Then, we focus on the setting where all pairs of items, teams, players, or objects have been compared to …
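A classical summary of intransitivity associated with the paired-comparison work of Kendall and Smith (1940) is the number of circular triads, that is, 3-cycles in the tournament graph. The sketch below counts them in two ways for a complete set of comparisons. The first enumerates every triple. The second uses the fact that each transitive triad has exactly one item that beats the other two, so the number of transitive triads is $\sum_i \binom{s_i}{2}$, where $s_i$ is item $i$'s number of wins. The function names and the boolean-matrix encoding are illustrative assumptions.

```python
from itertools import combinations
from math import comb

def circular_triads_brute(beats):
    """Count 3-cycles (rock-scissors-paper triads) in a complete tournament.

    beats[i][j] is True when item i beat item j; for each pair i != j,
    exactly one of beats[i][j] and beats[j][i] is True.
    """
    count = 0
    for i, j, k in combinations(range(len(beats)), 3):
        # a triple is cyclic in one of its two possible orientations
        if (beats[i][j] and beats[j][k] and beats[k][i]) or \
           (beats[j][i] and beats[k][j] and beats[i][k]):
            count += 1
    return count

def circular_triads_from_scores(beats):
    """Same count from win totals: C(n, 3) minus the transitive triads."""
    scores = [sum(row) for row in beats]
    return comb(len(beats), 3) - sum(comb(s, 2) for s in scores)

# Warriors (0) beat Rockets (1), Rockets beat Spurs (2), Spurs beat Warriors
beats = [[False, True, False],
         [False, False, True],
         [True, False, False]]
```

In the hypothetical example above, both functions report one circular triad. If the Warriors had also beaten the Spurs, the win totals would be 2, 1, and 0, and both counts would drop to zero.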


Statistical Designs For Network A/B Testing, Victoria V. Pokhilko Jan 2019

Theses and Dissertations

A/B testing refers to the statistical procedure of designing and analyzing experiments that compare two treatments, A and B, applied to different test subjects. It is widely used by technology companies such as Facebook, LinkedIn, and Netflix to compare algorithms, web designs, and other online products and services. The subjects in these online A/B experiments are users who are connected through social networks of varying scales. Two connected subjects tend to be similar in their social behavior, educational and financial background, and other demographic aspects. Hence, it is natural to assume that their reactions to online products …


Quantifying Human Biological Age: A Machine Learning Approach, Syed Ashiqur Rahman Jan 2019

Graduate Theses, Dissertations, and Problem Reports

Quantifying human biological age is an important and difficult challenge. Different biomarkers and numerous approaches have been studied for biological age prediction, each with its own advantages and limitations. In this work, we first introduce a new anthropometric measure, the Surface-based Body Shape Index (SBSI), which accounts for both body shape and body size, and evaluate its performance as a predictor of all-cause mortality. We analyzed data from the National Health and Nutrition Examination Survey (NHANES). Based on this analysis, we introduce a new body shape index constructed from four important anthropometric determinants of body shape and body size: body …


The Use Of 3-D Highway Differential Geometry In Crash Prediction Modeling, Kiriakos Amiridis Jan 2019

Theses and Dissertations--Civil Engineering

The objective of this research is to introduce and evaluate a new methodology for rural highway safety. Current practice relies on crash prediction models that use specific explanatory variables, and the main repository of knowledge from past research is the Highway Safety Manual (HSM). Most of the prediction models in the HSM identify the effect of individual geometric elements on crash occurrence and combine these effects multiplicatively, so that each effect is multiplied by the others to determine their combined influence. Three-dimensional (3-D) representations of the roadway surface have also been explored in the past, aiming to …
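The multiplicative combination described above can be sketched in a few lines. In the HSM style, a base prediction from a safety performance function (SPF) is scaled by one crash modification factor (CMF) per geometric element and by a local calibration factor. The function name and all numbers below are hypothetical; the sketch only shows how the individual effects compound.

```python
from math import prod

def predicted_crashes(n_spf, cmfs, calibration=1.0):
    """Scale a base SPF prediction by the product of its CMFs.

    n_spf: base predicted crash frequency (hypothetical value)
    cmfs: one crash modification factor per geometric element
    calibration: local calibration factor
    """
    return n_spf * prod(cmfs) * calibration

# Hypothetical: 2.0 crashes/year at base conditions; a curve raises risk
# by 10% (CMF 1.10) and a wider lane lowers it by 5% (CMF 0.95)
estimate = predicted_crashes(2.0, [1.10, 0.95])   # about 2.09 crashes/year
```

Because the factors simply multiply, this form treats each element's effect as independent of the others, which is the assumption a joint 3-D representation of the roadway surface could relax.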


Hydroclimate Drivers And Atmospheric Dynamics Of Floods, Nasser Najibi Jan 2019

Dissertations and Theses

Our preliminary survey showed that most recent flood-related studies did not formally explain the physical mechanisms of long-duration and large-peak flood events, which can cause substantial damage to property and infrastructure systems. These studies also fell short of fully assessing the interactions of coupled ocean-atmosphere and land dynamics, which can force substantial changes in flood attributes by governing the exceeding surface flow regimes and moisture source-sink relationships at the spatiotemporal scales important for risk management. This dissertation advances the understanding of the variability in flood duration, peak, volume, and timing at the regional to the …


Bayesian Analysis For The Intraclass Model And For The Quantile Semiparametric Mixed-Effects Double Regression Models, Duo Zhang Jan 2019

Dissertations, Master's Theses and Master's Reports

This dissertation consists of three distinct but related research projects. The first two projects focus on objective Bayesian hypothesis testing and estimation for the intraclass correlation coefficient in linear models. The third project deals with Bayesian quantile inference for semiparametric mixed-effects double regression models. In the first project, we derive Bayes factors based on divergence-based priors for testing the intraclass correlation coefficient (ICC). Testing the ICC is used to check for uncorrelatedness in multilevel modeling, and it has not been well studied from an objective Bayesian perspective. Simulation results show that the two sorts …
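For readers unfamiliar with the ICC, the sketch below shows what is being tested. In a balanced one-way random-effects model $y_{ij} = \mu + \alpha_i + e_{ij}$, the ICC is $\rho = \sigma_\alpha^2 / (\sigma_\alpha^2 + \sigma_e^2)$, and $\rho = 0$ means observations within a group are uncorrelated. The code uses the classical frequentist ANOVA estimator $(\mathrm{MSB} - \mathrm{MSW}) / (\mathrm{MSB} + (k-1)\mathrm{MSW})$ purely for illustration; it is not the Bayes factor approach of the dissertation, and the function name and simulation settings are assumptions.

```python
import numpy as np

def icc_oneway(y):
    """ANOVA estimator of the ICC for a balanced one-way random-effects model.

    y has shape (a, k): a groups with k observations each.
    """
    a, k = y.shape
    group_means = y.mean(axis=1)
    grand_mean = y.mean()
    msb = k * ((group_means - grand_mean) ** 2).sum() / (a - 1)   # between groups
    msw = ((y - group_means[:, None]) ** 2).sum() / (a * (k - 1)) # within groups
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical simulation with var(alpha) = var(e) = 1, so the true ICC is 0.5
rng = np.random.default_rng(2)
a, k = 2000, 5
y_corr = rng.normal(size=(a, 1)) + rng.normal(size=(a, k))
y_null = rng.normal(size=(a, k))          # no group effect: ICC = 0
```

Applied to the simulated data, the estimate is close to 0.5 for `y_corr` and close to 0 for `y_null`, the uncorrelated case that the hypothesis test targets.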


Transforms In Sufficient Dimension Reduction And Their Applications In High Dimensional Data, Jiaying Weng Jan 2019

Theses and Dissertations--Statistics

The big data era poses great challenges, as well as opportunities, for researchers developing efficient statistical approaches to analyze massive data. Sufficient dimension reduction (SDR) is one such important tool in modern data analysis and has received extensive attention in both academia and industry.

In this dissertation, we introduce inverse regression estimators based on Fourier transforms, which are superior to existing SDR methods in two respects: (1) they avoid slicing the response variable, and (2) they can be readily extended to high-dimensional data. For the ultra-high-dimensional problem, we investigate both eigenvalue decomposition and minimum …
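To show how a Fourier transform can replace slicing, here is a minimal sketch of the general idea, not the dissertation's estimator. With standardized predictors $Z$ and the usual linearity condition on $X$ (assumed here), $\psi(\omega) = E[Z e^{i\omega Y}]$ lies in the central subspace on the $Z$ scale for every $\omega$. Stacking the real and imaginary parts over several frequencies into $M = \sum_\omega (\operatorname{Re}\psi\,\operatorname{Re}\psi^\top + \operatorname{Im}\psi\,\operatorname{Im}\psi^\top)$ and taking its top eigenvectors recovers the directions without ever binning $Y$. The finite frequency grid, equal weights, standardization of $Y$, and the name `fourier_sdr` are all illustrative assumptions.

```python
import numpy as np

def fourier_sdr(X, y, d=1, omegas=(0.25, 0.5, 0.75, 1.0, 1.25, 1.5)):
    """Estimate d central-subspace directions without slicing y (sketch)."""
    n, p = X.shape
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - X.mean(axis=0)) @ sigma_inv_sqrt    # standardized predictors
    ys = (y - y.mean()) / y.std()                # standardized response
    M = np.zeros((p, p))
    for w in omegas:
        re = Z.T @ np.cos(w * ys) / n            # Re E[Z exp(i w Y)]
        im = Z.T @ np.sin(w * ys) / n            # Im E[Z exp(i w Y)]
        M += np.outer(re, re) + np.outer(im, im)
    _, vecs = np.linalg.eigh(M)                  # eigenvalues in ascending order
    eta = vecs[:, ::-1][:, :d]                   # top-d eigenvectors
    beta = sigma_inv_sqrt @ eta                  # map back to the X scale
    return beta / np.linalg.norm(beta, axis=0)

# Hypothetical single-index model: y depends on X only through (x1 + x2) / sqrt(2)
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
true_dir = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
u = X @ true_dir
y = u + 0.2 * u ** 3 + 0.1 * rng.normal(size=2000)
b_est = fourier_sdr(X, y)[:, 0]
```

The estimated direction is identified only up to sign, so agreement with the true direction is measured by the absolute cosine `abs(b_est @ true_dir)`. A slicing method such as sliced inverse regression would instead average `Z` within bins of `y`, and its result can depend on how those bins are chosen.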


Composite Nonparametric Tests In High Dimension, Alejandro G. Villasante Tezanos Jan 2019

Theses and Dissertations--Statistics

This dissertation focuses on the problem of making high-dimensional inference for two or more groups. High-dimensional means that both the sample size (n) and the dimension (p) tend to infinity, possibly at different rates. Classical approaches for group comparisons fail in the high-dimensional setting, in the sense that they have incorrect size and low power. Much has been done in recent years to overcome these problems. However, these recent works make restrictive assumptions about the number of treatments to be compared and/or the distribution of the data. This research aims to (1) propose and investigate refined …
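One generic way to keep the correct size when p exceeds n, without distributional assumptions, is a permutation test. The sketch below covers only the two-group case and uses the squared distance between sample mean vectors as its statistic. It is exact under the null hypothesis that both samples come from the same distribution. This is a standard technique chosen for illustration, not the composite tests developed in the dissertation, and the function name and settings are assumptions.

```python
import numpy as np

def perm_test_mean_diff(x, y, n_perm=999, seed=0):
    """Two-sample permutation test with statistic ||mean(x) - mean(y)||^2.

    Exact under H0: x and y are drawn from the same distribution,
    for any dimension p (including p much larger than n).
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    n1 = x.shape[0]

    def stat(data):
        diff = data[:n1].mean(axis=0) - data[n1:].mean(axis=0)
        return diff @ diff

    observed = stat(pooled)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled.shape[0])   # random relabeling of rows
        if stat(pooled[perm]) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)            # p-value including observed

# Hypothetical data: n = 20 per group, p = 500 dimensions
rng = np.random.default_rng(3)
x = rng.normal(size=(20, 500))
y_shifted = rng.normal(size=(20, 500)) + 0.5
p_alt = perm_test_mean_diff(x, y_shifted, n_perm=199)
```

The "+1" in the numerator and denominator counts the observed labeling as one of the permutations, which keeps the p-value valid. Here the shifted sample yields the smallest attainable p-value, 1/200.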


Variable Selection In Accelerated Failure Time (AFT) Frailty Models: An Application Of Penalized Quasi-Likelihood, Sarbesh R. Pandeya Jan 2019

Electronic Theses and Dissertations

Variable selection is one of the standard ways of selecting models in large-scale datasets. It has applications in many fields of research, especially in large multi-center clinical trials. One of the prominent methods of variable selection is penalized likelihood, which is both consistent and efficient. However, penalized selection becomes significantly more challenging in the presence of random (frailty) covariates. It is even more complicated when censoring is involved, because the marginal log-likelihood may not have a closed-form solution. Therefore, we applied the penalized quasi-likelihood (PQL) approach that approximates the solution for such a …


Statistical Modeling Of Influenza-Like-Illness In Montana Using Spatial And Temporal Methods, Benjamin A. Stark Jan 2019

Graduate Student Theses, Dissertations, & Professional Papers

The relationship between air pollution and public health has been a historically important question in science. It has long been hypothesized that severe air pollution has negative implications for basic human health. Such studies have primarily focused on areas that are prone to severe human-made pollution. Research on less populated areas is scarce, and this scarcity raises the question of how pollution dynamics (human-made and natural) influence human health in more rural areas.

The aim of this study is to address this gap in research; in particular, we explore possible links between air pollution …