Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

Articles 1 - 19 of 19

Full-Text Articles in Statistics and Probability

Using Stability To Select A Shrinkage Method, Dean Dustin May 2020

Department of Statistics: Dissertations, Theses, and Student Work

Shrinkage methods are estimation techniques based on optimizing expressions to find which variables to include in an analysis, typically a linear regression. The general form of these expressions is the sum of an empirical risk and a complexity penalty based on the number of parameters. Many shrinkage methods are known to satisfy an ‘oracle’ property, meaning that asymptotically they select the correct variables and estimate their coefficients efficiently. In Section 1.2, we show oracle properties in two general settings. The first uses a log likelihood in place of the empirical risk and allows a general class of penalties. The second …
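
As an illustration of the "empirical risk plus penalty" form described in this abstract, a commonly used generic objective can be written as below. The symbols (loss L, penalty function ρ, tuning parameter λ) are assumptions chosen for illustration and are not taken from the dissertation itself.

```latex
\hat{\beta}
  = \arg\min_{\beta \in \mathbb{R}^{p}}
    \left\{
      \frac{1}{n} \sum_{i=1}^{n} L\!\left(y_i,\; x_i^{\top}\beta\right)
      + \lambda \sum_{j=1}^{p} \rho\!\left(|\beta_j|\right)
    \right\},
\qquad \lambda \ge 0 .
```

With squared-error loss, taking ρ(t) = t gives the lasso and ρ(t) = t² gives ridge regression; other penalty choices lead to other shrinkage methods.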


An Exploration Of Link Functions Used In Ordinal Regression, Thomas J. Smith, David A. Walker, Cornelius M. Mckenna Apr 2020

Journal of Modern Applied Statistical Methods

The purpose of this study is to examine issues involved in choosing a link function in generalized linear models with ordinal outcomes. Distributional appropriateness, link specificity, and palindromic invariance are discussed, and an exemplar analysis is provided using the Pew Research Center 25th anniversary of the Web Omnibus Survey data. Simulated data are used to compare the relative palindromic invariance of four distinct indices of determination/discrimination, including a newly proposed index by Smith et al. (2017).


Joint Asymptotics For Smoothing Spline Semiparametric Nonlinear Models, Jiahui Yu Oct 2019

Doctoral Dissertations

We study the joint asymptotics of general smoothing spline semiparametric models in the settings of density estimation and regression. We provide a systematic framework which incorporates many existing models as special cases, and further allows for nonlinear relationships between the finite-dimensional Euclidean parameter and the infinite-dimensional functional parameter. For both density estimation and regression, we establish the local existence and uniqueness of the penalized likelihood estimators for our proposed models. In the density estimation setting, we prove joint consistency and obtain the rates of convergence of the joint estimator in an appropriate norm. The convergence rate of the parametric component …


Logistic Regression: An Inferential Method For Identifying The Best Predictors, Rand Wilcox Mar 2019

Journal of Modern Applied Statistical Methods

When dealing with a logistic regression model, there is a simple method for estimating the strength of the association between the jth covariate and the dependent variable when all covariates are entered into the model. There is the issue of determining whether the jth independent variable has a stronger or weaker association than the kth independent variable. This note describes a method for dealing with this issue that was found to perform reasonably well in simulations.


Examination And Comparison Of The Performance Of Common Non-Parametric And Robust Regression Models, Gregory F. Malek Aug 2017

Electronic Theses and Dissertations

Gregory Frank Malek, Stephen F. Austin State University, Masters in Statistics Program, Nacogdoches, Texas, U.S.A. (g_m_2002@live.com)

This work investigated common alternatives to the least-squares regression method in the presence of non-normally distributed errors. An initial literature review identified a variety of alternative methods, including Theil Regression, Wilcoxon Regression, Iteratively Re-Weighted Least Squares, Bounded-Influence Regression, and Bootstrapping methods. These methods were evaluated using a simple simulated example data set, as well as various real data sets, including math proficiency data, Belgian telephone call data, and faculty …


Effective Estimation Strategy Of Finite Population Variance Using Multi-Auxiliary Variables In Double Sampling, Reba Maji, G. N. Singh, Arnab Bandyopadhyay May 2017

Journal of Modern Applied Statistical Methods

Estimation of population variance in two-phase (double) sampling is considered using information on multiple auxiliary variables. An unbiased estimator is proposed and its properties are studied under two different structures. The superiority of the suggested estimator over some contemporary estimators of population variance was established through empirical studies from a natural and an artificially generated dataset.


Efficient And Unbiased Estimation Procedure Of Population Mean In Two-Phase Sampling, Reba Maji, Arnab Bandyopadhyay, G. N. Singh Nov 2016

Journal of Modern Applied Statistical Methods

In this paper, an unbiased regression-ratio type estimator has been developed for estimating the population mean using two auxiliary variables in double sampling. Its properties are studied under two different cases. Empirical studies and graphical simulation have been done to demonstrate the efficiency of the proposed estimator over other estimators.


A Spatial Analytical Framework For Examining Road Traffic Crashes, Grace O. Korter May 2016

Journal of Modern Applied Statistical Methods

A number of different modeling techniques have been used to examine road traffic crashes for analytic and predictive purposes. Map-based spatial analysis is introduced. Applications are given which show the power in a combination of existing exploratory and statistical methods.


Contrails: Causal Inference Using Propensity Scores, Dean S. Barron Nov 2015

Journal of Modern Applied Statistical Methods

Contrails are clouds caused by airplane exhausts, which geologists contend decrease daily temperature ranges on Earth. Following the 2001 World Trade Center attack, cancelled domestic flights triggered the first absence of contrails in decades. The resulting exceptional data enabled causal inference analysis by propensity score matching. The estimated contrail effect was 6.8981°F.


A New Diagnostic Test For Regression, Yun Shi Apr 2013

Electronic Thesis and Dissertation Repository

A new diagnostic test for regression and generalized linear models is discussed. The test is based on testing whether residuals that are close together in the linear space of one of the covariates are correlated. This is a generalization of the famous problem of spurious correlation in time series regression. A full model building approach for the case of regression was developed in Mahdi (2011, Ph.D. Thesis, Western University, “Diagnostic Checking, Time Series and Regression”) using an iterative generalized least squares algorithm. Simulation experiments were reported that demonstrate the validity and utility of this approach but no actual applications were …


Improved Estimator In The Presence Of Multicollinearity, Ghadban Khalaf May 2012

Journal of Modern Applied Statistical Methods

The performance of two biased estimators for the general linear regression model under conditions of collinearity is examined, and a newly proposed ridge parameter is introduced. Using Mean Square Error (MSE) and Monte Carlo simulation, the resulting estimator’s performance is evaluated and compared with the Ordinary Least Squares (OLS) estimator and the Hoerl and Kennard (1970a) estimator. Results of the simulation study indicate that, with respect to the MSE criterion, in all cases investigated the proposed estimator outperforms both the OLS and the Hoerl and Kennard estimators.
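
For readers unfamiliar with ridge estimation, the sketch below shows the generic ridge estimator (X'X + kI)^{-1} X'y for two centered predictors with no intercept. It does not implement the ridge parameter proposed in the article; the value of k is supplied by the caller, and the function name and data are hypothetical.

```python
def ridge_two_predictors(x1, x2, y, k):
    """Generic ridge estimate (X'X + kI)^{-1} X'y for two centered predictors.

    k = 0 reproduces ordinary least squares; k > 0 shrinks the coefficients.
    The choice of k is left to the caller (an assumption of this sketch).
    """
    # Entries of X'X and X'y.
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    # Add k to the diagonal and solve the 2x2 system by Cramer's rule.
    a11, a22 = s11 + k, s22 + k
    det = a11 * a22 - s12 * s12
    b1 = (a22 * t1 - s12 * t2) / det
    b2 = (a11 * t2 - s12 * t1) / det
    return b1, b2


# Hypothetical, nearly collinear, centered data.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-1.9, -1.1, 0.1, 0.9, 2.0]
y = [-4.1, -1.8, 0.2, 1.9, 3.8]

ols = ridge_two_predictors(x1, x2, y, k=0.0)
ridge = ridge_two_predictors(x1, x2, y, k=1.0)
print("OLS coefficients:  ", ols)
print("Ridge coefficients:", ridge)
```

Under collinearity the OLS coefficients can be unstable, while the ridge coefficients have a smaller squared norm at the cost of some bias.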


Number Of Replications Required In Monte Carlo Simulation Studies: A Synthesis Of Four Studies, Daniel J. Mundform, Jay Schaffer, Myoung-Jin Kim, Dale Shaw, Ampai Thongteeraparp, Pornsin Supawan May 2011

Journal of Modern Applied Statistical Methods

Monte Carlo simulations are used extensively to study the performance of statistical tests and control charts. Researchers have used various numbers of replications, but rarely provide justification for their choice. Currently, no empirically-based recommendations regarding the required number of replications exist. Twenty-two studies were re-analyzed to determine empirically-based recommendations.
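
One standard way to reason about the number of replications, sketched here as background and not as the article's method or recommendations, is the binomial standard error of an estimated rejection rate: with R independent replications and true rate p, the standard error is sqrt(p(1 - p) / R). The function names below are hypothetical.

```python
import math


def mc_standard_error(p, replications):
    """Standard error of a Monte Carlo estimate of a proportion p."""
    return math.sqrt(p * (1.0 - p) / replications)


def replications_needed(p, target_se):
    """Smallest replication count whose standard error is about target_se or less."""
    return math.ceil(p * (1.0 - p) / target_se ** 2)


# Example: precision of an estimated Type I error rate near 0.05.
for r in (1000, 10000, 100000):
    print(r, round(mc_standard_error(0.05, r), 5))
```

Because the standard error falls with the square root of R, halving it requires roughly four times as many replications.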


Least Squares Percentage Regression, Chris Tofallis Nov 2008

Journal of Modern Applied Statistical Methods

In prediction, the percentage error is often felt to be more meaningful than the absolute error. We therefore extend the method of least squares to deal with percentage errors, for both simple and multiple regression. Exact expressions are derived for the coefficients, and we show how such models can be estimated using standard software. When the relative error is normally distributed, least squares percentage regression is shown to provide maximum likelihood estimates. The multiplicative error model is linked to least squares percentage regression in the same way that the standard additive error model is linked to ordinary least squares regression.
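
A minimal sketch of the idea for simple regression follows. It assumes the criterion is the sum of squared relative errors, sum(((y - a - b*x) / y)^2), which is algebraically a weighted least squares problem with weights 1/y^2. The closed form below is the usual weighted least squares solution, not the article's own expressions, and the function name is hypothetical. All y values must be nonzero.

```python
def percentage_regression(xs, ys):
    """Fit y = a + b*x by minimizing the sum of squared relative errors.

    Equivalent to weighted least squares with weights w = 1 / y**2.
    """
    ws = [1.0 / (y * y) for y in ys]
    sw = sum(ws)
    swx = sum(w * x for w, x in zip(ws, xs))
    swy = sum(w * y for w, y in zip(ws, ys))
    swxx = sum(w * x * x for w, x in zip(ws, xs))
    swxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    # Weighted normal equations for intercept a and slope b.
    b = (sw * swxy - swx * swy) / (sw * swxx - swx * swx)
    a = (swy - b * swx) / sw
    return a, b


# Hypothetical data lying exactly on y = 2 + 3x.
a, b = percentage_regression([1.0, 2.0, 3.0, 4.0], [5.0, 8.0, 11.0, 14.0])
print(a, b)
```

Since the problem reduces to weighted least squares, any standard regression routine that accepts observation weights could be used in the same way.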


Using Regression Models To Analyze Randomized Trials: Asymptotically Valid Hypothesis Tests Despite Incorrectly Specified Models, Michael Rosenblum, Mark J. Van Der Laan Jan 2008

U.C. Berkeley Division of Biostatistics Working Paper Series

Regression models are often used to test for cause-effect relationships from data collected in randomized trials or experiments. This practice has deservedly come under heavy scrutiny, since commonly used models such as linear and logistic regression will often not capture the actual relationships between variables, and incorrectly specified models potentially lead to incorrect conclusions. In this paper, we focus on hypothesis tests of whether the treatment given in a randomized trial has any effect on the mean of the primary outcome, within strata of baseline variables such as age, sex, and health status. Our primary concern is ensuring that such …


Regression By Data Segments Via Discriminant Analysis, Stan Lipovetsky, Michael Conklin May 2005

Journal of Modern Applied Statistical Methods

It is known that a two-group linear discriminant function can be constructed via binary regression. In this article, it is shown that the opposite relation is also relevant: it is possible to present multiple regression as a linear combination of a main part, based on the pooled variance, and Fisher discriminators by data segments. Presenting regression as an aggregate of the discriminators allows one to decompose coefficients of the model into a sum of several vectors related to segments. Using this technique provides an understanding of how the total regression model is composed of the regressions by the segments with possible …


The Cross-Validated Adaptive Epsilon-Net Estimator, Mark J. Van Der Laan, Sandrine Dudoit, Aad W. Van Der Vaart Feb 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

Suppose that we observe a sample of independent and identically distributed realizations of a random variable. Assume that the parameter of interest can be defined as the minimizer, over a suitably defined parameter space, of the expectation (with respect to the distribution of the random variable) of a particular (loss) function of a candidate parameter value and the random variable. Examples of commonly used loss functions are the squared error loss function in regression and the negative log-density loss function in density estimation. Minimizing the empirical risk (i.e., the empirical mean of the loss function) over the entire parameter space …


Asymptotics Of Cross-Validated Risk Estimation In Estimator Selection And Performance Assessment, Sandrine Dudoit, Mark J. Van Der Laan Feb 2003

U.C. Berkeley Division of Biostatistics Working Paper Series

Risk estimation is an important statistical question for the purposes of selecting a good estimator (i.e., model selection) and assessing its performance (i.e., estimating generalization error). This article introduces a general framework for cross-validation and derives distributional properties of cross-validated risk estimators in the context of estimator selection and performance assessment. Arbitrary classes of estimators are considered, including density estimators and predictors for both continuous and polychotomous outcomes. Results are provided for general full data loss functions (e.g., absolute and squared error, indicator, negative log density). A broad definition of cross-validation is used in order to cover leave-one-out cross-validation, V-fold …
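
To make estimator selection by cross-validated risk concrete, here is a small sketch of V-fold cross-validation with squared error loss. It compares two hypothetical candidate estimators (a constant mean predictor and a simple linear fit) and does not reproduce the general framework of the working paper. Fold assignment by index and the data are assumptions.

```python
def fit_mean(xs, ys):
    """Candidate 1: predict the training mean regardless of x."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Candidate 2: simple least squares line."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    return lambda x: a + b * x


def cv_risk(fit, xs, ys, v=5):
    """V-fold cross-validated squared error risk; fold of observation i is i % v."""
    n = len(xs)
    total = 0.0
    for fold in range(v):
        train_x = [x for i, x in enumerate(xs) if i % v != fold]
        train_y = [y for i, y in enumerate(ys) if i % v != fold]
        predict = fit(train_x, train_y)
        total += sum((ys[i] - predict(xs[i])) ** 2
                     for i in range(n) if i % v == fold)
    return total / n


def select_estimator(candidates, xs, ys, v=5):
    """Return the name of the candidate with the smallest cross-validated risk."""
    risks = {name: cv_risk(fit, xs, ys, v) for name, fit in candidates.items()}
    return min(risks, key=risks.get), risks


# Hypothetical data: a linear trend with a small deterministic perturbation.
xs = [float(i) for i in range(20)]
ys = [1.0 + 0.5 * x + (0.3 if i % 2 else -0.3) for i, x in enumerate(xs)]

best, risks = select_estimator({"mean": fit_mean, "line": fit_line}, xs, ys)
print(best, risks)
```

Each candidate is refit on the training folds only, so the held-out squared errors estimate out-of-sample risk for that estimator.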


Locally Efficient Estimation Of Regression Parameters Using Current Status Data, Chris Andrews, Mark J. Van Der Laan, James M. Robins Sep 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

In biostatistics applications interest often focuses on the estimation of the distribution of a time-variable T. If one only observes whether or not T exceeds an observed monitoring time C, then the data structure is called current status data, also known as interval censored data, case I. We consider this data structure extended to allow the presence of both time-independent covariates and time-dependent covariate processes that are observed until the monitoring time. We assume that the monitoring process satisfies coarsening at random.

Our goal is to estimate the regression parameter beta of the regression model T = Z*beta+epsilon where the …


Bivariate Current Status Data, Mark J. Van Der Laan, Nicholas P. Jewell Sep 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

In many applications, it is of interest to estimate a bivariate distribution of two survival random variables. Observation of such random variables is often incomplete. If one only observes whether or not each of the individual survival times exceeds a common observed monitoring time C, then the data structure is referred to as bivariate current status data (Wang and Ding, 2000). For such data, we show that the identifiable part of the joint distribution is represented by three univariate cumulative distribution functions, namely the two marginal cumulative distribution functions, and the bivariate cumulative distribution function evaluated on the …