Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Statistics and Probability

Southern Illinois University Carbondale

Outliers

Articles 1 - 6 of 6

Variable Selection For 1d Regression Models, David J. Olive, Douglas M. Hawkins Feb 2005

Articles and Preprints

Variable selection, the search for j relevant predictor variables from a group of p candidates, is a standard problem in regression analysis. The class of 1D regression models is a broad class that includes generalized linear models. We show that existing variable selection algorithms, originally meant for multiple linear regression and based on ordinary least squares and Mallows’ Cp, can also be used for 1D models. Graphical aids for variable selection are also provided.
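The OLS-and-Cp machinery the abstract refers to can be sketched as a greedy forward search scored by Mallows' Cp; the function names and the stopping rule below are illustrative assumptions, not the authors' code:

```python
import numpy as np

def mallows_cp(X, y, subset, sigma2_full):
    """Mallows' Cp for an OLS fit on the columns in `subset`:
    Cp = SSE_subset / sigma2_full - n + 2 * (k + 1),
    where sigma2_full estimates the error variance from the full model."""
    n = len(y)
    Xs = np.column_stack([np.ones(n), X[:, subset]])
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    sse = np.sum((y - Xs @ beta) ** 2)
    return sse / sigma2_full - n + 2 * (len(subset) + 1)

def forward_select(X, y):
    """Greedy forward selection scored by Cp (a sketch, not the paper's method)."""
    n, p = X.shape
    # Error variance estimate from the full OLS model.
    Xf = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xf, y, rcond=None)
    sigma2 = np.sum((y - Xf @ beta) ** 2) / (n - p - 1)
    chosen, best = [], np.inf
    remaining = list(range(p))
    while remaining:
        cp, j = min((mallows_cp(X, y, chosen + [j], sigma2), j) for j in remaining)
        if cp >= best:       # stop when Cp no longer improves
            break
        best = cp
        chosen.append(j)
        remaining.remove(j)
    return chosen
```

On simulated data where only a couple of predictors carry signal, this search recovers them first; the point of the paper is that the same OLS/Cp search remains informative for the wider 1D model class.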


Robust Regression With High Coverage, David J. Olive, Douglas M. Hawkins Jul 2003

Articles and Preprints

An important parameter for several high breakdown regression algorithm estimators is the number of cases given weight one, called the coverage of the estimator. Increasing the coverage is believed to yield a more stable estimator, but the price paid for this stability is greatly decreased resistance to outliers. A simple modification of the algorithm can greatly increase the coverage, and hence the estimator's statistical performance, while maintaining high outlier resistance.
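In LTS-type algorithms, the coverage is simply the number of smallest-residual cases the fit accommodates; a minimal sketch of that bookkeeping (a hypothetical helper, assuming numpy):

```python
import numpy as np

def covered_cases(residuals, coverage):
    """Indices of the `coverage` cases with smallest absolute residuals,
    i.e. the cases an LTS-type estimator gives weight one.
    Raising `coverage` uses more of the data (more stability) but lets
    the covered set reach further toward any outliers."""
    residuals = np.asarray(residuals, dtype=float)
    return np.argsort(np.abs(residuals))[:coverage]
```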


Inconsistency Of Resampling Algorithms For High Breakdown Regression Estimators And A New Algorithm, Douglas M. Hawkins, David J. Olive Mar 2002

Articles and Preprints

Since high breakdown estimators are impractical to compute exactly in large samples, approximate algorithms are used. Such an algorithm generally produces an estimator with a lower consistency rate and breakdown value than the exact theoretical estimator. This discrepancy grows with the sample size, with the implication that huge computations are needed for good approximations in large, high-dimensional samples.

The workhorse for high breakdown estimation (HBE) has been the 'elemental set', or 'basic resampling', algorithm. This turns out to be completely ineffective in high dimensions with high levels of contamination. However, enriching it with a "concentration" step turns it into a method that is able …
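The elemental-start-plus-concentration idea can be sketched for LTS as follows; the start counts, step counts, and helper names are illustrative choices in the spirit of the C-step, not the paper's algorithm verbatim:

```python
import numpy as np

def lts_criterion(X, y, beta, h):
    """LTS criterion: sum of the h smallest squared residuals."""
    r2 = np.sort((y - X @ beta) ** 2)
    return r2[:h].sum()

def concentration_lts(X, y, h, n_starts=50, n_csteps=10, seed=0):
    """Elemental starts refined by concentration steps (a sketch)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_beta, best_crit = None, np.inf
    for _ in range(n_starts):
        # Elemental start: exact fit to p randomly chosen cases.
        idx = rng.choice(n, size=p, replace=False)
        try:
            beta = np.linalg.solve(X[idx], y[idx])
        except np.linalg.LinAlgError:
            continue  # degenerate elemental set, draw another
        # Concentration: refit OLS to the h cases best fit by current beta.
        for _ in range(n_csteps):
            keep = np.argsort(np.abs(y - X @ beta))[:h]
            beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        crit = lts_criterion(X, y, beta, h)
        if crit < best_crit:
            best_crit, best_beta = crit, beta
    return best_beta
```

With even 20% gross contamination, the bare elemental fit is easily ruined, while the concentrated version typically lands near the clean-data slope.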


High Breakdown Analogs Of The Trimmed Mean, David J. Olive Jan 2001

Articles and Preprints

Two high breakdown estimators that are asymptotically equivalent to a sequence of trimmed means are introduced. They are easy to compute and their asymptotic variance is easier to estimate than the asymptotic variance of standard high breakdown estimators.
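For reference, the classical trimmed mean that the paper's estimators mimic asymptotically looks like this (a sketch of the classical version, not the high breakdown analog itself):

```python
import numpy as np

def trimmed_mean(x, trim=0.25):
    """Symmetrically trimmed mean: drop the lowest and highest `trim`
    fraction of observations and average the remainder."""
    x = np.sort(np.asarray(x, dtype=float))
    k = int(trim * len(x))
    return x[k:len(x) - k].mean()
```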


Applications And Algorithms For Least Trimmed Sum Of Absolute Deviations Regression, Douglas M. Hawkins, David Olive Dec 1999

Articles and Preprints

High breakdown estimation (HBE) addresses the problem of getting reliable parameter estimates in the face of outliers that may be numerous and badly placed. In multiple regression, the standard HBEs have been those defined by the least median of squares (LMS) and the least trimmed squares (LTS) criteria. Both criteria lead to a partitioning of the data set's n cases into two "halves": the covered "half" of cases is accommodated by the fit, while the uncovered "half", which is intended to include any outliers, is ignored. In LMS, the criterion is the Chebyshev norm of the residuals of the …
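Given a coverage h, the two criteria can be written down directly; an illustrative numpy rendering (indexing conventions for h vary across papers, so this is one reasonable reading):

```python
import numpy as np

def lms_criterion(residuals, h):
    """LMS-type criterion: the h-th smallest absolute residual,
    i.e. the Chebyshev norm over the covered 'half'."""
    return np.sort(np.abs(residuals))[h - 1]

def lts_criterion(residuals, h):
    """LTS criterion: sum of the h smallest squared residuals."""
    return np.sort(np.asarray(residuals) ** 2)[:h].sum()
```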


Improved Feasible Solution Algorithms For High Breakdown Estimation, Douglas M. Hawkins, David J. Olive Mar 1999

Articles and Preprints

High breakdown estimation allows one to get reasonable estimates of the parameters from a sample of data even if that sample is contaminated by large numbers of awkwardly placed outliers. Two particular application areas in which this is of interest are multiple linear regression, and estimation of the location vector and scatter matrix of multivariate data. Standard high breakdown criteria for the regression problem are the least median of squares (LMS) and least trimmed squares (LTS); those for the multivariate location/scatter problem are the minimum volume ellipsoid (MVE) and minimum covariance determinant (MCD). All of these present daunting computational problems. …
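For the multivariate location/scatter problem, the MCD objective and one concentration step might be sketched as follows (illustrative only; the C-step result guarantees the determinant cannot increase, which is what makes such iterations a feasible attack on the daunting exact problem):

```python
import numpy as np

def mcd_objective(X, idx):
    """MCD objective for a candidate subset: determinant of the sample
    covariance of the covered cases (smaller = more concentrated)."""
    return np.linalg.det(np.cov(X[idx], rowvar=False))

def mcd_cstep(X, idx, h):
    """One concentration step: compute Mahalanobis distances from the
    current subset's mean/covariance and keep the h closest cases.
    The new subset's covariance determinant is never larger."""
    mu = X[idx].mean(axis=0)
    S = np.cov(X[idx], rowvar=False)
    Sinv = np.linalg.inv(S)
    d = np.einsum('ij,jk,ik->i', X - mu, Sinv, X - mu)  # squared distances
    return np.argsort(d)[:h]
```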