Open Access. Powered by Scholars. Published by Universities.®

Applied Statistics Commons

Selected Works

2011

Articles 1 - 12 of 12

Full-Text Articles in Applied Statistics

Beyond Multiple Regression: Using Commonality Analysis To Better Understand R² Results, Russell Warne Sep 2011

Russell T Warne

Multiple regression is one of the most common statistical methods used in quantitative educational research. Despite the versatility and easy interpretability of multiple regression, it has some shortcomings in the detection of suppressor variables and for somewhat arbitrarily assigning values to the structure coefficients of correlated independent variables. Commonality analysis—heretofore rarely used in gifted education research—is a statistical method that partitions the explained variance of a dependent variable into nonoverlapping parts according to the independent variable(s) that are related to each portion. This Methodological Brief includes an example of commonality analysis and equations for researchers who wish to conduct their …
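
The partition described in this abstract can be illustrated for the simplest case of two predictors. The sketch below is not taken from the article: the function name and the example correlations are hypothetical, and it assumes standardized variables so that every R² can be computed from the three pairwise correlations alone. Unique effects are the drop in R² when a predictor is removed, and the common effect is whatever explained variance remains.

```python
def commonality_two_predictors(r_y1, r_y2, r_12):
    """Partition R^2 for y ~ x1 + x2 into unique and common parts.

    r_y1, r_y2: correlations of y with x1 and with x2.
    r_12: correlation between the two predictors.
    (Hypothetical helper for illustration only.)
    """
    if abs(r_12) >= 1.0:
        raise ValueError("predictors must not be perfectly collinear")

    r2_x1 = r_y1 ** 2  # R^2 with x1 as the only predictor
    r2_x2 = r_y2 ** 2  # R^2 with x2 as the only predictor
    # R^2 of the two-predictor model, written in terms of correlations
    r2_full = (r2_x1 + r2_x2 - 2.0 * r_y1 * r_y2 * r_12) / (1.0 - r_12 ** 2)

    return {
        "unique_x1": r2_full - r2_x2,        # variance only x1 explains
        "unique_x2": r2_full - r2_x1,        # variance only x2 explains
        "common": r2_x1 + r2_x2 - r2_full,   # variance shared by x1 and x2
        "r2_full": r2_full,
    }


parts = commonality_two_predictors(r_y1=0.5, r_y2=0.4, r_12=0.3)
for name, value in parts.items():
    print(f"{name}: {value:.4f}")
```

The three components always sum to the full-model R². A negative common component, as with r_y1 = 0.5, r_y2 = 0.0, r_12 = 0.5, is often read as a sign of suppression, though that interpretation is an assumption here rather than a claim from the abstract.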


Asymptotic Theory For Cross-Validated Targeted Maximum Likelihood Estimation, Wenjing Zheng, Mark J. Van Der Laan Jul 2011

Wenjing Zheng

We consider a targeted maximum likelihood estimator of a path-wise differentiable parameter of the data generating distribution in a semi-parametric model based on observing n independent and identically distributed observations. The targeted maximum likelihood estimator (TMLE) uses V-fold sample splitting for the initial estimator in order to make the TMLE maximally robust in its bias reduction step. We prove a general theorem that states asymptotic efficiency (and thereby regularity) of the targeted maximum likelihood estimator when the initial estimator is consistent and a second order term converges to zero in probability at a rate faster than the square root of …


Rejoinder: Estimation Issues For Copulas Applied To Marketing Data, Peter Danaher, Michael Smith Dec 2010

Michael Stanley Smith

Estimating copula models using Bayesian methods presents some subtle challenges, ranging from specification of the prior to computational tractability. There is also some debate about what is the most appropriate copula to employ from those available. We address these issues here and conclude by discussing further applications of copula models in marketing.


Forecasting Television Ratings, Peter Danaher, Tracey Dagger, Michael Smith Dec 2010

Michael Stanley Smith

Despite the state of flux in media today, television remains the dominant player globally for advertising spend. Since television advertising time is purchased on the basis of projected future ratings, and ad costs have skyrocketed, there is increasing pressure to forecast television ratings accurately. Previous forecasting methods are not generally very reliable and many have not been validated, but more distressingly, none have been tested in today’s multichannel environment. In this study we compare 8 different forecasting models, ranging from a naïve empirical method to a state-of-the-art Bayesian model-averaging method. Our data come from a recent time period, 2004-2008 in …


Accurately Sized Test Statistics With Misspecified Conditional Homoskedasticity, Douglas Steigerwald, Jack Erb Dec 2010

Douglas G. Steigerwald

We study the finite-sample performance of test statistics in linear regression models where the error dependence is of unknown form. With an unknown dependence structure there is traditionally a trade-off between the maximum lag over which the correlation is estimated (the bandwidth) and the amount of heterogeneity in the process. When allowing for heterogeneity, through conditional heteroskedasticity, the correlation at far lags is generally omitted and the resultant inflation of the empirical size of test statistics has long been recognized. To allow for correlation at far lags we study test statistics constructed under the possibly misspecified assumption of conditional homoskedasticity. …


The Underground Economy Of Fake Antivirus Software, Douglas Steigerwald, Brett Stone-Gross, Ryan Abman, Richard Kemmerer, Christopher Kruegel, Giovanni Vigna Dec 2010

Douglas G. Steigerwald

Fake antivirus (AV) programs have been utilized to defraud millions of computer users into paying as much as one hundred dollars for a phony software license. As a result, fake AV software has evolved into one of the most lucrative criminal operations on the Internet. In this paper, we examine the operations of three large-scale fake AV businesses, lasting from three months to more than two years. More precisely, we present the results of our analysis on a trove of data obtained from several backend servers that the cybercriminals used to drive their scam operations. Our investigations reveal that these …


An Autoregressive Approach To House Price Modeling, Chaitra Nagaraja, Lawrence Brown, Linda Zhao Dec 2010

Chaitra H Nagaraja

No abstract provided.


Windows Executable For Gaussian Copula With Nbd Margins, Michael S. Smith Dec 2010

Michael Stanley Smith

This is an example 32-bit Windows program to estimate a Gaussian copula model with NBD margins. The margins are estimated first using MLE, and the copula second using Bayesian MCMC. The model was discussed in Danaher & Smith (2011; Marketing Science) as example 4 (section 4.2).


Modeling Multivariate Distributions Using Copulas: Applications In Marketing, Peter J. Danaher, Michael S. Smith Dec 2010

Michael Stanley Smith

In this research we introduce a new class of multivariate probability models to the marketing literature. Known as “copula models”, they have a number of attractive features. First, they permit the combination of any univariate marginal distributions that need not come from the same distributional family. Second, a particular class of copula models, called “elliptical copula”, have the property that they increase in complexity at a much slower rate than existing multivariate probability models as the number of dimensions increases. Third, they are very general, encompassing a number of existing multivariate models, and provide a framework for generating many more. …
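
The first feature listed in this abstract, joining margins from different families, can be sketched by simulation. The code below is an illustration only and is not the authors' implementation: the function names, the parameter values, and the choice of a negative binomial (NBD) count margin paired with an exponential margin are all assumptions. It draws two correlated standard normals, maps each to a uniform through the normal CDF, and then applies each margin's quantile function.

```python
import math
import random
from statistics import NormalDist


def nbd_pmf(k, r, p):
    """P(K = k) for a negative binomial with shape r > 0 and probability p."""
    log_pmf = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
               + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)


def nbd_quantile(u, r, p, k_max=10_000):
    """Smallest k whose cumulative probability reaches u (capped at k_max)."""
    cdf = 0.0
    for k in range(k_max + 1):
        cdf += nbd_pmf(k, r, p)
        if cdf >= u:
            return k
    return k_max


def exponential_quantile(u, rate):
    """Inverse CDF of an exponential margin; u is clipped away from 1."""
    return -math.log(1.0 - min(u, 1.0 - 1e-12)) / rate


def sample_gaussian_copula_pair(rho, quantile_1, quantile_2, rng):
    """Draw one pair whose dependence comes from a Gaussian copula."""
    z1 = rng.gauss(0.0, 1.0)
    # Second normal has correlation rho with the first
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    std_normal = NormalDist()
    u1, u2 = std_normal.cdf(z1), std_normal.cdf(z2)  # uniform margins
    return quantile_1(u1), quantile_2(u2)


rng = random.Random(0)  # fixed seed so the run is repeatable
pairs = [
    sample_gaussian_copula_pair(
        0.6,
        lambda u: nbd_quantile(u, r=2.0, p=0.4),   # count margin
        lambda u: exponential_quantile(u, rate=1.5),  # continuous margin
        rng,
    )
    for _ in range(5)
]
print(pairs)
```

Swapping either quantile function changes that margin without touching the dependence structure, which is the separation the abstract points to.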


Bicycle Commuting In Melbourne During The 2000s Energy Crisis: A Semiparametric Analysis Of Intraday Volumes, Michael S. Smith, Goeran Kauermann Dec 2010

Michael Stanley Smith

Cycling is attracting renewed attention as a mode of transport in western urban environments, yet the determinants of usage are poorly understood. In this paper we investigate some of these using intraday bicycle volumes collected via induction loops located at ten bike paths in the city of Melbourne, Australia, between December 2005 and June 2008. The data are hourly counts at each location, with temporal and spatial disaggregation allowing for the impact of meteorology to be measured accurately for the first time. Moreover, during this period petrol prices varied dramatically and the data also provide a unique opportunity to assess …


The Generalized Shrinkage Estimator For The Analysis Of Functional Connectivity Of Brain Signals, Mark Fiecas, Hernando Ombao Dec 2010

Mark Fiecas

We develop a new statistical method for estimating functional connectivity between neurophysiological signals represented by a multivariate time series. We use partial coherence as the measure of functional connectivity. Partial coherence identifies the frequency bands that drive the direct linear association between any pair of channels. To estimate partial coherence, one would first need an estimate of the spectral density matrix of the multivariate time series. Parametric estimators of the spectral density matrix provide good frequency resolution but could be sensitive when the parametric model is misspecified. Smoothing-based nonparametric estimators are robust to model misspecification and are consistent but may …


An Introduction To Propensity-Score Methods For Reducing Confounding In Observational Studies, Peter C. Austin Dec 2010

Peter Austin

The propensity score is the probability of treatment assignment conditional on observed baseline characteristics. The propensity score allows one to design and analyze an observational (non-randomized) study so that it mimics some of the particular characteristics of a randomized controlled trial. In particular, the propensity score is a balancing score: conditional on the propensity score, the distribution of observed baseline covariates will be similar between treated and untreated subjects. We describe four different propensity score methods: matching on the propensity score, stratification on the propensity score, inverse probability of treatment weighting using the propensity score, and covariate adjustment using the …
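
Of the methods named in this abstract, inverse probability of treatment weighting is the easiest to show in a few lines. The sketch below is not from the article: the function name and the toy data are hypothetical, the propensity scores are assumed to have been estimated already, and the weights are normalized within each group, which is one common variant among several.

```python
def iptw_mean_difference(treated, propensity, outcome):
    """Weighted difference in mean outcome, treated minus untreated.

    treated: 1 for treated subjects, 0 for untreated.
    propensity: estimated probability of treatment for each subject.
    outcome: observed outcome for each subject.
    (Hypothetical helper; assumes both groups are non-empty.)
    """
    sum_wy_t = sum_w_t = 0.0
    sum_wy_c = sum_w_c = 0.0
    for z, e, y in zip(treated, propensity, outcome):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        if z:
            w = 1.0 / e            # treated subjects weighted by 1 / e
            sum_wy_t += w * y
            sum_w_t += w
        else:
            w = 1.0 / (1.0 - e)    # untreated subjects weighted by 1 / (1 - e)
            sum_wy_c += w * y
            sum_w_c += w
    return sum_wy_t / sum_w_t - sum_wy_c / sum_w_c


# Toy data: every treated outcome is 3 and every untreated outcome is 1
treated = [1, 1, 0, 0, 1, 0]
propensity = [0.8, 0.6, 0.3, 0.2, 0.5, 0.4]
outcome = [3.0, 3.0, 1.0, 1.0, 3.0, 1.0]
print(iptw_mean_difference(treated, propensity, outcome))
```

Subjects who received a treatment they were unlikely to receive get larger weights, so the weighted sample resembles one in which treatment does not depend on the measured covariates.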