Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

Full-Text Articles in Statistics and Probability

Using Regression Models To Analyze Randomized Trials: Asymptotically Valid Hypothesis Tests Despite Incorrectly Specified Models, Michael Rosenblum, Mark J. Van Der Laan, Jan 2008

U.C. Berkeley Division of Biostatistics Working Paper Series

Regression models are often used to test for cause-effect relationships from data collected in randomized trials or experiments. This practice has deservedly come under heavy scrutiny, since commonly used models such as linear and logistic regression will often not capture the actual relationships between variables, and incorrectly specified models potentially lead to incorrect conclusions. In this paper, we focus on hypothesis tests of whether the treatment given in a randomized trial has any effect on the mean of the primary outcome, within strata of baseline variables such as age, sex, and health status. Our primary concern is ensuring that such …


The Cross-Validated Adaptive Epsilon-Net Estimator, Mark J. Van Der Laan, Sandrine Dudoit, Aad W. Van Der Vaart, Feb 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

Suppose that we observe a sample of independent and identically distributed realizations of a random variable. Assume that the parameter of interest can be defined as the minimizer, over a suitably defined parameter space, of the expectation (with respect to the distribution of the random variable) of a particular (loss) function of a candidate parameter value and the random variable. Examples of commonly used loss functions are the squared error loss function in regression and the negative log-density loss function in density estimation. Minimizing the empirical risk (i.e., the empirical mean of the loss function) over the entire parameter space …
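The empirical risk described in this abstract can be sketched in a few lines. This is only an illustration of minimizing the empirical mean of a loss over a small finite candidate set, not the epsilon-net estimator of the paper; the function names, the candidate grid, and the toy sample are all assumptions made for the example.

```python
def squared_error_loss(candidate, observation):
    """Squared error loss of a candidate parameter value at one observation."""
    return (observation - candidate) ** 2


def empirical_risk(candidate, sample, loss=squared_error_loss):
    """Empirical mean of the loss over the observed sample."""
    losses = [loss(candidate, x) for x in sample]
    return sum(losses) / len(losses)


def minimize_empirical_risk(candidates, sample, loss=squared_error_loss):
    """Return the candidate with the smallest empirical risk.

    The search runs over a finite, user-supplied candidate list
    (a hypothetical stand-in for a parameter space).
    """
    return min(candidates, key=lambda c: empirical_risk(c, sample, loss))


# Toy data and candidate grid, both invented for illustration.
sample = [1.0, 2.0, 3.0, 4.0]
candidates = [0.0, 1.0, 2.5, 4.0]
best = minimize_empirical_risk(candidates, sample)
```

With squared error loss and a constant candidate, the minimizer over this grid is the candidate closest to the sample mean.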


Asymptotics Of Cross-Validated Risk Estimation In Estimator Selection And Performance Assessment, Sandrine Dudoit, Mark J. Van Der Laan, Feb 2003

U.C. Berkeley Division of Biostatistics Working Paper Series

Risk estimation is an important statistical question for the purposes of selecting a good estimator (i.e., model selection) and assessing its performance (i.e., estimating generalization error). This article introduces a general framework for cross-validation and derives distributional properties of cross-validated risk estimators in the context of estimator selection and performance assessment. Arbitrary classes of estimators are considered, including density estimators and predictors for both continuous and polychotomous outcomes. Results are provided for general full data loss functions (e.g., absolute and squared error, indicator, negative log density). A broad definition of cross-validation is used in order to cover leave-one-out cross-validation, V-fold …
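To make the idea of cross-validated risk for estimator selection concrete, here is a minimal V-fold sketch. It is an assumption-laden toy, not the framework of the paper: the two candidate estimators (sample mean and sample median as constant predictors), the deterministic fold assignment by index, the squared error loss, and the data are all chosen for illustration.

```python
from statistics import mean, median


def squared_error(prediction, observation):
    """Squared error loss of a constant prediction at one observation."""
    return (observation - prediction) ** 2


def v_fold_cv_risk(sample, fit, loss, v=5):
    """Average validation-set risk of `fit` over v folds.

    Fold membership is assigned by index modulo v (an assumption made
    to keep the example deterministic; random splits are also common).
    """
    fold_risks = []
    for fold in range(v):
        validation = [x for i, x in enumerate(sample) if i % v == fold]
        training = [x for i, x in enumerate(sample) if i % v != fold]
        fitted = fit(training)  # estimator trained on the training set only
        fold_risks.append(sum(loss(fitted, x) for x in validation) / len(validation))
    return sum(fold_risks) / v


# Toy sample with one outlier, invented for illustration.
sample = [2.1, 1.9, 2.0, 2.2, 1.8, 2.0, 2.1, 1.9, 2.0, 50.0]
estimators = {"mean": mean, "median": median}

cv_risks = {name: v_fold_cv_risk(sample, fit, squared_error, v=5)
            for name, fit in estimators.items()}
selected = min(cv_risks, key=cv_risks.get)  # estimator with smallest CV risk
```

The same function can be reused for performance assessment by reporting the cross-validated risk of the selected estimator, although a proper assessment would use data not involved in the selection step.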


Locally Efficient Estimation Of Regression Parameters Using Current Status Data, Chris Andrews, Mark J. Van Der Laan, James M. Robins, Sep 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

In biostatistics applications interest often focuses on the estimation of the distribution of a time-variable T. If one only observes whether or not T exceeds an observed monitoring time C, then the data structure is called current status data, also known as interval censored data, case I. We consider this data structure extended to allow the presence of both time-independent covariates and time-dependent covariate processes that are observed until the monitoring time. We assume that the monitoring process satisfies coarsening at random.
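The current status data structure described above can be illustrated with a small simulation. This sketch ignores covariates entirely and assumes, purely for illustration, an exponential event time T and a uniform monitoring time C drawn independently of T (a simple special case of coarsening at random); the function name and distributions are not taken from the paper.

```python
import random


def simulate_current_status(n, seed=0):
    """Simulate n current status observations (C, Delta).

    Only the monitoring time C and the indicator Delta = 1{T <= C}
    are recorded; the event time T itself is never part of the output.
    """
    rng = random.Random(seed)
    observations = []
    for _ in range(n):
        t = rng.expovariate(1.0)   # latent event time T (assumed exponential)
        c = rng.uniform(0.0, 3.0)  # monitoring time C (assumed uniform, independent of T)
        delta = 1 if t <= c else 0  # 1 if the event has occurred by time C
        observations.append((c, delta))
    return observations


data = simulate_current_status(100)
```

Each record tells us only on which side of C the event time fell, which is why this structure is also called interval censored data, case I.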

Our goal is to estimate the regression parameter beta of the regression model T = Z*beta+epsilon where the …


Bivariate Current Status Data, Mark J. Van Der Laan, Nicholas P. Jewell, Sep 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

In many applications, it is of interest to estimate a bivariate distribution of two survival random variables. Complete observation of such random variables is often not possible. If one only observes whether or not each of the individual survival times exceeds a common observed monitoring time C, then the data structure is referred to as bivariate current status data (Wang and Ding, 2000). For such data, we show that the identifiable part of the joint distribution is represented by three univariate cumulative distribution functions, namely the two marginal cumulative distribution functions, and the bivariate cumulative distribution function evaluated on the …