Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Open Access. Powered by Scholars. Published by Universities.®

2004

Statistics and Probability

U.C. Berkeley Division of Biostatistics Working Paper Series

Prediction

Articles 1 - 4 of 4

Full-Text Articles in Physical Sciences and Mathematics

Deletion/Substitution/Addition Algorithm For Partitioning The Covariate Space In Prediction, Annette Molinaro, Mark J. Van Der Laan Nov 2004

Deletion/Substitution/Addition Algorithm For Partitioning The Covariate Space In Prediction, Annette Molinaro, Mark J. Van Der Laan

U.C. Berkeley Division of Biostatistics Working Paper Series

We propose a new method for predicting censored (and non-censored) clinical outcomes from a highly-complex covariate space. Previously we suggested a unified strategy for predictor construction, selection, and performance assessment. Here we introduce a new algorithm which generates a piecewise constant estimation sieve of candidate predictors based on an intensive and comprehensive search over the entire covariate space. This algorithm allows us to elucidate interactions and correlation patterns in addition to main effects.


Multiple Testing And Data Adaptive Regression: An Application To Hiv-1 Sequence Data, Merrill D. Birkner, Sandra E. Sinisi, Mark J. Van Der Laan Oct 2004

Multiple Testing And Data Adaptive Regression: An Application To Hiv-1 Sequence Data, Merrill D. Birkner, Sandra E. Sinisi, Mark J. Van Der Laan

U.C. Berkeley Division of Biostatistics Working Paper Series

Analysis of viral strand sequence data and viral replication capacity could potentially lead to biological insights regarding the replication ability of HIV-1. Determining specific target codons on the viral strand will facilitate the manufacturing of target specific antiretrovirals. Various algorithmic and analysis techniques can be applied to this application. We propose using multiple testing to find codons which have significant univariate associations with replication capacity of the virus. We also propose using a data adaptive multiple regression algorithm to obtain multiple predictions of viral replication capacity based on an entire mutant/non-mutant sequence profile. The data set to which these techniques …


Loss-Based Cross-Validated Deletion/Substitution/Addition Algorithms In Estimation, Sandra E. Sinisi, Mark J. Van Der Laan Mar 2004

Loss-Based Cross-Validated Deletion/Substitution/Addition Algorithms In Estimation, Sandra E. Sinisi, Mark J. Van Der Laan

U.C. Berkeley Division of Biostatistics Working Paper Series

In van der Laan and Dudoit (2003) we propose and theoretically study a unified loss function based statistical methodology, which provides a road map for estimation and performance assessment. Given a parameter of interest which can be described as the minimizer of the population mean of a loss function, the road map involves as important ingredients cross-validation for estimator selection and minimizing over subsets of basis functions the empirical risk of the subset-specific estimator of the parameter of interest, where the basis functions correspond to a parameterization of a specified subspace of the complete parameter space. In this article we …


The Cross-Validated Adaptive Epsilon-Net Estimator, Mark J. Van Der Laan, Sandrine Dudoit, Aad W. Van Der Vaart Feb 2004

The Cross-Validated Adaptive Epsilon-Net Estimator, Mark J. Van Der Laan, Sandrine Dudoit, Aad W. Van Der Vaart

U.C. Berkeley Division of Biostatistics Working Paper Series

Suppose that we observe a sample of independent and identically distributed realizations of a random variable. Assume that the parameter of interest can be defined as the minimizer, over a suitably defined parameter space, of the expectation (with respect to the distribution of the random variable) of a particular (loss) function of a candidate parameter value and the random variable. Examples of commonly used loss functions are the squared error loss function in regression and the negative log-density loss function in density estimation. Minimizing the empirical risk (i.e., the empirical mean of the loss function) over the entire parameter space …