Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 4 of 4

Full-Text Articles in Statistics and Probability

The False Discovery Rate: A Variable Selection Perspective, Debashis Ghosh, Wei Chen, Trivellore E. Raghuanthan Jun 2004

The False Discovery Rate: A Variable Selection Perspective, Debashis Ghosh, Wei Chen, Trivellore E. Raghuanthan

The University of Michigan Department of Biostatistics Working Paper Series

In many scientific and medical settings, large-scale experiments are generating large quantities of data that lead to inferential problems involving multiple hypotheses. This has led to recent tremendous interest in statistical methods regarding the false discovery rate (FDR). Several authors have studied the properties involving FDR in a univariate mixture model setting. In this article, we turn the problem on its side; in this manuscript, we show that FDR is a by-product of Bayesian analysis of variable selection problem for a hierarchical linear regression model. This equivalence gives many Bayesian insights as to why FDR is a natural quantity to …


Multiple Testing Methods For Chip-Chip High Density Oligonucleotide Array Data, Sunduz Keles, Mark J. Van Der Laan, Sandrine Dudoit, Simon E. Cawley Jun 2004

Multiple Testing Methods For Chip-Chip High Density Oligonucleotide Array Data, Sunduz Keles, Mark J. Van Der Laan, Sandrine Dudoit, Simon E. Cawley

U.C. Berkeley Division of Biostatistics Working Paper Series

Cawley et al. (2004) have recently mapped the locations of binding sites for three transcription factors along human chromosomes 21 and 22 using ChIP-Chip experiments. ChIP-Chip experiments are a new approach to the genome-wide identification of transcription factor binding sites and consist of chromatin (Ch) immunoprecipitation (IP) of transcription factor-bound genomic DNA followed by high density oligonucleotide hybridization (Chip) of the IP-enriched DNA. We investigate the ChIP-Chip data structure and propose methods for inferring the location of transcription factor binding sites from these data. The proposed methods involve testing for each probe whether it is part of a bound sequence …


Loss-Based Cross-Validated Deletion/Substitution/Addition Algorithms In Estimation, Sandra E. Sinisi, Mark J. Van Der Laan Mar 2004

Loss-Based Cross-Validated Deletion/Substitution/Addition Algorithms In Estimation, Sandra E. Sinisi, Mark J. Van Der Laan

U.C. Berkeley Division of Biostatistics Working Paper Series

In van der Laan and Dudoit (2003) we propose and theoretically study a unified loss function based statistical methodology, which provides a road map for estimation and performance assessment. Given a parameter of interest which can be described as the minimizer of the population mean of a loss function, the road map involves as important ingredients cross-validation for estimator selection and minimizing over subsets of basis functions the empirical risk of the subset-specific estimator of the parameter of interest, where the basis functions correspond to a parameterization of a specified subspace of the complete parameter space. In this article we …


The Cross-Validated Adaptive Epsilon-Net Estimator, Mark J. Van Der Laan, Sandrine Dudoit, Aad W. Van Der Vaart Feb 2004

The Cross-Validated Adaptive Epsilon-Net Estimator, Mark J. Van Der Laan, Sandrine Dudoit, Aad W. Van Der Vaart

U.C. Berkeley Division of Biostatistics Working Paper Series

Suppose that we observe a sample of independent and identically distributed realizations of a random variable. Assume that the parameter of interest can be defined as the minimizer, over a suitably defined parameter space, of the expectation (with respect to the distribution of the random variable) of a particular (loss) function of a candidate parameter value and the random variable. Examples of commonly used loss functions are the squared error loss function in regression and the negative log-density loss function in density estimation. Minimizing the empirical risk (i.e., the empirical mean of the loss function) over the entire parameter space …