Open Access. Powered by Scholars. Published by Universities.®

Genetics and Genomics Commons

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 4 of 4

Full-Text Articles in Genetics and Genomics

The False Discovery Rate: A Variable Selection Perspective, Debashis Ghosh, Wei Chen, Trivellore E. Raghuanthan Jun 2004

The False Discovery Rate: A Variable Selection Perspective, Debashis Ghosh, Wei Chen, Trivellore E. Raghuanthan

The University of Michigan Department of Biostatistics Working Paper Series

In many scientific and medical settings, large-scale experiments are generating large quantities of data that lead to inferential problems involving multiple hypotheses. This has led to recent tremendous interest in statistical methods regarding the false discovery rate (FDR). Several authors have studied the properties involving FDR in a univariate mixture model setting. In this article, we turn the problem on its side; in this manuscript, we show that FDR is a by-product of Bayesian analysis of variable selection problem for a hierarchical linear regression model. This equivalence gives many Bayesian insights as to why FDR is a natural quantity to …


Loss-Based Estimation With Cross-Validation: Applications To Microarray Data Analysis And Motif Finding, Sandrine Dudoit, Mark J. Van Der Laan, Sunduz Keles, Annette M. Molinaro, Sandra E. Sinisi, Siew Leng Teng Dec 2003

Loss-Based Estimation With Cross-Validation: Applications To Microarray Data Analysis And Motif Finding, Sandrine Dudoit, Mark J. Van Der Laan, Sunduz Keles, Annette M. Molinaro, Sandra E. Sinisi, Siew Leng Teng

U.C. Berkeley Division of Biostatistics Working Paper Series

Current statistical inference problems in genomic data analysis involve parameter estimation for high-dimensional multivariate distributions, with typically unknown and intricate correlation patterns among variables. Addressing these inference questions satisfactorily requires: (i) an intensive and thorough search of the parameter space to generate good candidate estimators, (ii) an approach for selecting an optimal estimator among these candidates, and (iii) a method for reliably assessing the performance of the resulting estimator. We propose a unified loss-based methodology for estimator construction, selection, and performance assessment with cross-validation. In this approach, the parameter of interest is defined as the risk minimizer for a suitable …


Tree-Based Multivariate Regression And Density Estimation With Right-Censored Data , Annette M. Molinaro, Sandrine Dudoit, Mark J. Van Der Laan Sep 2003

Tree-Based Multivariate Regression And Density Estimation With Right-Censored Data , Annette M. Molinaro, Sandrine Dudoit, Mark J. Van Der Laan

U.C. Berkeley Division of Biostatistics Working Paper Series

We propose a unified strategy for estimator construction, selection, and performance assessment in the presence of censoring. This approach is entirely driven by the choice of a loss function for the full (uncensored) data structure and can be stated in terms of the following three main steps. (1) Define the parameter of interest as the minimizer of the expected loss, or risk, for a full data loss function chosen to represent the desired measure of performance. Map the full data loss function into an observed (censored) data loss function having the same expected value and leading to an efficient estimator …


Identification Of Regulatory Elements Using A Feature Selection Method, Sunduz Keles, Mark J. Van Der Laan, Michael B. Eisen Sep 2001

Identification Of Regulatory Elements Using A Feature Selection Method, Sunduz Keles, Mark J. Van Der Laan, Michael B. Eisen

U.C. Berkeley Division of Biostatistics Working Paper Series

Many methods have been described to identify regulatory motifs in the transcription control regions of genes that exhibit similar patterns of gene expression across a variety of experimental conditions. Here we focus on a single experimental condition, and utilize gene expression data to identify sequence motifs associated with genes that are activated under this experimental condition. We use a linear model with two way interactions to model gene expression as a function of sequence features (words) present in presumptive transcription control regions. The most relevant features are selected by a feature selection method called stepwise selection with monte carlo cross …