Open Access. Powered by Scholars. Published by Universities.®

Statistical Methodology Commons

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 5 of 5

Full-Text Articles in Statistical Methodology

Nonparametric Methods For Analyzing Replication Origins In Genomewide Data, Debashis Ghosh Jun 2004

Nonparametric Methods For Analyzing Replication Origins In Genomewide Data, Debashis Ghosh

The University of Michigan Department of Biostatistics Working Paper Series

Due to the advent of high-throughput genomic technology, it has become possible to globally monitor cellular activities on a genomewide basis. With these new methods, scientists can begin to address important biological questions. One such question involves the identification of replication origins, which are regions in chromosomes where DNA replication is initiated. In addition, one hypothesis regarding replication origins is that their locations are non-random throughout the genome. In this article, we develop methods for identification of and cluster inference regarding replication origins involving genomewide expression data. We compare several nonparametric regression methods for the identification of replication origin locations. …


Loss-Based Estimation With Cross-Validation: Applications To Microarray Data Analysis And Motif Finding, Sandrine Dudoit, Mark J. Van Der Laan, Sunduz Keles, Annette M. Molinaro, Sandra E. Sinisi, Siew Leng Teng Dec 2003

Loss-Based Estimation With Cross-Validation: Applications To Microarray Data Analysis And Motif Finding, Sandrine Dudoit, Mark J. Van Der Laan, Sunduz Keles, Annette M. Molinaro, Sandra E. Sinisi, Siew Leng Teng

U.C. Berkeley Division of Biostatistics Working Paper Series

Current statistical inference problems in genomic data analysis involve parameter estimation for high-dimensional multivariate distributions, with typically unknown and intricate correlation patterns among variables. Addressing these inference questions satisfactorily requires: (i) an intensive and thorough search of the parameter space to generate good candidate estimators, (ii) an approach for selecting an optimal estimator among these candidates, and (iii) a method for reliably assessing the performance of the resulting estimator. We propose a unified loss-based methodology for estimator construction, selection, and performance assessment with cross-validation. In this approach, the parameter of interest is defined as the risk minimizer for a suitable …


Tree-Based Multivariate Regression And Density Estimation With Right-Censored Data , Annette M. Molinaro, Sandrine Dudoit, Mark J. Van Der Laan Sep 2003

Tree-Based Multivariate Regression And Density Estimation With Right-Censored Data , Annette M. Molinaro, Sandrine Dudoit, Mark J. Van Der Laan

U.C. Berkeley Division of Biostatistics Working Paper Series

We propose a unified strategy for estimator construction, selection, and performance assessment in the presence of censoring. This approach is entirely driven by the choice of a loss function for the full (uncensored) data structure and can be stated in terms of the following three main steps. (1) Define the parameter of interest as the minimizer of the expected loss, or risk, for a full data loss function chosen to represent the desired measure of performance. Map the full data loss function into an observed (censored) data loss function having the same expected value and leading to an efficient estimator …


Asymptotic Optimality Of Likelihood Based Cross-Validation, Mark J. Van Der Laan, Sandrine Dudoit, Sunduz Keles Feb 2003

Asymptotic Optimality Of Likelihood Based Cross-Validation, Mark J. Van Der Laan, Sandrine Dudoit, Sunduz Keles

U.C. Berkeley Division of Biostatistics Working Paper Series

Likelihood-based cross-validation is a statistical tool for selecting a density estimate based on n i.i.d. observations from the true density among a collection of candidate density estimators. General examples are the selection of a model indexing a maximum likelihood estimator, and the selection of a bandwidth indexing a nonparametric (e.g. kernel) density estimator. In this article, we establish asymptotic optimality of a general class of likelihood based cross-validation procedures (as indexed by the type of sample splitting used, e.g. V-fold cross-validation), in the sense that the cross-validation selector performs asymptotically as well (w.r.t. to the Kullback-Leibler distance to the true …


Asymptotics Of Cross-Validated Risk Estimation In Estimator Selection And Performance Assessment, Sandrine Dudoit, Mark J. Van Der Laan Feb 2003

Asymptotics Of Cross-Validated Risk Estimation In Estimator Selection And Performance Assessment, Sandrine Dudoit, Mark J. Van Der Laan

U.C. Berkeley Division of Biostatistics Working Paper Series

Risk estimation is an important statistical question for the purposes of selecting a good estimator (i.e., model selection) and assessing its performance (i.e., estimating generalization error). This article introduces a general framework for cross-validation and derives distributional properties of cross-validated risk estimators in the context of estimator selection and performance assessment. Arbitrary classes of estimators are considered, including density estimators and predictors for both continuous and polychotomous outcomes. Results are provided for general full data loss functions (e.g., absolute and squared error, indicator, negative log density). A broad definition of cross-validation is used in order to cover leave-one-out cross-validation, V-fold …