Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
- Keyword
- Censored data (2)
- Comparative genomic hybridization (2)
- Cross-validation (2)
- Density estimation (2)
- Loss function (2)
- Model selection (2)
- Multivariate outcome (2)
- Prediction (2)
- Regression trees (2)
- Survival analysis (2)
- Auto-correlation (1)
- CART (1)
- Classification (1)
- Co-regulated genes (1)
- DNA sequence (1)
- Eigenvalue (1)
- Entropy (1)
- Estimation (1)
- Identity by descent (1)
- Infinitesimal generator (1)
- Information content (1)
- Microarray (1)
- Mixture model (1)
- Motif finding (1)
- Nonlinear constraint maximization (1)
- Position weight matrix (1)
- Quotient graph (1)
- Recombination fraction (1)
- Regulatory motif (1)
- Risk (1)
Articles 1 - 4 of 4
Full-Text Articles in Physical Sciences and Mathematics
Loss-Based Estimation With Cross-Validation: Applications To Microarray Data Analysis And Motif Finding, Sandrine Dudoit, Mark J. Van Der Laan, Sunduz Keles, Annette M. Molinaro, Sandra E. Sinisi, Siew Leng Teng
U.C. Berkeley Division of Biostatistics Working Paper Series
Current statistical inference problems in genomic data analysis involve parameter estimation for high-dimensional multivariate distributions, with typically unknown and intricate correlation patterns among variables. Addressing these inference questions satisfactorily requires: (i) an intensive and thorough search of the parameter space to generate good candidate estimators, (ii) an approach for selecting an optimal estimator among these candidates, and (iii) a method for reliably assessing the performance of the resulting estimator. We propose a unified loss-based methodology for estimator construction, selection, and performance assessment with cross-validation. In this approach, the parameter of interest is defined as the risk minimizer for a suitable …
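The selection step in (ii) can be illustrated with a small sketch. It is not the authors' implementation: the squared error loss, the polynomial candidate estimators, and the names `cv_risk` and `select_degree` are all assumptions chosen for illustration. Each candidate is fit on the training folds, its loss is averaged over the held-out fold, and the candidate with the smallest cross-validated risk is selected.

```python
import numpy as np

def squared_error_loss(y, y_hat):
    # Illustrative loss; the methodology allows other loss functions.
    return (y - y_hat) ** 2

def cv_risk(x, y, degree, n_folds=5, seed=0):
    """V-fold cross-validated risk of a polynomial fit (hypothetical candidate)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    losses = []
    for val_idx in folds:
        train_mask = np.ones(len(x), dtype=bool)
        train_mask[val_idx] = False
        # Fit on the training folds only.
        coefs = np.polyfit(x[train_mask], y[train_mask], degree)
        # Evaluate the loss on the held-out fold.
        y_hat = np.polyval(coefs, x[val_idx])
        losses.append(squared_error_loss(y[val_idx], y_hat))
    return float(np.mean(np.concatenate(losses)))

def select_degree(x, y, degrees, n_folds=5, seed=0):
    """Return the candidate with the smallest cross-validated risk."""
    risks = {d: cv_risk(x, y, d, n_folds, seed) for d in degrees}
    best = min(risks, key=risks.get)
    return best, risks

# Simulated data with a quadratic signal (assumed for the example).
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 200)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0, 0.1, 200)
best_degree, risks = select_degree(x, y, degrees=[0, 1, 2, 3, 4])
```

The same folds are reused for every candidate (fixed `seed`), so the risks are compared on identical splits.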
Tree-Based Multivariate Regression And Density Estimation With Right-Censored Data, Annette M. Molinaro, Sandrine Dudoit, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
We propose a unified strategy for estimator construction, selection, and performance assessment in the presence of censoring. This approach is entirely driven by the choice of a loss function for the full (uncensored) data structure and can be stated in terms of the following three main steps. (1) Define the parameter of interest as the minimizer of the expected loss, or risk, for a full data loss function chosen to represent the desired measure of performance. Map the full data loss function into an observed (censored) data loss function having the same expected value and leading to an efficient estimator …
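The mapping in step (1) can be made concrete with one standard construction, inverse probability of censoring weighting. It is named here only as an illustration: the truncated abstract does not say which mapping the authors use. The notation is assumed: $X$ is the full data (including the survival time $T$), $C$ the censoring time, $\Delta = I(T \le C)$ the indicator of an uncensored observation, $O$ the observed data, $L(X, \psi)$ the full data loss, and $\bar{G}(t \mid X) = P(C \ge t \mid X)$.

```latex
\[
  L(O, \psi \mid \bar{G}) \;=\; L(X, \psi)\,\frac{\Delta}{\bar{G}(T \mid X)},
\]
\[
  E\!\left[ L(O, \psi \mid \bar{G}) \right]
  \;=\; E\!\left[ L(X, \psi)\,\frac{E[\Delta \mid X]}{\bar{G}(T \mid X)} \right]
  \;=\; E\!\left[ L(X, \psi) \right],
\]
\[
  \text{since } E[\Delta \mid X] = P(C \ge T \mid X) = \bar{G}(T \mid X) > 0 .
\]
```

The weighted loss is computable from the observed data because it is nonzero only when $\Delta = 1$, in which case $T$ is observed. Estimating $\bar{G}$ from censored data requires a further assumption, such as censoring depending on $X$ only through observed covariates.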
Supervised Detection Of Regulatory Motifs In DNA Sequences, Sunduz Keles, Mark J. Van Der Laan, Sandrine Dudoit, Biao Xing, Michael B. Eisen
U.C. Berkeley Division of Biostatistics Working Paper Series
Identification of transcription factor binding sites (regulatory motifs) is a major interest in contemporary biology. We propose a new likelihood-based method, COMODE, for identifying structural motifs in DNA sequences. Commonly used methods (e.g., MEME, Gibbs sampler) model binding sites as families of sequences described by a position weight matrix (PWM) and identify PWMs that maximize the likelihood of observed sequence data under a simple multinomial mixture model. This model assumes that the positions of the PWM correspond to independent multinomial distributions with four cell probabilities. We address supervising the search for DNA binding sites using the information derived from …
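The independence assumption behind the PWM model can be illustrated with a short sketch. It covers only the PWM component with known, aligned sites; it is not COMODE, MEME, or the full mixture model. The function names, the pseudocount, and the example sites are assumptions made for illustration.

```python
import numpy as np

BASES = "ACGT"

def estimate_pwm(sites, pseudocount=1.0):
    """Estimate a width-by-4 PWM from aligned sites of equal length.

    Each row is an independent multinomial over A, C, G, T.
    The pseudocount is an illustrative choice to avoid zero probabilities.
    """
    width = len(sites[0])
    counts = np.full((width, 4), pseudocount)
    for site in sites:
        for pos, base in enumerate(site):
            counts[pos, BASES.index(base)] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def pwm_log_likelihood(site, pwm):
    """Log-likelihood of one site: positions are independent, so logs add."""
    return float(sum(np.log(pwm[pos, BASES.index(base)])
                     for pos, base in enumerate(site)))

# Hypothetical aligned binding sites.
sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
pwm = estimate_pwm(sites)
ll_consensus = pwm_log_likelihood("TATAAT", pwm)
ll_other = pwm_log_likelihood("GCGCGC", pwm)
```

Because the positions are treated as independent, the site log-likelihood is a plain sum over positions, which is the simplification the abstract attributes to the commonly used methods.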
IBD Configuration Transition Matrices And Linkage Score Tests For Unilineal Relative Pairs, Sandrine Dudoit
U.C. Berkeley Division of Biostatistics Working Paper Series
Properties of transition matrices between IBD configurations are derived for four general classes of unilineal relative pairs obtained from the grandparent/grandchild, half-sib, avuncular, and cousin relationships. In this setting, IBD configurations are defined as orbits of groups acting on a set of inheritance vectors. Properties of the transition matrix between IBD configurations at two linked loci are derived by relating its infinitesimal generator to the adjacency matrix of a quotient graph. The second largest eigenvalue of the infinitesimal generator and its multiplicity are key in determining the form of the transition matrix and of likelihood-based linkage tests such as …
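The role of the second largest eigenvalue can be seen from the general relation between a generator and its transition matrix. The following is a sketch under stated assumptions, not the paper's derivation: $Q$ is taken to be diagonalizable with real distinct eigenvalues $0 = \lambda_1 > \lambda_2 > \dots > \lambda_m$, $\Pi_k$ denotes the projection onto the eigenspace of $\lambda_k$, and $t \ge 0$ is a hypothetical distance parameter between the two loci (the excerpt does not specify the parametrization).

```latex
\[
  P(t) \;=\; e^{tQ} \;=\; \sum_{k=1}^{m} e^{\lambda_k t}\,\Pi_k
        \;=\; \Pi_1 \;+\; e^{\lambda_2 t}\,\Pi_2 \;+\; \sum_{k=3}^{m} e^{\lambda_k t}\,\Pi_k .
\]
```

Under these assumptions $\Pi_1$ is the limiting matrix as $t \to \infty$, and the leading deviation from it is $e^{\lambda_2 t}\,\Pi_2$, whose rank equals the multiplicity of $\lambda_2$.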