Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons



Full-Text Articles in Statistical Models

A Nested Unsupervised Approach to Identifying Novel Molecular Subtypes, Elizabeth Garrett, Giovanni Parmigiani Oct 2003


Johns Hopkins University, Dept. of Biostatistics Working Papers

In classification problems arising in genomics research it is common to study populations for which a broad class assignment is known (say, normal versus diseased) and one seeks to find undiscovered subclasses within one or both of the known classes. Formally, this problem can be thought of as an unsupervised analysis nested within a supervised one. Here we take the view that the nested unsupervised analysis can successfully utilize information from the entire data set for constructing and/or selecting useful predictors. Specifically, we propose a mixture model approach to the nested unsupervised problem, where the supervised information is used to …
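As a rough illustration of a mixture model fitted inside one already-known class, the sketch below runs EM for a two-component univariate Gaussian mixture on the samples of the "diseased" class only, so each component plays the role of a candidate subclass. The single feature, the Gaussian components, the data values, and the function name `em_two_component` are hypothetical. This is not the authors' model, and the truncated abstract does not say how the supervised information enters their fit.

```python
import math

def normal_pdf(x, mu, sigma):
    # Density of N(mu, sigma^2) at x.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_two_component(xs, mu=(0.0, 1.0), sigma=(1.0, 1.0), weight=0.5, n_iter=100):
    """Hypothetical EM fit of a two-component Gaussian mixture to one known class."""
    mu1, mu2 = mu
    s1, s2 = sigma
    w = weight  # mixing proportion of component 1
    r = []
    for _ in range(n_iter):
        # E-step: posterior probability that each sample belongs to component 1.
        r = []
        for x in xs:
            a = w * normal_pdf(x, mu1, s1)
            b = (1 - w) * normal_pdf(x, mu2, s2)
            total = a + b
            r.append(a / total if total > 0 else 0.5)
        # M-step: responsibility-weighted means, standard deviations, and weight.
        n1 = sum(r)
        n2 = len(xs) - n1
        mu1 = sum(ri * x for ri, x in zip(r, xs)) / n1
        mu2 = sum((1 - ri) * x for ri, x in zip(r, xs)) / n2
        # Floor the standard deviations to avoid a degenerate zero-variance component.
        s1 = max(math.sqrt(sum(ri * (x - mu1) ** 2 for ri, x in zip(r, xs)) / n1), 1e-6)
        s2 = max(math.sqrt(sum((1 - ri) * (x - mu2) ** 2 for ri, x in zip(r, xs)) / n2), 1e-6)
        w = n1 / len(xs)
    return (mu1, mu2), (s1, s2), w, r

# Hypothetical expression values for one gene; the broad class labels are known.
samples = {
    "normal": [0.2, -0.1, 0.0, 0.1],
    "diseased": [0.1, -0.2, 0.05, 0.0, 3.1, 2.9, 3.0, 3.2],
}
xs = samples["diseased"]  # the unsupervised step is nested inside this class
mus, sds, w, resp = em_two_component(xs, mu=(min(xs), max(xs)))
subtype = [0 if ri >= 0.5 else 1 for ri in resp]  # hard subclass assignment
```

Initializing the component means at the smallest and largest observations is only a convenience for this toy example; EM converges to a local optimum, so in practice several starting points would normally be tried.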


Supervised Detection of Regulatory Motifs in DNA Sequences, Sunduz Keles, Mark J. Van Der Laan, Sandrine Dudoit, Biao Xing, Michael B. Eisen May 2003


U.C. Berkeley Division of Biostatistics Working Paper Series

Identification of transcription factor binding sites (regulatory motifs) is of major interest in contemporary biology. We propose a new likelihood-based method, COMODE, for identifying structural motifs in DNA sequences. Commonly used methods (e.g., MEME, the Gibbs sampler) model binding sites as families of sequences described by a position weight matrix (PWM) and identify PWMs that maximize the likelihood of the observed sequence data under a simple multinomial mixture model. This model assumes that the positions of the PWM correspond to independent multinomial distributions with four cell probabilities. We address supervising the search for DNA binding sites using the information derived from …
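Because the PWM model described above treats each motif position as an independent multinomial over A, C, G and T, the likelihood of a candidate site is simply the product of one probability per column. The sketch below, assuming a hypothetical width-3 PWM that favors "TGA" and a uniform background, scores every window of a sequence by its log-likelihood ratio against the background. It illustrates only the baseline PWM likelihood, not COMODE, and the names `site_log_likelihood` and `best_site` are hypothetical.

```python
import math

ALPHABET = "ACGT"  # column order of each PWM row below

def site_log_likelihood(site, pwm):
    # Independent positions: log P(site) is a sum of per-column log probabilities.
    if len(site) != len(pwm):
        raise ValueError("site length must equal motif width")
    return sum(math.log(column[ALPHABET.index(base)]) for base, column in zip(site, pwm))

def background_log_likelihood(site, background):
    # Same letter probabilities at every position (i.i.d. background).
    return sum(math.log(background[ALPHABET.index(base)]) for base in site)

def best_site(sequence, pwm, background=(0.25, 0.25, 0.25, 0.25)):
    """Return (log-likelihood ratio, start) of the highest-scoring window."""
    width = len(pwm)
    scores = []
    for start in range(len(sequence) - width + 1):
        site = sequence[start:start + width]
        llr = site_log_likelihood(site, pwm) - background_log_likelihood(site, background)
        scores.append((llr, start))
    return max(scores)

# Hypothetical PWM: each row gives P(A), P(C), P(G), P(T) at one motif position.
pwm = [
    (0.1, 0.1, 0.1, 0.7),  # position 1 favors T
    (0.1, 0.1, 0.7, 0.1),  # position 2 favors G
    (0.7, 0.1, 0.1, 0.1),  # position 3 favors A
]
score, start = best_site("CCTGACC", pwm)
```

Here the window "TGA" at offset 2 scores 3·log(0.7/0.25). Estimated PWMs are usually smoothed with pseudocounts so that no cell probability is zero, which would otherwise make `math.log` fail.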