Open Access. Powered by Scholars. Published by Universities.®

Laboratory and Basic Science Research Commons

Full-Text Articles in Laboratory and Basic Science Research

Supervised Detection Of Conserved Motifs In DNA Sequences With Cosmo, Oliver Bembom, Sunduz Keles, Mark J. van der Laan, Jul 2006

U.C. Berkeley Division of Biostatistics Working Paper Series

A number of computational methods have been proposed for identifying transcription factor binding sites from a set of unaligned sequences that are thought to share the motif in question. Here we introduce an algorithm, cosmo, that allows this search to be supervised by specifying a set of constraints that the position weight matrix of the unknown motif must satisfy. Such constraints may be formulated, for example, on the basis of prior knowledge about the structure of the transcription factor in question. The algorithm is based on the same two-component multinomial mixture model used by MEME, with stronger reliance, however, …
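
As a rough illustration of the two-component multinomial mixture model mentioned in the abstract, the sketch below scores each window of a sequence under a motif component (a position weight matrix) and a background component, then computes the posterior probability that the window came from the motif. It is a minimal sketch under simplifying assumptions, not the cosmo or MEME implementation: the function names, the toy matrix `example_pwm`, the uniform `example_background`, and the mixing weight are all hypothetical, windows are treated as independent, and the constraint handling that distinguishes cosmo is not shown.

```python
import math

def window_likelihood(window, pwm):
    """Probability of a window under the motif component.

    pwm is a list with one dict per motif position, mapping base -> probability.
    """
    prob = 1.0
    for position, base in enumerate(window):
        prob *= pwm[position][base]
    return prob

def background_likelihood(window, background):
    """Probability of a window under an order-0 (independent bases) background."""
    prob = 1.0
    for base in window:
        prob *= background[base]
    return prob

def motif_posteriors(sequence, pwm, background, motif_weight):
    """Posterior probability, per window, of the motif component.

    This corresponds to an E-step of an EM fit of the mixture,
    assuming (hypothetically) that overlapping windows are independent.
    """
    width = len(pwm)
    posteriors = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        p_motif = motif_weight * window_likelihood(window, pwm)
        p_bg = (1.0 - motif_weight) * background_likelihood(window, background)
        posteriors.append(p_motif / (p_motif + p_bg))
    return posteriors

def mixture_log_likelihood(sequences, pwm, background, motif_weight):
    """Log-likelihood of all windows under the two-component mixture."""
    width = len(pwm)
    total = 0.0
    for sequence in sequences:
        for start in range(len(sequence) - width + 1):
            window = sequence[start:start + width]
            total += math.log(
                motif_weight * window_likelihood(window, pwm)
                + (1.0 - motif_weight) * background_likelihood(window, background)
            )
    return total

# Hypothetical toy inputs: a width-2 motif favouring "AC", uniform background.
example_pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
example_background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
example_posteriors = motif_posteriors("ACGT", example_pwm, example_background, 0.5)
```

In a full EM fit these posteriors would be used to re-estimate the matrix and the mixing weight; a supervised search of the kind the abstract describes would additionally restrict the matrix to those satisfying the user-specified constraints.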


Multiple Tests Of Association With Biological Annotation Metadata, Sandrine Dudoit, Sunduz Keles, Mark J. van der Laan, Mar 2006

U.C. Berkeley Division of Biostatistics Working Paper Series

We propose a general and formal statistical framework for multiple tests of association between known fixed features of a genome and unknown parameters of the distribution of variable features of this genome in a population of interest. The known fixed gene-annotation profiles, corresponding to the fixed features of the genome, may concern Gene Ontology (GO) annotation, pathway membership, regulation by particular transcription factors, nucleotide sequences, or protein sequences. The unknown gene-parameter profiles, corresponding to the variable features of the genome, may be, for example, regression coefficients relating genome-wide transcript levels or DNA copy numbers to possibly censored biological and …