Open Access. Powered by Scholars. Published by Universities.®

Genetics and Genomics Commons

2005

Computational Biology

Articles 1 - 18 of 18

Full-Text Articles in Genetics and Genomics

Bayesian Analysis Of Cell-Cycle Gene Expression Data, Chuan Zhou, Jon Wakefield, Linda Breeden Dec 2005

UW Biostatistics Working Paper Series

The study of the cell cycle is important to our understanding of the basic mechanisms of life, yet progress has been slow due to the complexity of the process and our inability to study it at high resolution. Recent advances in microarray technology have enabled scientists to study gene expression at the genome scale at a manageable cost, and there has been an increasing effort to identify cell-cycle regulated genes. In this chapter, we discuss the analysis of cell-cycle gene expression data, focusing on model-based Bayesian approaches. The majority of the models we describe …


Optimal Feature Selection For Nearest Centroid Classifiers, With Applications To Gene Expression Microarrays, Alan R. Dabney, John D. Storey Nov 2005

UW Biostatistics Working Paper Series

Nearest centroid classifiers have recently been successfully employed in high-dimensional applications. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is typically carried out by computing univariate statistics for each feature individually, without consideration for how a subset of features performs as a whole. For subsets of a given size, we characterize the optimal choice of features, corresponding to those yielding the smallest misclassification rate. Furthermore, we propose an algorithm for estimating this optimal subset in practice. Finally, we investigate the applicability of shrinkage ideas to nearest centroid classifiers. We use gene-expression microarrays for …


A New Approach To Intensity-Dependent Normalization Of Two-Channel Microarrays, Alan R. Dabney, John D. Storey Nov 2005

UW Biostatistics Working Paper Series

A two-channel microarray measures the relative expression levels of thousands of genes from a pair of biological samples. In order to reliably compare gene expression levels between and within arrays, it is necessary to remove systematic errors that distort the biological signal of interest. The standard for accomplishing this is smoothing "MA-plots" to remove intensity-dependent dye bias and array-specific effects. However, MA methods require strong assumptions. We review these assumptions and derive several practical scenarios in which they fail. The "dye-swap" normalization method has been much less frequently used because it requires two arrays per pair of samples. We show …


Principal Component Analysis For Predicting Transcription-Factor Binding Motifs From Array-Derived Data, Yunlong Liu, Matthew P Vincenti, Hiroki Yokota Nov 2005

Dartmouth Scholarship

The responses to interleukin 1 (IL-1) in human chondrocytes constitute a complex regulatory mechanism, where multiple transcription factors interact combinatorially to transcription-factor binding motifs (TFBMs). In order to select a critical set of TFBMs from genomic DNA information and array-derived data, an efficient algorithm to solve a combinatorial optimization problem is required. Although computational approaches based on evolutionary algorithms are commonly employed, an analytical algorithm would be useful to predict TFBMs at nearly no computational cost and evaluate varying modelling conditions. Singular value decomposition (SVD) is a powerful method to derive primary components of a given matrix. Applying SVD …


An Introduction To Low-Level Analysis Methods Of DNA Microarray Data, Wolfgang Huber, Anja Von Heydebreck, Martin Vingron Nov 2005

Bioconductor Project Working Papers

This article gives an overview of the methods used in the low-level analysis of gene expression data generated using DNA microarrays. This type of experiment allows one to determine relative levels of nucleic acid abundance in a set of tissues or cell populations for thousands of transcripts or loci simultaneously. Careful statistical design and analysis are essential to improve the efficiency and reliability of microarray experiments throughout the data acquisition and analysis process. This includes the design of probes, the experimental design, the image analysis of microarray scanned images, the normalization of fluorescence intensities, the assessment of the quality of microarray …


Simultaneous And Exact Interval Estimates For The Contrast Of Two Groups Based On An Extremely High Dimensional Response Variable: Application To Mass Spec Data Analysis, Yuhyun Park, Sean R. Downing, Cheng Li Dr., William C. Hahn, Philip W. Kantoff, L. J. Wei Sep 2005

Harvard University Biostatistics Working Paper Series

No abstract provided.


The Optimal Discovery Procedure: A New Approach To Simultaneous Significance Testing, John D. Storey Sep 2005

UW Biostatistics Working Paper Series

Significance testing is one of the main objectives of statistics. The Neyman-Pearson lemma provides a simple rule for optimally testing a single hypothesis when the null and alternative distributions are known. This result has played a major role in the development of significance testing strategies that are used in practice. Most of the work extending single testing strategies to multiple tests has focused on formulating and estimating new types of significance measures, such as the false discovery rate. These methods tend to be based on p-values that are calculated from each test individually, ignoring information from the other tests. As …


The Optimal Discovery Procedure For Large-Scale Significance Testing, With Applications To Comparative Microarray Experiments, John D. Storey, James Y. Dai, Jeffrey T. Leek Sep 2005

UW Biostatistics Working Paper Series

As much of the focus of genetics and molecular biology has shifted toward the systems level, it has become increasingly important to accurately extract biologically relevant signal from thousands of related measurements. The common property among these high-dimensional biological studies is that the measured features have a rich and largely unknown underlying structure. One example of much recent interest is identifying differentially expressed genes in comparative microarray experiments. We propose a new approach aimed at optimally performing many hypothesis tests in a high-dimensional study. This approach estimates the Optimal Discovery Procedure (ODP), which has recently been introduced and theoretically shown …


Metaheuristic Applications And Their Solutions Quality, Dr. Zahid Hussain Aug 2005

International Conference on Information and Communication Technologies

Over the past few decades, a wide variety of classes of combinatorial problems (e.g. the assignment problem, the knapsack problem, the vehicle routing problem, etc.) have emerged from such areas as management science, telecommunication, AI, VLSI design and many others. Many large combinatorial problems are NP-hard because of the combinatorial growth of their solution search space with the problem size. Such problems are commonly solved by some version of a prominent metaheuristic (e.g. Genetic Algorithms, Tabu Search, Simulated Annealing, etc.). These heuristics seek good but approximate solutions at a reasonable computational cost. These heuristics are of stochastic …


Analysis Of Affymetrix GeneChip Data Using Amplified RNA, Leslie Cope, Scott M. Hartman, Hinrich W.H. Gohlmann, Jay P. Tiesman, Rafael A. Irizarry Aug 2005

Johns Hopkins University, Dept. of Biostatistics Working Papers

The standard method of target synthesis for hybridization to Affymetrix GeneChip® expression microarrays requires a relatively large amount of input total RNA (1-15 micrograms). When small biological samples are collected by microdissection or other methods, amplification techniques are required to provide sufficient target for hybridization to expression arrays. One amplification technique used is to perform two successive rounds of T7-based in vitro transcription. However, the use of random primers required to re-generate cDNA from the first round transcription reaction results in shortened copies of the cDNA, and ultimately the cRNA, transcripts from which the 5' end is missing. In this …


New Statistical Paradigms Leading To Web-Based Tools For Clinical/Translational Science, Knut M. Wittkowski May 2005

COBRA Preprint Series

As the field of functional genetics and genomics is beginning to mature, we are confronted with new challenges. The constant drop in price for sequencing and gene expression profiling, as well as the increasing number of genetic and genomic variables that can be measured, make it feasible to address more complex questions. The success with rare diseases caused by single loci or genes has provided us with a proof-of-concept that new therapies can be developed based on functional genomics and genetics.

Common diseases, however, typically involve genetic epistasis, genomic pathways, and proteomic pattern. Moreover, to better understand the underlying biological …


Cluster Analysis Of Genomic Data With Applications In R, Katherine S. Pollard, Mark J. Van Der Laan Jan 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

In this paper, we provide an overview of existing partitioning and hierarchical clustering algorithms in R. We discuss statistical issues and methods in choosing the number of clusters, the choice of clustering algorithm, and the choice of dissimilarity matrix. In particular, we illustrate how the bootstrap can be employed as a statistical method in cluster analysis to establish the reproducibility of the clusters and the overall variability of the followed procedure. We also show how to visualize a clustering result by plotting ordered dissimilarity matrices in R. We present a new R package, hopach, which implements the hybrid clustering method, …


A Digital Atlas To Characterize The Mouse Brain Transcriptome, James P. Carson, Tao Ju, Hui-Chen Lu, Christina Thaller, Mei Xu, Sarah Pallas, Michael C. Crair, Joe Warren, Wah Chiu, Gregor Eichele Jan 2005

PCOM Scholarly Papers

Massive amounts of data are being generated in an effort to represent for the brain the expression of all genes at cellular resolution. Critical to exploiting this effort is the ability to place these data into a common frame of reference. Here we have developed a computational method for annotating gene expression patterns in the context of a digital atlas to facilitate custom user queries and comparisons of this type of data. This procedure has been applied to 200 genes in the postnatal mouse brain. As an illustration of utility, we identify candidate genes that may be related to Parkinson …


A Brief History Of BioPerl, Colin Crossman, Arti K. Rai Jan 2005

Faculty Scholarship

Large-scale open-source projects face a litany of pitfalls and difficulties. Problems of contribution quality, credit for contributions, project coordination, funding, and mission-creep are ever-present. Of these, long-term funding and project coordination can interact to form a particularly difficult problem for open-source projects in an academic environment.

BioPerl was chosen as an example of a successful academic open-source project. Several of the roadblocks and hurdles encountered and overcome in the development of BioPerl are examined through the telling of the history of the project. Along the way, key points of open-source law are explained, such as license choice and copyright.

The …


Bioinformatic Analysis Of The Metal-Binding Protein Families And Heavy Metal Resistance Amongst Cyanobacteria, Tin-Chun Chu, Lee Lee, Shankar Srinivasan Dec 2004

Tin-Chun Chu, Ph.D.

No abstract provided.


Genetic Variation And Migration In The Mexican Free-Tailed Bat (Tadarida Brasiliensis Mexicana), Amy L. Russell, R. A. Medellín, G. F. McCracken Dec 2004

Amy L. Russell

Incomplete lineage sorting can genetically link populations long after they have diverged, and will exert a more powerful influence on larger populations. The effects of this stochastic process can easily be confounded with those of gene flow, potentially leading to inaccurate estimates of dispersal capabilities or erroneous designation of evolutionarily significant units (ESUs). We have used phylogenetic, population genetic, and coalescent methods to examine genetic structuring in large populations of a widely dispersing bat species and to test hypotheses concerning the influences of coalescent stochasticity vs. gene flow. The Mexican free-tailed bat, Tadarida brasiliensis mexicana, exhibits variation in both migra- …


Poor Taxon Sampling, Poor Character Sampling, And Non-Repeatable Analyses Of A Contrived Dataset Do Not Provide A More Credible Estimate Of Insect Phylogeny: A Reply To Kjer., T. Heath Ogden Dec 2004

T. Heath Ogden

The wealth of data available for phylogenetic analysis of the insect orders, from both morphological and molecular sources, is steadily increasing. However, controversy exists among the methodologies one can use to reconstruct ordinal relationships. Recently, Kjer (2004) presented an analysis of insect ordinal relationships based exclusively on a single source of information: 18S rDNA sequence data. Kjer claims that his analysis resulted in a more "credible" phylogeny for the insect orders and strongly criticized our previous phylogenetic results. However, Kjer only used a subset of the data that are currently available for insect ordinal phylogeny, misrepresented our analyses, and omitted …


Phylogeny Of Ephemeroptera (Mayflies) Based On Molecular Evidence, T. Heath Ogden Dec 2004

T. Heath Ogden

This study represents the first molecular phylogeny for the Order Ephemeroptera. The analyses included 31 of the 37 families, representing ~24% of the genera. Fifteen families were supported as being monophyletic, five families were supported as nonmonophyletic, and 11 families were only represented by one species, and monophyly was not testable. The suborders Furcatergalia and Carapacea were supported as monophyletic while Setisura and Pisciforma were not supported as monophyletic. The superfamilies Ephemerelloidea and Caenoidea were supported as monophyletic while Baetoidea, Siphlonuroidea, Ephemeroidea, and Heptagenioidea were not. Baetidae was recovered as sister to the remaining clades. The mayfly gill to wing …