Open Access. Powered by Scholars. Published by Universities.®

Computational Biology Commons

726 Full-Text Articles 1,770 Authors 174,775 Downloads 99 Institutions

All Articles in Computational Biology

Improved IBD Detection Using Incomplete Haplotype Information, Giulio Genovese, Gregory Leibon, Martin R. Pollak, Daniel N. Rockmore 2010 Dartmouth College

Dartmouth Scholarship

The availability of high density genetic maps and genotyping platforms has transformed human genetic studies. The use of these platforms has enabled population-based genome-wide association studies. However, in inheritance-based studies, current methods do not take full advantage of the information present in such genotyping analyses. In this paper we describe an improved method for identifying genetic regions shared identical-by-descent (IBD) from recent common ancestors. This method improves on existing methods by taking advantage of phase information even if it is less than fully accurate or missing. We present an analysis of how using phase information increases the accuracy of IBD detection …


The Strength Of Statistical Evidence For Composite Hypotheses: Inference To The Best Explanation, David R. Bickel 2010 Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology, and Immunology, Department of Mathematics and Statistics

COBRA Preprint Series

A general function to quantify the weight of evidence in a sample of data for one hypothesis over another is derived from the law of likelihood and from a statistical formalization of inference to the best explanation. For a fixed parameter of interest, the resulting weight of evidence that favors one composite hypothesis over another is the likelihood ratio using the parameter value consistent with each hypothesis that maximizes the likelihood function over the parameter of interest. Since the weight of evidence is generally only known up to a nuisance parameter, it is approximated by replacing the likelihood function with …
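
The likelihood-ratio definition described in this abstract can be written out as a short formula. The notation below is an assumption made for illustration and is not taken from the paper: \Theta_1 and \Theta_2 denote the sets of parameter values consistent with the two composite hypotheses H_1 and H_2, and L(\theta; x) is the likelihood of the observed data x.

```latex
% Weight of evidence for H_1 over H_2 (illustrative notation):
% each composite hypothesis is represented by its best-supported parameter value.
W(H_1 : H_2;\, x) \;=\;
  \frac{\sup_{\theta \in \Theta_1} L(\theta;\, x)}
       {\sup_{\theta \in \Theta_2} L(\theta;\, x)}
```

A value above 1 would indicate that the best-supported parameter value under H_1 explains the data better than the best-supported value under H_2.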


Constraint-Based Model Of Shewanella Oneidensis MR-1 Metabolism: A Tool For Data Analysis And Hypothesis Generation, Grigoriy E. Pinchuk, Eric A. Hill, Oleg V. Geydebrekht, Jessica De Ingeniis, Xiaolin Zhang, Andrei Osterman, James H. Scott 2010 Pacific Northwest National Laboratory

Dartmouth Scholarship

Shewanellae are gram-negative facultatively anaerobic metal-reducing bacteria commonly found in chemically (i.e., redox) stratified environments. Occupying such niches requires the ability to rapidly acclimate to changes in electron donor/acceptor type and availability; hence, the ability to compete and thrive in such environments must ultimately be reflected in the organization and utilization of electron transfer networks, as well as central and peripheral carbon metabolism. To understand how Shewanella oneidensis MR-1 utilizes its resources, the metabolic network was reconstructed. The resulting network consists of 774 reactions, 783 genes, and 634 unique metabolites and contains biosynthesis pathways for all cell constituents. Using constraint-based …


Probabilistic Protein Design, Comparative Modeling, And The Structure Of A Multidomain p53 Oligomer Bound To DNA, Thomas John Petty II 2010 University of Pennsylvania

Publicly Accessible Penn Dissertations

Proteins are the main functional components of all cellular processes, and most of them fold into unique three-dimensional shapes guided by their amino-acid sequence. Discovering the structure of a protein, or protein complexes, can provide important clues about how they perform their function. However, the chemical, physical or architectural properties of many proteins impede traditional approaches to structure determination. Two such proteins, the tumor suppressor p53 and the cholesterol processing enzyme endothelial lipase, are prime examples of problematic proteins that defy structural investigation via crystallographic methods. Therefore, new techniques must be developed to gain valuable structural insights, such as: computationally …


Powerful SNP Set Analysis For Case-Control Genome Wide Association Studies, Michael C. Wu, Peter Kraft, Michael P. Epstein, Deanne M. Taylor, Stephen J. Chanock, David J. Hunter, Xihong Lin 2010 The University of North Carolina at Chapel Hill

Harvard University Biostatistics Working Paper Series

No abstract provided.


Error Correcting Codes And The Human Genome, Suzanne McLean Lyle 2010 East Tennessee State University

Electronic Theses and Dissertations

In this work, we study error correcting codes and generalize the concepts with a view toward a novel application in the study of DNA sequences. The author investigates the possibility that an error correcting linear code could be included in the human genome through application and research. The author finds that while it is an accepted hypothesis that it is reasonable that some kind of error correcting code is used in DNA, no one has actually been able to identify one. The author uses the application to illustrate how the subject of coding theory can provide a teaching enrichment activity …


Optimization Algorithms For Functional Deimmunization Of Therapeutic Proteins, Andrew S. Parker, Wei Zheng, Karl E. Griswold, Chris Bailey-Kellogg 2010 Dartmouth College

Dartmouth Scholarship

To develop protein therapeutics from exogenous sources, it is necessary to mitigate the risks of eliciting an anti-biotherapeutic immune response. A key aspect of the response is the recognition and surface display by antigen-presenting cells of epitopes, short peptide fragments derived from the foreign protein. Thus, developing minimal-epitope variants represents a powerful approach to deimmunizing protein therapeutics. Critically, mutations selected to reduce immunogenicity must not interfere with the protein's therapeutic activity.


Why Genes Evolve Faster On Secondary Chromosomes In Bacteria, Vaughn S. Cooper, Samuel H. Vohr, Sarah C. Wrocklage, Philip J. Hatcher 2010 University of New Hampshire

Molecular, Cellular and Biomedical Sciences Scholarship

In bacterial genomes composed of more than one chromosome, one replicon is typically larger, harbors more essential genes than the others, and is considered primary. The greater variability of secondary chromosomes among related taxa has led to the theory that they serve as an accessory genome for specific niches or conditions. By this rationale, purifying selection should be weaker on genes on secondary chromosomes because of their reduced necessity or usage. To test this hypothesis we selected bacterial genomes composed of multiple chromosomes from two genera, Burkholderia and Vibrio, and quantified the evolutionary rates (dN and dS) of all orthologs …


Permutation-Based Pathway Testing Using The Super Learner Algorithm, Paul Chaffee, Alan E. Hubbard, Mark L. van der Laan 2010 Division of Biostatistics, UC Berkeley

U.C. Berkeley Division of Biostatistics Working Paper Series

Many diseases and other important phenotypic outcomes are the result of a combination of factors. For example, expression levels of genes have been used as input to various statistical methods for predicting phenotypic outcomes. One particularly popular variety is the so-called gene set enrichment analysis (GSEA). This paper discusses an augmentation to an existing strategy to estimate the significance of an association between a disease outcome and a predetermined combination of biological factors, based on a specific data adaptive regression method (the "Super Learner," van der Laan et al., 2007). The procedure uses an aggressive search procedure, potentially resulting in …


Accurate Genome-Scale Percentage DNA Methylation Estimates From Microarray Data, Martin J. Aryee, Zhijin Wu, Christine Ladd-Acosta, Brian Herb, Andrew P. Feinberg, Srinivasan Yegnasubramanian, Rafael A. Irizarry 2010 Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University

Johns Hopkins University, Dept. of Biostatistics Working Papers

DNA methylation is a key regulator of gene function in a multitude of both normal and abnormal biological processes, but tools to elucidate its roles on a genome-wide scale are still in their infancy. Methylation sensitive restriction enzymes and microarrays provide a potential high-throughput, low-cost platform to allow methylation profiling. However, accurate absolute methylation estimates have been elusive due to systematic errors and unwanted variability. Previous microarray pre-processing procedures, mostly developed for expression arrays, fail to adequately normalize methylation-related data since they rely on key assumptions that are violated in the case of DNA methylation. We develop a normalization strategy …


Modeling Dependent Gene Expression, Donatello Telesca, Peter Muller, Giovanni Parmigiani, Ralph S. Freedman 2010 UCLA School of Public Health

Harvard University Biostatistics Working Paper Series

No abstract provided.


Wavelet Based Functional Models For Transcriptome Analysis With Tiling Arrays, Lieven Clement, Kristof DeBeuf, Ciprian Crainiceanu, Olivier Thas, Marnik Vuylsteke, Rafael Irizarry 2010 Ghent University, Belgium

Johns Hopkins University, Dept. of Biostatistics Working Papers

For a better understanding of the biology of an organism, a complete description is needed of all regions of the genome that are actively transcribed. Tiling arrays can be used for this purpose. Such arrays allow the discovery of novel transcripts and the assessment of differential expression between two or more experimental conditions such as genotype, treatment, tissue, etc. Most of the initial methodological efforts were designed for transcript discovery, while more recent developments also focus on differential expression. To our knowledge no methods for tiling arrays are described in the literature that can both assess transcript discovery and identify …


Bayesian Methods For Network-Structured Genomics Data, Stefano Monni, Hongzhe Li 2010 Cornell

UPenn Biostatistics Working Papers

Graphs and networks are common ways of depicting information. In biology, many different processes are represented by graphs, such as regulatory networks, metabolic pathways and protein-protein interaction networks. This information provides a useful supplement to the standard numerical genomic data such as microarray gene expression data. Effectively utilizing such information can lead to a better identification of biologically relevant genomic features in the context of our prior biological knowledge. In this paper, we present a Bayesian variable selection procedure for network-structured covariates for both Gaussian linear and probit models. The key of our approach is the introduction of a Markov …


Wavelet-Based Functional Linear Mixed Models: An Application To Measurement Error–Corrected Distributed Lag Models, Elizabeth J. Malloy, Jeffrey S. Morris, Sara D. Adar, Helen Suh, Diane R. Gold, Brent A. Coull 2010 American University

Jeffrey S. Morris

Frequently, exposure data are measured over time on a grid of discrete values that collectively define a functional observation. In many applications, researchers are interested in using these measurements as covariates to predict a scalar response in a regression setting, with interest focusing on the most biologically relevant time window of exposure. One example is in panel studies of the health effects of particulate matter (PM), where particle levels are measured over time. In such studies, there are many more values of the functional data than observations in the data set so that regularization of the corresponding functional regression coefficient …


Members’ Discoveries: Fatal Flaws In Cancer Research, Jeffrey S. Morris 2010 The University of Texas M.D. Anderson Cancer Center

Jeffrey S. Morris

A recent article published in The Annals of Applied Statistics (AOAS) by two MD Anderson researchers, Keith Baggerly and Kevin Coombes, dissects results from a highly influential series of medical papers involving genomics-driven personalized cancer therapy, and outlines a series of simple yet fatal flaws that raise serious questions about the veracity of the original results. Having immediate and strong impact, this paper, along with related work, is providing the impetus for new standards of reproducibility in scientific research.


Statistical Contributions To Proteomic Research, Jeffrey S. Morris, Keith A. Baggerly, Howard B. Gutstein, Kevin R. Coombes 2010 The University of Texas M.D. Anderson Cancer Center

Jeffrey S. Morris

Proteomic profiling has the potential to impact the diagnosis, prognosis, and treatment of various diseases. A number of different proteomic technologies are available that allow us to look at many proteins at once, and all of them yield complex data that raise significant quantitative challenges. Inadequate attention to these quantitative issues can prevent these studies from achieving their desired goals, and can even lead to invalid results. In this chapter, we describe various ways the involvement of statisticians or other quantitative scientists in the study team can contribute to the success of proteomic research, and we outline some of the …


Informatics And Statistics For Analyzing 2-D Gel Electrophoresis Images, Andrew W. Dowsey, Jeffrey S. Morris, Howard G. Gutstein, Guang Z. Yang 2010 Imperial College London

Jeffrey S. Morris

Whilst recent progress in ‘shotgun’ peptide separation by integrated liquid chromatography and mass spectrometry (LC/MS) has enabled its use as a sensitive analytical technique, proteome coverage and reproducibility are still limited and obtaining enough replicate runs for biomarker discovery is a challenge. For these reasons, recent research demonstrates the continuing need for protein separation by two-dimensional gel electrophoresis (2-DE). However, with traditional 2-DE informatics, the digitized images are reduced to symbolic data through spot detection and quantification before proteins are compared for differential expression by spot matching. Recently, a more robust and automated paradigm has emerged where gels are directly …


Bayesian Random Segmentation Models To Identify Shared Copy Number Aberrations For Array CGH Data, Veerabhadran Baladandayuthapani, Yuan Ji, Rajesh Talluri, Luis E. Nieto-Barajas, Jeffrey S. Morris 2010 Texas A&M University

Jeffrey S. Morris

Array-based comparative genomic hybridization (aCGH) is a high-resolution high-throughput technique for studying the genetic basis of cancer. The resulting data consists of log fluorescence ratios as a function of the genomic DNA location and provides a cytogenetic representation of the relative DNA copy number variation. Analysis of such data typically involves estimation of the underlying copy number state at each location and segmenting regions of DNA with similar copy number states. Most current methods proceed by modeling a single sample/array at a time, and thus fail to borrow strength across multiple samples to infer shared regions of copy number aberrations. …


Quantitative Comparison Of Genomic-Wide Protein Domain Distributions, Arli A. Parikesit, Peter F. Stadler, Sonja J. Prohaska 2010 University of Indonesia

Arli A Parikesit

Investigations into the origins and evolution of regulatory mechanisms require quantitative estimates of the abundance and co-occurrence of functional protein domains among distantly related genomes. Currently available databases, such as SUPERFAMILY, are not designed for quantitative comparisons since they are built upon transcript and protein annotations provided by the various genome annotation projects. Large biases are introduced by the differences in genome annotation protocols, which strongly depend on the availability of transcript information and well-annotated closely related organisms. Here we show that the combination of de novo gene predictors and subsequent HMM-based annotation of SCOP domains in the …


Importance Sampling Of Word Patterns In DNA And Protein Sequences, Hock Peng Chan, Nancy R. Zhang, Louis H. Y. Chen 2010 National University of Singapore

Statistics Papers

The use of Monte Carlo evaluation to compute p-values of pattern counting test statistics is especially attractive when an asymptotic theory is absent or when the search sequence or the word pattern is too short for an asymptotic formula to be accurate. The drawback of applying Monte Carlo simulations directly is its inefficiency when p-values are small, which is precisely the situation of importance. In this paper, we provide a general importance sampling algorithm for efficient Monte Carlo evaluation of small p-values of pattern counting test statistics and apply it to word patterns of biological interest, in particular palindromes and …
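
The inefficiency of direct Monte Carlo for small p-values, and the way importance sampling addresses it, can be illustrated with a generic toy problem. The sketch below is not the algorithm from this paper and does not involve word patterns: it estimates the tail probability of a standard normal variable, and the function names, the shifted sampling distribution, and the threshold are all assumptions chosen for illustration.

```python
import math
import random


def naive_tail_estimate(threshold, n, rng):
    """Direct Monte Carlo estimate of P(Z > threshold) for Z ~ N(0, 1)."""
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > threshold)
    return hits / n


def tilted_tail_estimate(threshold, n, rng):
    """Importance sampling estimate of the same probability.

    Samples are drawn from N(threshold, 1), where the rare event is common,
    and each hit is reweighted by the density ratio N(0, 1) / N(threshold, 1).
    """
    total = 0.0
    for _ in range(n):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            # Density ratio phi(x) / phi(x - threshold).
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n


def exact_tail(threshold):
    """Closed-form P(Z > threshold), used only as a reference value."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


if __name__ == "__main__":
    rng = random.Random(0)
    t, n = 4.0, 20000
    print("exact     :", exact_tail(t))
    print("naive     :", naive_tail_estimate(t, n, rng))
    print("importance:", tilted_tail_estimate(t, n, rng))
```

With a tail probability of roughly 3e-5, the direct estimator expects fewer than one hit in 20,000 draws, so its output is mostly zero or a coarse multiple of 1/n, while the reweighted estimator uses about half of its draws.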

