Open Access. Powered by Scholars. Published by Universities.®

Computational Biology Commons

2006

Articles 1 - 28 of 28

Full-Text Articles in Computational Biology

Semiparametric Regression Of Multi-Dimensional Genetic Pathway Data: Least Squares Kernel Machines And Linear Mixed Models, Dawei Liu, Xihong Lin, Debashis Ghosh Nov 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Penalized Likelihood And Bayesian Methods For Sparse Contingency Tables: An Analysis Of Alternative Splicing In Full-Length cDNA Libraries, Corinne Dahinden, Giovanni Parmigiani, Mark C. Emerick, Peter Buhlmann Nov 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

We develop methods to perform model selection and parameter estimation in log-linear models for the analysis of sparse contingency tables to study the interaction of two or more factors. Typically, datasets arising from so-called full-length cDNA libraries, in the context of alternatively spliced genes, lead to such sparse contingency tables. Maximum likelihood estimation of log-linear model coefficients fails to work because of zero cell entries. Therefore, new methods are required to estimate the coefficients and to perform model selection. Our suggestions include computationally efficient penalization (Lasso-type) approaches as well as Bayesian methods using MCMC. We compare these procedures in a …


Multiple Testing With An Empirical Alternative Hypothesis, James E. Signorovitch Nov 2006

Harvard University Biostatistics Working Paper Series

An optimal multiple testing procedure is identified for linear hypotheses under the general linear model, maximizing the expected number of false null hypotheses rejected at any significance level. The optimal procedure depends on the unknown data-generating distribution, but can be consistently estimated. Drawing information together across many hypotheses, the estimated optimal procedure provides an empirical alternative hypothesis by adapting to underlying patterns of departure from the null. Proposed multiple testing procedures based on the empirical alternative are evaluated through simulations and an application to gene expression microarray data. Compared to a standard multiple testing procedure, it is not unusual for …


Nonparametric Pathway-Based Regression Models For Analysis Of Genomic Data, Zhi Wei, Hongzhe Li Oct 2006

Hongzhe Li

High-throughput genomic data provide an opportunity for identifying pathways and genes that are related to various clinical phenotypes. Besides these genomic data, another valuable source of data is the biological knowledge about genes and pathways that might be related to the phenotypes of many complex diseases. Databases of such knowledge are often called the metadata. In microarray data analysis, such metadata are currently explored in post hoc ways by gene set enrichment analysis but have hardly been utilized in the modeling step. We propose to develop and evaluate a pathway-based gradient descent boosting procedure for nonparametric pathways-based regression (NPR) analysis to …


Group Additive Regression Models For Genomic Data Analysis, Yihui Luan, Hongzhe Li Oct 2006

Hongzhe Li

One important problem in genomic research is to identify genomic features such as gene expression data or DNA single nucleotide polymorphisms (SNPs) that are related to clinical phenotypes. Often these genomic data can be naturally divided into biologically meaningful groups such as genes belonging to the same pathways or SNPs within genes. In this paper, we propose group additive regression models and a group gradient descent boosting procedure for identifying groups of genomic features that are related to clinical phenotypes. Our simulation results show that by dividing the variables into appropriate groups, we can obtain better identification of the group …


Estimating Genome-Wide Copy Number Using Allele Specific Mixture Models, Wenyi Wang, Benilton Carvalho, Nate Miller, Jonathan Pevsner, Aravinda Chakravarti, Rafael A. Irizarry Oct 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

Genomic changes such as copy number alterations are thought to be one of the major underlying causes of human phenotypic variation among normal and disease subjects [23,11,25,26,5,4,7,18]. These include chromosomal regions with so-called copy number alterations: instead of the expected two copies, a section of the chromosome for a particular individual may have zero copies (homozygous deletion), one copy (hemizygous deletion), or more than two copies (amplification). The canonical example is Down syndrome, which is caused by an extra copy of chromosome 21. Identification of such abnormalities in smaller regions has been of great interest, because it is believed to …


Exploration Of Distributional Models For A Novel Intensity-Dependent Normalization, Nicola Lama, Patrizia Boracchi, Elia Mario Biganzoli Oct 2006

COBRA Preprint Series

Currently used gene intensity-dependent normalization methods, based on regression smoothing techniques, usually approach the two problems of location bias detrending and data re-scaling without taking into account the censoring characteristic of certain gene expressions produced by experiment measurement constraints or by previous normalization steps. Moreover, the bias vs variance balance control of normalization procedures is not often discussed but left to the user's experience. Here an approximate maximum likelihood procedure to fit a model smoothing the dependences of log-fold gene expression differences on average gene intensities is presented. Central tendency and scaling factor were modeled by means of B-splines smoothing …


Structural Inference In Transition Measurement Error Models For Longitudinal Data, Wenqin Pan, Xihong Lin, Donglin Zeng Aug 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Estimation In Semiparametric Transition Measurement Error Models For Longitudinal Data, Wenqin Pan, Donglin Zeng, Xihong Lin Aug 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Nonparametric Regression Using Local Kernel Estimating Equations For Correlated Failure Time Data, Zhangsheng Yu, Xihong Lin Aug 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Causal Inference In Hybrid Intervention Trials Involving Treatment Choice, Qi Long, Rod Little, Xihong Lin Aug 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


A Comparison Of Methods For Estimating The Causal Effect Of A Treatment In Randomized Clinical Trials Subject To Noncompliance, Rod Little, Qi Long, Xihong Lin Aug 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Group Additive Regression Models For Genomic Data Analysis, Yihui Luan, Hongzhe Li Aug 2006

UPenn Biostatistics Working Papers

One important problem in genomic research is to identify genomic features such as gene expression data or DNA single nucleotide polymorphisms (SNPs) that are related to clinical phenotypes. Often these genomic data can be naturally divided into biologically meaningful groups such as genes belonging to the same pathways or SNPs within genes. In this paper, we propose group additive regression models and a group gradient descent boosting procedure for identifying groups of genomic features that are related to clinical phenotypes. Our simulation results show that by dividing the variables into appropriate groups, we can obtain better identification of the group …


Extensions To Gene Set Enrichment, Zhen Jiang, Robert Gentleman Aug 2006

Bioconductor Project Working Papers

Motivation: Gene Set Enrichment Analysis (GSEA) has been developed recently to capture moderate but coordinated changes in the expression of sets of functionally related genes. We propose a number of extensions to GSEA, which use different statistics to describe the association between genes and the phenotype of interest. We make use of dimension reduction procedures, such as principal component analysis, to identify gene sets containing coordinated genes. We also address the problem of overlap among gene sets in this paper.

Results: We applied our methods to data from a clinical trial in acute lymphoblastic leukemia (ALL) [1]. We identified interesting …


Circadian Rhythmicity By Autocatalysis, Arun Mehra, Christian I. Hong, Mi Shi, Jennifer J. Loros, Jay C. Dunlap, Peter Ruoff Jul 2006

Dartmouth Scholarship

The temperature-compensated in vitro oscillation of cyanobacterial KaiC phosphorylation, the first example of a thermodynamically closed system showing circadian rhythmicity, only involves the three Kai proteins (KaiA, KaiB, and KaiC) and ATP. In this paper, we describe a model in which the KaiA- and KaiB-assisted autocatalytic phosphorylation and dephosphorylation of KaiC are the source of circadian rhythmicity. This model, based upon autocatalysis instead of transcription-translation negative feedback, shows temperature-compensated circadian limit-cycle oscillations with KaiC phosphorylation profiles and has period lengths and rate constant values that are consistent with experimental observations.


FDR And Bayesian Multiple Comparisons Rules, Peter Muller, Giovanni Parmigiani, Kenneth Rice Jul 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

We discuss Bayesian approaches to multiple comparison problems, using a decision theoretic perspective to critically compare competing approaches. We set up decision problems that lead to the use of FDR-based rules and generalizations. Alternative definitions of the probability model and the utility function lead to different rules and problem-specific adjustments. Using a loss function that controls realized FDR we derive an optimal Bayes rule that is a variation of the Benjamini and Hochberg (1995) procedure. The cutoff is based on increments in ordered posterior probabilities instead of ordered p-values. Throughout the discussion we take a Bayesian perspective. In particular, …
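
For orientation, the Benjamini and Hochberg (1995) procedure mentioned in this abstract can be illustrated with a minimal Python sketch. It shows only the classical step-up rule on ordered p-values, not the authors' Bayes rule based on posterior probabilities; the function name `benjamini_hochberg` and the example p-values are illustrative assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Classical Benjamini-Hochberg step-up rule (illustrative sketch).

    Returns a list of booleans, one per hypothesis, marking rejections
    at nominal false discovery rate `alpha`.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= k * alpha / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k = rank

    # Reject the k hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, for illustration only.
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(example, alpha=0.05))
```

Because the rule is step-up, a hypothesis can be rejected even when a smaller p-value misses its own threshold, as long as some larger-ranked p-value meets its threshold.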


Exploration, Normalization, And Genotype Calls Of High Density Oligonucleotide SNP Array Data, Benilton Carvalho, Terence P. Speed, Rafael A. Irizarry Jul 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

In most microarray technologies, a number of critical steps are required to convert raw intensity measurements into the data relied upon by data analysts, biologists and clinicians. These data manipulations, referred to as preprocessing, can influence the quality of the ultimate measurements. In the last few years, the high-throughput measurement of gene expression has been the most popular application of microarray technology. For this application, various groups have demonstrated that the use of modern statistical methodology can substantially improve accuracy and precision of gene expression measurements, relative to ad-hoc procedures introduced by designers and manufacturers of the technology. Currently, other applications …


Multivariate Analysis And Visualization Of Splicing Correlations In Single-Gene Transcriptomes, Mark C. Emerick, Giovanni Parmigiani, William S. Agnew Jun 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

Through ‘combinatorial splicing’, RNA metabolism may create enormous structural diversity in the proteome. Functional interactions among multiple alternative domains can have a disproportionate impact on the phenotype, requiring integrated RNA-level regulation of molecular composition. Splicing correlations within molecules expressed from a single gene, where these effects would be greatest, provide valuable clues to functional relationships and targets for splicing regulation. We present tools to visualize complex splicing patterns in full-length cDNA libraries. Developmental changes in pair-wise correlations are presented vectorially in ‘clock plots’ and linkage grids. Higher-order correlations are assessed via a loglinear model and Monte Carlo analysis with an …


PLASQ: A Generalized Linear Model-Based Procedure To Determine Allelic Dosage In Cancer Cells From SNP Array Data, Thomas Laframboise, David P. Harrington, Barbara A. Weir Jun 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Desulfovibrio Desulfuricans G20 Tetraheme Cytochrome Structure At 1.5 Å And Cytochrome Interaction With Metal Complexes, Mrunalini Pattarkine, J J. Tanner, C A. Bottoms, Y H. Lee, Judy D. Wall May 2006

Faculty Works

The structure of the type I tetraheme cytochrome c3 from Desulfovibrio desulfuricans G20 was determined to 1.5 Å by X-ray crystallography. In addition to the oxidized form, the structure of the molybdate-bound form of the protein was determined from oxidized crystals soaked in sodium molybdate. Only small structural shifts were obtained with metal binding, consistent with the remarkable structural stability of this protein. In vitro experiments with pure cytochrome showed that molybdate could oxidize the reduced cytochrome, although not as rapidly as U(VI) present as uranyl acetate. Alterations in the overall conformation and thermostability of the metal-oxidized protein were investigated …


Bounded Search For De Novo Identification Of Degenerate Cis-Regulatory Elements, Jonathan M. Carlson, Arijit Chakravarty, Radhika S. Khetani, Robert H. Gross May 2006

Dartmouth Scholarship

The identification of statistically overrepresented sequences in the upstream regions of coregulated genes should theoretically permit the identification of potential cis-regulatory elements. However, in practice many cis-regulatory elements are highly degenerate, precluding the use of an exhaustive word-counting strategy for their identification. While numerous methods exist for inferring base distributions using a position weight matrix, recent studies suggest that the independence assumptions inherent in the model, as well as the inability to reach a global optimum, limit this approach.


Selecting 'Significant' Differentially Expressed Genes From The Combined Perspective Of The Null And The Alternative, Beatrijs Moerkerke, Els Goetghebeur Apr 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Feature-Level Exploration Of The Choe Et Al. Affymetrix GeneChip Control Dataset, Rafael A. Irizarry, Leslie Cope, Zhijin Wu Mar 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

We describe why the Choe et al. control dataset should not be used to assess GeneChip expression measures.


Genome Scanning Methods For Comparing Sequences Between Groups, With Application To HIV Vaccine Trials, Peter B. Gilbert, Chunyuan Wu, David V. Jobes Mar 2006

UW Biostatistics Working Paper Series

Consider a placebo-controlled preventive HIV vaccine efficacy trial. An HIV amino acid sequence is measured from each volunteer who acquires HIV, and these sequences are aligned together with the reference HIV sequence represented in the vaccine. We develop genome scanning methods to identify HIV positions at which the amino acids in sequences from infected vaccine recipients tend to be more divergent from the corresponding reference amino acid than the amino acids in sequences from infected placebo recipients. We consider five two-sample test statistics, based on Euclidean, Mahalanobis, and Kullback-Leibler divergence measures. Weights are incorporated to reflect biological information contained in …


2^K Factorials In Blocks Of Size 2, With Application To Two-Color Microarray Experiments, Kathleen F. Kerr Mar 2006

UW Biostatistics Working Paper Series

When a two-level design must be run in blocks of size two, there is a unique blocking scheme that enables estimation of all the main effects. Unfortunately this design does not enable estimation of any two-factor interactions. When the experimental goal is to estimate all main effects and two-factor interactions, it is necessary to combine replicates of the experiment that use different blocking schemes. In this paper we identify such designs for up to eight factors that enable estimation of all main effects and two-factor interactions with the fewest number of replications. In addition, we give a construction for general …
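
As a rough illustration of the blocking idea in this abstract, the Python sketch below assumes that the scheme in question pairs each run of a 2^k design with its mirror image (every factor level flipped); the truncated abstract does not spell this out. Under that assumption, every main-effect contrast changes sign inside each block, while every two-factor interaction contrast is constant inside each block and is therefore confounded with blocks. The helper names `foldover_blocks` and `varies_within_blocks` are hypothetical.

```python
from itertools import combinations, product


def foldover_blocks(k):
    """Pair each run of a 2^k design with its mirror image (all signs flipped).

    Assumed blocking scheme for illustration; returns 2^(k-1) blocks of size 2.
    """
    blocks = []
    seen = set()
    for run in product((-1, 1), repeat=k):
        if run in seen:
            continue
        mirror = tuple(-level for level in run)
        seen.update((run, mirror))
        blocks.append((run, mirror))
    return blocks


def varies_within_blocks(blocks, factors):
    """True if the contrast for `factors` changes sign inside every block.

    A contrast that is constant inside every block cannot be separated
    from block effects using within-block differences.
    """
    def contrast(run):
        value = 1
        for f in factors:
            value *= run[f]
        return value

    return all(contrast(a) != contrast(b) for a, b in blocks)


k = 3
blocks = foldover_blocks(k)
# Main effects: each single factor changes level within every block.
print([varies_within_blocks(blocks, (i,)) for i in range(k)])
# Two-factor interactions: each product of two factors is constant within blocks.
print([varies_within_blocks(blocks, pair) for pair in combinations(range(k), 2)])
```

For k = 3 the first list is all `True` and the second all `False`, which matches the abstract's statement that this design estimates the main effects but none of the two-factor interactions.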


Nonparametric Pathway-Based Regression Models For Analysis Of Genomic Data, Zhi Wei, Hongzhe Li Feb 2006

UPenn Biostatistics Working Papers

High-throughput genomic data provide an opportunity for identifying pathways and genes that are related to various clinical phenotypes. Besides these genomic data, another valuable source of data is the biological knowledge about genes and pathways that might be related to the phenotypes of many complex diseases. Databases of such knowledge are often called the metadata. In microarray data analysis, such metadata are currently explored in post hoc ways by gene set enrichment analysis but have hardly been utilized in the modeling step. We propose to develop and evaluate a pathway-based gradient descent boosting procedure for nonparametric pathways-based regression (NPR) analysis to …


Visualizing Genomic Data, Robert Gentleman, Florian Hahne, Wolfgang Huber Feb 2006

Bioconductor Project Working Papers

The advent of experimental techniques capable of probing biomolecules and cells at high levels of resolution has led to a rapid change in the methods used for the analysis of experimental molecular biology data. In this article we give an overview of visualization techniques and methods that can be used to assess various aspects of genomic data.


Multiple Sequence Alignment Accuracy And Phylogenetic Inference, T. Heath Ogden Dec 2005

T. Heath Ogden

Phylogenies are often thought to be more dependent upon the specifics of the sequence alignment than on the method of reconstruction. Simulation of sequences containing insertion and deletion events was performed in order to determine the role that alignment accuracy plays during phylogenetic inference. Data sets were simulated for pectinate, balanced, and random tree shapes under different conditions (ultrametric equal branch length, ultrametric random branch length, nonultrametric random branch length). Comparisons between hypothesized alignments and true alignments enabled determination of two measures of alignment accuracy, that of the total data set and that of individual branches. In general, our …