Open Access. Powered by Scholars. Published by Universities.®

Genetics and Genomics Commons

COBRA

2006

Articles 1 - 27 of 27

Full-Text Articles in Genetics and Genomics

Use Of Hidden Markov Models For QTL Mapping, Karl W. Broman Dec 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

An important aspect of the QTL mapping problem is the treatment of missing genotype data. If complete genotype data were available, QTL mapping would reduce to the problem of model selection in linear regression. However, in the consideration of loci in the intervals between the available genetic markers, genotype data is inherently missing. Even at the typed genetic markers, genotype data is seldom complete, as a result of failures in the genotyping assays or for the sake of economy (for example, in the case of selective genotyping, where only individuals with extreme phenotypes are genotyped). We discuss the use of …
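
As a hedged illustration of how a hidden Markov model fills in missing genotypes, the sketch below runs the forward-backward algorithm along one chromosome for a single backcross individual and returns posterior genotype probabilities at each marker. It assumes a backcross (two genotypes, AA and AB), the Haldane map function (no crossover interference) and a fixed genotyping error rate; the function names and parameters are hypothetical, and this is not the implementation discussed in the paper.

```python
import math


def haldane_r(d_cm):
    """Recombination fraction for a map distance in cM (Haldane map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def genotype_probs(obs, dist_cm, error_rate=1e-4):
    """Posterior genotype probabilities at each marker for one backcross individual.

    obs: observed genotypes per marker, 0 (AA), 1 (AB) or None (missing).
    dist_cm: distances in cM between adjacent markers (length len(obs) - 1).
    Returns one [P(AA), P(AB)] pair per marker.
    """
    n = len(obs)
    states = (0, 1)

    def emit(k, g):
        # A missing genotype carries no information; an observed one may be an error.
        if obs[k] is None:
            return 1.0
        return 1.0 - error_rate if obs[k] == g else error_rate

    def trans(k, g, h):
        # Genotype changes between markers k and k + 1 only through a recombination.
        r = haldane_r(dist_cm[k])
        return 1.0 - r if g == h else r

    def normalize(v):
        s = sum(v)
        return [x / s for x in v]

    # Forward pass: alpha[k][h] is proportional to P(obs[0..k], G_k = h).
    alpha = [normalize([0.5 * emit(0, g) for g in states])]
    for k in range(1, n):
        alpha.append(normalize([
            emit(k, h) * sum(alpha[k - 1][g] * trans(k - 1, g, h) for g in states)
            for h in states
        ]))

    # Backward pass: beta[k][g] is proportional to P(obs[k+1..n-1] | G_k = g).
    beta = [[1.0, 1.0] for _ in range(n)]
    for k in range(n - 2, -1, -1):
        beta[k] = normalize([
            sum(trans(k, g, h) * emit(k + 1, h) * beta[k + 1][h] for h in states)
            for g in states
        ])

    # Combine and renormalize to get P(G_k = g | all observed markers).
    return [normalize([alpha[k][g] * beta[k][g] for g in states]) for k in range(n)]
```

A missing marker midway between an AA marker and an AB marker gets probability 0.5 for each genotype, while one flanked by two AA markers 5 cM away on each side is called AA with probability above 0.99.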


Semiparametric Regression Of Multi-Dimensional Genetic Pathway Data: Least Squares Kernel Machines And Linear Mixed Models, Dawei Liu, Xihong Lin, Debashis Ghosh Nov 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Penalized Likelihood And Bayesian Methods For Sparse Contingency Tables: An Analysis Of Alternative Splicing In Full-Length cDNA Libraries, Corinne Dahinden, Giovanni Parmigiani, Mark C. Emerick, Peter Buhlmann Nov 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

We develop methods to perform model selection and parameter estimation in loglinear models for the analysis of sparse contingency tables to study the interaction of two or more factors. Datasets arising from so-called full-length cDNA libraries, in the context of alternatively spliced genes, typically lead to such sparse contingency tables. Maximum likelihood estimation of loglinear model coefficients fails because of zero cell entries. Therefore, new methods are required to estimate the coefficients and to perform model selection. Our suggestions include computationally efficient penalization (Lasso-type) approaches as well as Bayesian methods using MCMC. We compare these procedures in a …
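
To see why zero cells break maximum likelihood and how an L1 penalty helps, here is a minimal sketch for a single 2x2 table: with n11 = 0 the unpenalized estimate of the interaction coefficient diverges to minus infinity, whereas a Lasso penalty on that coefficient keeps it finite. It assumes a Poisson log-linear model, penalizes only the interaction, and uses plain proximal gradient descent with a fixed step size; it is not the authors' algorithm, and the function name and defaults are hypothetical.

```python
import math


def lasso_loglinear_2x2(counts, lam, step=0.01, iters=20000):
    """L1-penalized Poisson log-linear fit for one 2x2 table (illustrative sketch).

    counts: [n00, n01, n10, n11] for cells (row, col) = (0,0), (0,1), (1,0), (1,1).
    Model: log mu = b0 + b1*row + b2*col + b3*row*col; only b3 is penalized.
    Minimizes sum(mu - y * log mu) + lam * |b3| by proximal gradient descent.
    """
    X = [[1.0, i, j, i * j] for i, j in ((0, 0), (0, 1), (1, 0), (1, 1))]
    b = [0.0, 0.0, 0.0, 0.0]
    for _ in range(iters):
        mu = [math.exp(sum(xp * bp for xp, bp in zip(x, b))) for x in X]
        # Gradient of the Poisson negative log-likelihood: X^T (mu - y).
        grad = [sum((m - y) * x[p] for m, y, x in zip(mu, counts, X)) for p in range(4)]
        b = [bp - step * g for bp, g in zip(b, grad)]
        # Proximal step for lam * |b3|: soft-threshold the interaction only.
        b[3] = math.copysign(max(abs(b[3]) - step * lam, 0.0), b[3])
    return b
```

For the table [10, 5, 7, 0] with lam = 1, the stationarity conditions give fitted counts 11, 4, 6 and 1, so the interaction estimate should approach -log(24/11), about -0.78, instead of diverging.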


Multiple Testing With An Empirical Alternative Hypothesis, James E. Signorovitch Nov 2006

Harvard University Biostatistics Working Paper Series

An optimal multiple testing procedure is identified for linear hypotheses under the general linear model, maximizing the expected number of false null hypotheses rejected at any significance level. The optimal procedure depends on the unknown data-generating distribution, but can be consistently estimated. Drawing information together across many hypotheses, the estimated optimal procedure provides an empirical alternative hypothesis by adapting to underlying patterns of departure from the null. Proposed multiple testing procedures based on the empirical alternative are evaluated through simulations and an application to gene expression microarray data. Compared to a standard multiple testing procedure, it is not unusual for …


Estimating Genome-Wide Copy Number Using Allele Specific Mixture Models, Wenyi Wang, Benilton Carvalho, Nate Miller, Jonathan Pevsner, Aravinda Chakravarti, Rafael A. Irizarry Oct 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

Genomic changes such as copy number alterations are thought to be one of the major underlying causes of human phenotypic variation among normal and disease subjects [23,11,25,26,5,4,7,18]. These include chromosomal regions with so-called copy number alterations: instead of the expected two copies, a section of the chromosome for a particular individual may have zero copies (homozygous deletion), one copy (hemizygous deletion), or more than two copies (amplification). The canonical example is Down syndrome, which is caused by an extra copy of chromosome 21. Identification of such abnormalities in smaller regions has been of great interest, because it is believed to …


Exploration Of Distributional Models For A Novel Intensity-Dependent Normalization, Nicola Lama, Patrizia Boracchi, Elia Mario Biganzoli Oct 2006

COBRA Preprint Series

Currently used intensity-dependent gene normalization methods, based on regression smoothing techniques, usually approach the two problems of location bias detrending and data rescaling without taking into account the censoring of some gene expression values that results from experimental measurement constraints or from previous normalization steps. Moreover, control of the bias-variance trade-off in normalization procedures is seldom discussed and is usually left to the user's experience. Here an approximate maximum likelihood procedure is presented to fit a model that smooths the dependence of log-fold gene expression differences on average gene intensities. Central tendency and scaling factor were modeled by means of B-spline smoothing …


A Unifying Approach For Haplotype Analysis Of Quantitative Traits In Family-Based Association Studies: Testing And Estimating Gene-Environment Interactions With Complex Exposure Variables, Stijn Vansteelandt, Christoph Lange Sep 2006

COBRA Preprint Series

We propose robust and efficient tests and estimators for gene-environment/gene-drug interactions in family-based association studies. The methodology is designed for studies in which haplotypes, quantitative phenotypes and complex exposure/treatment variables are analyzed. Using causal inference methodology, we derive family-based association tests and estimators for the genetic main effects and the interactions. The tests and estimators are robust against population admixture and stratification without requiring adjustment for confounding variables. We illustrate the practical relevance of our approach by an application to a COPD study. The data analysis suggests a gene-environment interaction between a SNP in the Serpine gene and smok…


Structural Inference In Transition Measurement Error Models For Longitudinal Data, Wenqin Pan, Xihong Lin, Donglin Zeng Aug 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Estimation In Semiparametric Transition Measurement Error Models For Longitudinal Data, Wenqin Pan, Donglin Zeng, Xihong Lin Aug 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Nonparametric Regression Using Local Kernel Estimating Equations For Correlated Failure Time Data, Zhangsheng Yu, Xihong Lin Aug 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Causal Inference In Hybrid Intervention Trials Involving Treatment Choice, Qi Long, Rod Little, Xihong Lin Aug 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


A Comparison Of Methods For Estimating The Causal Effect Of A Treatment In Randomized Clinical Trials Subject To Noncompliance, Rod Little, Qi Long, Xihong Lin Aug 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Group Additive Regression Models For Genomic Data Analysis, Yihui Luan, Hongzhe Li Aug 2006

UPenn Biostatistics Working Papers

One important problem in genomic research is to identify genomic features such as gene expression data or DNA single nucleotide polymorphisms (SNPs) that are related to clinical phenotypes. Often these genomic data can be naturally divided into biologically meaningful groups such as genes belonging to the same pathways or SNPs within genes. In this paper, we propose group additive regression models and a group gradient descent boosting procedure for identifying groups of genomic features that are related to clinical phenotypes. Our simulation results show that by dividing the variables into appropriate groups, we can obtain better identification of the group …


Extensions To Gene Set Enrichment, Zhen Jiang, Robert Gentleman Aug 2006

Bioconductor Project Working Papers

Motivation: Gene Set Enrichment Analysis (GSEA) was developed recently to capture moderate but coordinated changes in the expression of sets of functionally related genes. We propose a number of extensions to GSEA that use different statistics to describe the association between genes and the phenotype of interest. We make use of dimension reduction procedures, such as principal component analysis, to identify gene sets containing coordinated genes. We also address the problem of overlap among gene sets.

Results: We applied our methods to data from a clinical trial in acute lymphoblastic leukemia (ALL) [1]. We identified interesting …


FDR And Bayesian Multiple Comparisons Rules, Peter Muller, Giovanni Parmigiani, Kenneth Rice Jul 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

We discuss Bayesian approaches to multiple comparison problems, using a decision-theoretic perspective to critically compare competing approaches. We set up decision problems that lead to the use of FDR-based rules and generalizations. Alternative definitions of the probability model and the utility function lead to different rules and problem-specific adjustments. Using a loss function that controls realized FDR, we derive an optimal Bayes rule that is a variation of the Benjamini and Hochberg (1995) procedure. The cutoff is based on increments in ordered posterior probabilities instead of ordered p-values. Throughout the discussion we take a Bayesian perspective. In particular, …
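
For reference, the classical Benjamini and Hochberg (1995) step-up rule that the paper's Bayes rule modifies is sketched below on ordinary p-values. The paper's variant works with increments in ordered posterior probabilities instead; that version is not reproduced here, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the (sorted) indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    return sorted(order[:k_max])
```

For p-values [0.03, 0.02, 0.04, 0.04] at alpha = 0.05 all four hypotheses are rejected: the largest p-value meets its threshold 4 × 0.05 / 4 even though the smaller ones miss theirs, which is the step-up behavior.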


Exploration, Normalization, And Genotype Calls Of High Density Oligonucleotide SNP Array Data, Benilton Carvalho, Terence P. Speed, Rafael A. Irizarry Jul 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

In most microarray technologies, a number of critical steps are required to convert raw intensity measurements into the data relied upon by data analysts, biologists and clinicians. These data manipulations, referred to as preprocessing, can influence the quality of the ultimate measurements. In the last few years, the high-throughput measurement of gene expression has been the most popular application of microarray technology. For this application, various groups have demonstrated that the use of modern statistical methodology can substantially improve the accuracy and precision of gene expression measurements, relative to ad hoc procedures introduced by designers and manufacturers of the technology. Currently, other applications …


Multivariate Analysis And Visualization Of Splicing Correlations In Single-Gene Transcriptomes, Mark C. Emerick, Giovanni Parmigiani, William S. Agnew Jun 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

Through ‘combinatorial splicing’, RNA metabolism may create enormous structural diversity in the proteome. Functional interactions among multiple alternative domains can have a disproportionate impact on the phenotype, requiring integrated RNA-level regulation of molecular composition. Splicing correlations within molecules expressed from a single gene, where these effects would be greatest, provide valuable clues to functional relationships and targets for splicing regulation. We present tools to visualize complex splicing patterns in full-length cDNA libraries. Developmental changes in pair-wise correlations are presented vectorially in ‘clock plots’ and linkage grids. Higher-order correlations are assessed via a loglinear model and Monte Carlo analysis with an …


A Faster Circular Binary Segmentation Algorithm For The Analysis Of Array CGH Data, E. S. Venkatraman, Adam Olshen Jun 2006

Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series

Motivation: Array CGH technologies enable the simultaneous measurement of DNA copy number for thousands of sites on a genome. We developed the circular binary segmentation (CBS) algorithm to divide the genome into regions of equal copy number (Olshen et al., 2004). The algorithm tests for change-points using a maximal t-statistic with a permutation reference distribution to obtain the corresponding p-value. The number of computations required for the maximal test statistic is O(N^2), where N is the number of markers. This makes the full permutation approach computationally prohibitive for the newer arrays that contain tens of thousands of markers and highlights the …
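
The quadratic cost comes from scanning every arc. The sketch below computes the maximal t-statistic of circular binary segmentation by brute force, comparing the mean of each contiguous arc x[i:j] with the mean of the remaining points (an arc that wraps around the end has the same absolute statistic as its contiguous complement, so contiguous arcs suffice). It assumes a single pooled standard deviation for the whole sequence, omits the permutation p-value, and does not implement the faster algorithm the paper proposes; the function name is hypothetical.

```python
import math


def max_t_statistic(x):
    """Brute-force maximal t-statistic over all arcs x[i:j] of a sequence.

    Considers every 0 <= i < j <= n with 0 < j - i < n, i.e. O(N^2) arcs.
    Returns (t_max, i, j).
    """
    n = len(x)
    # Prefix sums so each arc mean costs O(1).
    S = [0.0]
    for v in x:
        S.append(S[-1] + v)
    mean = S[n] / n
    # Simplifying assumption: one pooled standard deviation for all arcs.
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    best = (0.0, 0, 0)
    for i in range(n):
        for j in range(i + 1, n + 1):
            k = j - i
            if k == n:
                continue  # the whole sequence has no complement to compare with
            inside = (S[j] - S[i]) / k
            outside = (S[n] - S[j] + S[i]) / (n - k)
            t = abs(inside - outside) / (sd * math.sqrt(1.0 / k + 1.0 / (n - k)))
            if t > best[0]:
                best = (t, i, j)
    return best
```

On a flat sequence of 25 values with an elevated block at positions 10 to 14, the maximizing arc is exactly (10, 15); the doubly nested loop is the O(N^2) work that a permutation test would have to repeat for every permuted sequence.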


PLASQ: A Generalized Linear Model-Based Procedure To Determine Allelic Dosage In Cancer Cells From SNP Array Data, Thomas Laframboise, David P. Harrington, Barbara A. Weir Jun 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Selecting 'Significant' Differentially Expressed Genes From The Combined Perspective Of The Null And The Alternative, Beatrijs Moerkerke, Els Goetghebeur Apr 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Poor Performance Of Bootstrap Confidence Intervals For The Location Of A Quantitative Trait Locus, Ani Manichaikul, Josee Dupuis, Saunak Sen, Karl W. Broman Mar 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

The aim of many genetic studies is to locate the genomic regions (called quantitative trait loci, QTLs) that contribute to variation in a quantitative trait (such as body weight). Confidence intervals for the locations of QTLs are particularly important for the design of further experiments to identify the gene or genes responsible for the effect. Likelihood support intervals are the most widely used method to obtain confidence intervals for QTL location, but the non-parametric bootstrap has also been recommended. Through extensive computer simulation, we show that bootstrap confidence intervals are poorly behaved and so should not be used in this …


Feature-Level Exploration Of The Choe et al. Affymetrix GeneChip Control Dataset, Rafael A. Irizarry, Leslie Cope, Zhijin Wu Mar 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

We describe why the Choe et al. control dataset should not be used to assess GeneChip expression measures.


Genome Scanning Methods For Comparing Sequences Between Groups, With Application To HIV Vaccine Trials, Peter B. Gilbert, Chunyuan Wu, David V. Jobes Mar 2006

UW Biostatistics Working Paper Series

Consider a placebo-controlled preventive HIV vaccine efficacy trial. An HIV amino acid sequence is measured from each volunteer who acquires HIV, and these sequences are aligned together with the reference HIV sequence represented in the vaccine. We develop genome scanning methods to identify HIV positions at which the amino acids in sequences from infected vaccine recipients tend to be more divergent from the corresponding reference amino acid than the amino acids in sequences from infected placebo recipients. We consider five two-sample test statistics, based on Euclidean, Mahalanobis, and Kullback-Leibler divergence measures. Weights are incorporated to reflect biological information contained in …


2^k Factorials In Blocks Of Size 2, With Application To Two-Color Microarray Experiments, Kathleen F. Kerr Mar 2006

UW Biostatistics Working Paper Series

When a two-level design must be run in blocks of size two, there is a unique blocking scheme that enables estimation of all the main effects. Unfortunately, this design does not enable estimation of any two-factor interactions. When the experimental goal is to estimate all main effects and two-factor interactions, it is necessary to combine replicates of the experiment that use different blocking schemes. In this paper we identify such designs for up to eight factors that enable estimation of all main effects and two-factor interactions with the fewest replications. In addition, we give a construction for general …
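
The confounding argument can be checked numerically. In -1/+1 coding, a block of size two (for example, the two samples hybridized to one two-color array) contributes only the within-block difference between its two runs. If each run x is paired with its complement -x, every main-effect column changes sign, so the difference carries 2x_i for each factor, while every two-factor interaction column x_p·x_q takes the same value in both runs and cancels. The sketch assumes this complementary pairing is the blocking scheme the abstract refers to; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product


def complementary_pair_blocks(k):
    """Blocks of size two for a 2^k design: each run x (levels -1/+1) paired with -x."""
    runs = [x for x in product((-1, 1), repeat=k) if x[0] == 1]
    return [(x, tuple(-v for v in x)) for x in runs]


def within_block_contrasts(k):
    """Within-block differences of the main-effect and two-factor-interaction columns."""
    main_rows, tfi_rows = [], []
    for x, z in complementary_pair_blocks(k):
        main_rows.append([a - c for a, c in zip(x, z)])
        tfi_rows.append([x[p] * x[q] - z[p] * z[q] for p, q in combinations(range(k), 2)])
    return main_rows, tfi_rows


def rank(rows):
    """Rank of a small integer matrix by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r
```

For k = 3 the four within-block difference rows have rank 3, so all main effects are estimable, while every two-factor-interaction entry is zero, so none of those interactions is; this is why replicates with different blocking schemes have to be combined.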


Multiple Tests Of Association With Biological Annotation Metadata, Sandrine Dudoit, Sunduz Keles, Mark J. Van Der Laan Mar 2006

U.C. Berkeley Division of Biostatistics Working Paper Series

We propose a general and formal statistical framework for the multiple tests of associations between known fixed features of a genome and unknown parameters of the distribution of variable features of this genome in a population of interest. The known fixed gene-annotation profiles, corresponding to the fixed features of the genome, may concern Gene Ontology (GO) annotation, pathway membership, regulation by particular transcription factors, nucleotide sequences, or protein sequences. The unknown gene-parameter profiles, corresponding to the variable features of the genome, may be, for example, regression coefficients relating genome-wide transcript levels or DNA copy numbers to possibly censored biological and …


Nonparametric Pathway-Based Regression Models For Analysis Of Genomic Data, Zhi Wei, Hongzhe Li Feb 2006

UPenn Biostatistics Working Papers

High-throughput genomic data provide an opportunity for identifying pathways and genes that are related to various clinical phenotypes. Besides these genomic data, another valuable source of data is the biological knowledge about genes and pathways that might be related to the phenotypes of many complex diseases. Databases of such knowledge are often called metadata. In microarray data analysis, such metadata are currently explored in post hoc ways by gene set enrichment analysis but have hardly been utilized in the modeling step. We propose to develop and evaluate a pathway-based gradient descent boosting procedure for nonparametric pathway-based regression (NPR) analysis to …


Visualizing Genomic Data, Robert Gentleman, Florian Hahne, Wolfgang Huber Feb 2006

Bioconductor Project Working Papers

The advent of experimental techniques capable of probing biomolecules and cells at high levels of resolution has led to a rapid change in the methods used for the analysis of experimental molecular biology data. In this article we give an overview of visualization techniques and methods that can be used to assess various aspects of genomic data.