Open Access. Powered by Scholars. Published by Universities.®

Life Sciences Commons


2007

Bioinformatics

Johns Hopkins University, Dept. of Biostatistics Working Papers

Articles 1 - 6 of 6

Full-Text Articles in Life Sciences

A Bayesian Model For Cross-Study Differential Gene Expression, Robert B. Scharpf, Håkon Tjelmeland, Giovanni Parmigiani, Andrew B. Nobel Nov 2007


Johns Hopkins University, Dept. of Biostatistics Working Papers

In this paper we define a hierarchical Bayesian model for microarray expression data collected from several studies and use it to identify genes that show differential expression between two conditions. Key features include shrinkage across both genes and studies, and flexible modeling that allows for interactions between platforms and the estimated effect, as well as for both concordant and discordant differential expression across studies. We evaluated the performance of our model comprehensively, using both artificial data and a "split-sample" validation approach that provides an agnostic assessment of the model's behavior not only under the null hypothesis but also under a …
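To make "shrinkage across studies" concrete, the sketch below applies textbook normal-normal partial pooling to one gene's per-study effect estimates. This is a generic empirical-Bayes illustration, not the paper's model: the paper's hierarchy also shrinks across genes and allows discordant effects, whereas here the between-study variance `tau2` is assumed known and the function name `shrink_study_effects` is hypothetical.

```python
def shrink_study_effects(effects, variances, tau2):
    """Normal-normal partial pooling of one gene's per-study effects.

    effects[s]   : observed effect (e.g. log fold change) in study s
    variances[s] : sampling variance of that estimate
    tau2         : between-study variance, assumed known here (not estimated)
    Returns the precision-weighted pooled mean and the shrunken per-study effects.
    """
    # Precision weights for the pooled mean under the two-level normal model.
    weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    shrunk = []
    for d, v in zip(effects, variances):
        # Shrinkage factor: noisier studies (large v) are pulled harder
        # toward the pooled mean; small tau2 means stronger pooling overall.
        b = v / (v + tau2)
        shrunk.append(b * pooled + (1.0 - b) * d)
    return pooled, shrunk
```

For example, `shrink_study_effects([1.0, 2.0], [1.0, 1.0], tau2=1.0)` gives a pooled mean of 1.5 and shrunken effects of 1.25 and 1.75; as `tau2` grows, the per-study estimates are left almost untouched.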


Statistical Methods For The Analysis Of Cancer Genome Sequencing Data, Giovanni Parmigiani, J. Lin, Simina Boca, T. Sjoblom, K.W. Kinzler, V.E. Velculescu, B. Vogelstein Oct 2007


Johns Hopkins University, Dept. of Biostatistics Working Papers

The purpose of cancer genome sequencing studies is to determine the nature and types of alterations present in a typical cancer and to discover genes mutated at high frequencies. In this article we discuss statistical methods for the analysis of data generated in these studies. We place special emphasis on a two-stage study design introduced by Sjoblom et al. [1]. In this context, we describe statistical methods for constructing scores that can be used to prioritize candidate genes for further investigation and for assessing the statistical significance of the candidates thus identified.
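As a rough illustration of scoring a candidate gene against background, the sketch below computes a one-sided Poisson tail probability for the observed mutation count, given an assumed per-base background rate, and turns it into a -log10 score. This is a deliberately simplified stand-in, not the scoring method of the article: it treats a single stage in isolation and ignores the selection of genes between the two stages of the design. The names `poisson_upper_tail` and `gene_score`, and the rate values in the usage note, are hypothetical.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def gene_score(mutations, bases_per_sample, n_samples, background_rate):
    """Hypothetical single-stage score for one gene.

    The expected background count is rate * sequenced bases * samples; the
    score is -log10 of the probability of seeing at least `mutations` hits.
    """
    lam = background_rate * bases_per_sample * n_samples
    p = poisson_upper_tail(mutations, lam)
    score = -math.log10(p) if p > 0 else float("inf")
    return p, score
```

With an assumed background of 1e-6 mutations per base, a gene with 2,000 sequenced bases mutated 3 times across 24 tumors, `gene_score(3, 2000, 24, 1e-6)`, yields a tail probability below 1e-4, flagging it as a candidate worth following up.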


The Integrative Correlation Coefficient: A Measure Of Cross-Study Reproducibility For Gene Expression Array Data, Leslie M. Cope, Liz Garrett-Mayer, Edward Gabrielson, Giovanni Parmigiani May 2007


Johns Hopkins University, Dept. of Biostatistics Working Papers

Multi-study analysis adds value to microarray experiments. However, because of significant technical differences between microarray platforms, and because of differences in study design, it can be difficult to combine data. We have developed a statistical measure of reproducibility that can be applied to individual genes, measured in two different studies. This statistic, which we call the Integrative Correlation Coefficient or Correlation of Correlations, borrows strength across many genes to estimate the strength of the relationship between expression values in the two studies.
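The "correlation of correlations" idea can be sketched as follows: for a given gene, compute its correlation with every other shared gene within each study, then correlate the two resulting vectors. Because only within-study correlations are compared, the two studies may have different samples and measurement scales. This is a minimal sketch of the idea as described in the abstract; the published estimator may differ in details such as centering or gene filtering, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def integrative_correlation(study_a, study_b, gene):
    """Correlation-of-correlations reproducibility score for one gene.

    study_a, study_b: dict mapping gene name -> expression values across that
    study's samples. Sample sets may differ between studies; genes are matched
    by name.
    """
    others = [g for g in study_a if g != gene and g in study_b]
    # Within-study gene-gene correlations, in the same gene order for both studies.
    corr_a = [pearson(study_a[gene], study_a[g]) for g in others]
    corr_b = [pearson(study_b[gene], study_b[g]) for g in others]
    return pearson(corr_a, corr_b)
```

If the second study measures the same co-expression structure on a different scale (here, every value transformed as `2 * v + 1`), the score for a gene is 1:

```python
study_a = {"g1": [1, 2, 3, 4, 5], "g2": [2, 4, 5, 4, 5],
           "g3": [5, 3, 4, 1, 2], "g4": [1, 3, 2, 5, 4]}
study_b = {g: [2 * v + 1 for v in vals] for g, vals in study_a.items()}
integrative_correlation(study_a, study_b, "g1")  # 1.0 up to rounding
```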


A Hidden Markov Model For Joint Estimation Of Genotype And Copy Number In High-Throughput SNP Chips, Robert B. Scharpf, Giovanni Parmigiani, Jonathan Pevsner, Ingo Ruczinski Feb 2007


Johns Hopkins University, Dept. of Biostatistics Working Papers

Amplifications and deletions of chromosomal DNA, as well as copy-neutral loss of heterozygosity, have been associated with disease processes. High-throughput single nucleotide polymorphism (SNP) arrays are useful for making genome-wide estimates of copy number and genotype calls. Because neighboring SNPs in high-throughput SNP arrays are likely to have dependent copy number and genotype due to the underlying haplotype structure and linkage disequilibrium, hidden Markov models (HMMs) may be useful for improving on genotype calls and copy number estimates that do not incorporate information from nearby SNPs. We improve previous approaches that utilize an HMM framework for inference in high throughput …
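To show how an HMM borrows information from neighboring SNPs, the sketch below runs the Viterbi algorithm over hidden copy-number states (1, 2, 3) for a sequence of noisy per-SNP copy-number estimates ordered along a chromosome. It is a copy-number-only simplification, not the paper's joint genotype and copy-number model. The Gaussian emissions, common standard deviation `sd`, "sticky" transition probability `p_stay`, uniform initial distribution, and the function name are all assumptions made for illustration.

```python
import math


def viterbi_copy_number(obs, states=(1, 2, 3), sd=0.5, p_stay=0.99):
    """Most likely copy-number path for per-SNP estimates in chromosomal order.

    Hypothetical model: emissions are Normal(copy number, sd); each transition
    stays in the same state with probability p_stay and otherwise moves
    uniformly to one of the other states. Requires a non-empty `obs`.
    """
    k = len(states)
    log_stay = math.log(p_stay)
    log_move = math.log((1.0 - p_stay) / (k - 1))

    def log_emit(x, cn):
        return -0.5 * ((x - cn) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

    def log_trans(i, j):
        return log_stay if i == j else log_move

    # delta[j]: best log-probability of any path ending in state j at this SNP.
    delta = [math.log(1.0 / k) + log_emit(obs[0], cn) for cn in states]
    backptr = []
    for x in obs[1:]:
        new_delta, ptr = [], []
        for j, cn in enumerate(states):
            best_i = max(range(k), key=lambda i: delta[i] + log_trans(i, j))
            new_delta.append(delta[best_i] + log_trans(best_i, j) + log_emit(x, cn))
            ptr.append(best_i)
        delta = new_delta
        backptr.append(ptr)
    # Trace back from the best final state to recover the full path.
    j = max(range(k), key=lambda i: delta[i])
    path = [j]
    for ptr in reversed(backptr):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [states[j] for j in path]
```

With these settings, `viterbi_copy_number([2.0] * 5 + [1.0] * 8 + [2.0] * 5)` labels the middle eight SNPs as a single-copy deletion. A one- or two-SNP dip, by contrast, would usually be smoothed away, because the transition penalty outweighs the evidence from so few neighboring probes.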


TRAB: Testing Whether Mutation Frequencies Are Above An Unknown Background, Giovanni Parmigiani, Sining Chen, Victor E. Velculescu Jan 2007


Johns Hopkins University, Dept. of Biostatistics Working Papers

Rigorously determining whether a gene or a population of genes has alterations that are involved in carcinogenesis requires comparing the prevalence of identified changes with the background mutation frequency present in tumor DNA. To facilitate this task, we develop a testing approach and the associated R library, called TRAB, that evaluates whether the frequency of somatic mutation is higher than an unknown, but estimable, background. We test the null hypothesis that the frequency belongs to the background population of frequencies against the alternative hypothesis that the frequency is higher. Background mutation frequencies are themselves allowed to vary. TRAB …
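One simple way to let the background frequency vary is to place a Beta(a, b) distribution on it. The mutation count then follows a beta-binomial distribution, and its upper tail gives a one-sided p-value against "this gene's frequency belongs to the background population". The sketch below is a swapped-in illustration of that idea, not TRAB's procedure or its R interface. The parameters `a` and `b` are assumed to have been estimated elsewhere (for instance, from genes believed to be passengers), and the function names are hypothetical.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_upper_tail(x, n, a, b):
    """P(X >= x) when X | p ~ Binomial(n, p) and p ~ Beta(a, b).

    n is the number of sequenced units (e.g. bases) and x the observed number
    of mutations; Beta(a, b) stands in for a variable background frequency.
    """
    total = 0.0
    for k in range(x, n + 1):
        # log of C(n, k) * B(k + a, n - k + b) / B(a, b)
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + log_beta(k + a, n - k + b) - log_beta(a, b))
        total += math.exp(log_pmf)
    return min(1.0, total)
```

As a check on the formula, with a = b = 1 (a uniform background) every count from 0 to n is equally likely, so `beta_binomial_upper_tail(3, 9, 1.0, 1.0)` equals 7/10. Compared with a fixed-rate binomial test, the extra variability in the background makes the test more conservative for moderately elevated counts.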


Optimized Cross-Study Analysis Of Microarray-Based Predictors, Xiaogang Zhong, Luigi Marchionni, Leslie Cope, Edwin S. Iversen, Elizabeth S. Garrett-Mayer, Edward Gabrielson, Giovanni Parmigiani Jan 2007


Johns Hopkins University, Dept. of Biostatistics Working Papers

Background: Microarray-based gene expression analysis is widely used in cancer research to discover molecular signatures for cancer classification and prediction. In addition to numerous independent profiling projects, a number of investigators have analyzed multiple published data sets for purposes of cross-study validation. However, the diverse microarray platforms and technical approaches make direct comparisons across studies difficult and, without a means to identify aberrant data patterns, less than optimal. To address this issue, we previously developed an integrative correlation approach to systematically assess agreement of gene expression measurements across studies, providing a basis for cross-study validation analysis. Here we generalize this methodology …