Open Access. Powered by Scholars. Published by Universities.®

Genetics and Genomics Commons

COBRA

Johns Hopkins University, Dept. of Biostatistics Working Papers

Articles 1 - 30 of 39

Full-Text Articles in Genetics and Genomics

Component Extraction Of Complex Biomedical Signal And Performance Analysis Based On Different Algorithm, Hemant Pasusangai Kasturiwale Jun 2011

Johns Hopkins University, Dept. of Biostatistics Working Papers

Biomedical signals can arise from one or many sources, including the heart, brain, and endocrine system. Multiple sources pose a challenge to researchers because the recorded signals may be contaminated with artifacts and noise. Biomedical time series include the electroencephalogram (EEG), the electrocardiogram (ECG), etc. The morphology of the cardiac signal is very important in most ECG-based diagnostics. A diagnosis based on visual observation of a recorded ECG, EEG, etc. may not be accurate. To achieve a better understanding, PCA (Principal Component Analysis) and ICA algorithms help in analyzing ECG signals. The immense scope in the field of biomedical-signal processing Independent Component Analysis ( …


Removing Technical Variability In Rna-Seq Data Using Conditional Quantile Normalization, Kasper D. Hansen, Rafael A. Irizarry, Zhijin Wu May 2011

Johns Hopkins University, Dept. of Biostatistics Working Papers

The ability to measure gene expression on a genome-wide scale is one of the most promising accomplishments in molecular biology. Microarrays, the technology that first permitted this, were riddled with problems due to unwanted sources of variability. Many of these problems are now mitigated, after a decade’s worth of statistical methodology development. The recently developed RNA sequencing (RNA-seq) technology has generated much excitement in part due to claims of reduced variability in comparison to microarrays. However, we show RNA-seq data demonstrates unwanted and obscuring variability similar to what was first observed in microarrays. In particular, we find GC-content has a …


Using The R Package Crlmm For Genotyping And Copy Number Estimation, Robert B. Scharpf, Rafael Irizarry, Walter Ritchie, Benilton Carvalho, Ingo Ruczinski Sep 2010

Johns Hopkins University, Dept. of Biostatistics Working Papers

Genotyping platforms such as Affymetrix can be used to assess genotype-phenotype as well as copy number-phenotype associations at millions of markers. While genotyping algorithms are largely concordant when assessed on HapMap samples, tools to assess copy number changes are more variable and often discordant. One explanation for the discordance is that copy number estimates are susceptible to systematic differences between groups of samples that were processed at different times or by different labs. Analysis algorithms that do not adjust for batch effects are prone to spurious measures of association. The R package crlmm implements a multilevel model that adjusts for …


A Decision-Theory Approach To Interpretable Set Analysis For High-Dimensional Data, Simina Maria Boca, Hector C. Bravo, Brian Caffo, Jeffrey T. Leek, Giovanni Parmigiani Jul 2010

Johns Hopkins University, Dept. of Biostatistics Working Papers

A ubiquitous problem in high-dimensional analysis is the identification of pre-defined sets that are enriched for features showing an association of interest. In this situation, inference is performed on sets, not individual features. We propose an approach which focuses on estimating the fraction of non-null features in a set. We search for unions of disjoint sets (atoms), using as the loss function a weighted average of the number of false and missed discoveries. We prove that the solution is equivalent to thresholding the atomic false discovery rate and that our approach results in a more interpretable set analysis.


Accurate Genome-Scale Percentage Dna Methylation Estimates From Microarray Data, Martin J. Aryee, Zhijin Wu, Christine Ladd-Acosta, Brian Herb, Andrew P. Feinberg, Srinivasan Yegnasubramanian, Rafael A. Irizarry Mar 2010

Johns Hopkins University, Dept. of Biostatistics Working Papers

DNA methylation is a key regulator of gene function in a multitude of both normal and abnormal biological processes, but tools to elucidate its roles on a genome-wide scale are still in their infancy. Methylation sensitive restriction enzymes and microarrays provide a potential high-throughput, low-cost platform to allow methylation profiling. However, accurate absolute methylation estimates have been elusive due to systematic errors and unwanted variability. Previous microarray pre-processing procedures, mostly developed for expression arrays, fail to adequately normalize methylation-related data since they rely on key assumptions that are violated in the case of DNA methylation. We develop a normalization strategy …


Wavelet Based Functional Models For Transcriptome Analysis With Tiling Arrays, Lieven Clement, Kristof Debeuf, Ciprian Crainiceanu, Olivier Thas, Marnik Vuylsteke, Rafael Irizarry Feb 2010

Johns Hopkins University, Dept. of Biostatistics Working Papers

For a better understanding of the biology of an organism a complete description is needed of all regions of the genome that are actively transcribed. Tiling arrays can be used for this purpose. Such arrays allow the discovery of novel transcripts and the assessment of differential expression between two or more experimental conditions such as genotype, treatment, tissue, etc. Much of the initial methodological efforts were designed for transcript discovery, while more recent developments also focus on differential expression. To our knowledge no methods for tiling arrays are described in the literature that can both assess transcript discovery and identify …


Model-Based Quality Assessment And Base-Calling For Second-Generation Sequencing Data, Rafael A. Irizarry, Hector Corrada Bravo Sep 2009

Johns Hopkins University, Dept. of Biostatistics Working Papers

Second-generation sequencing (sec-gen) technology can sequence millions of short fragments of DNA in parallel, and is capable of assembling complex genomes for a small fraction of the price and time of previous technologies. In fact, a recently formed international consortium, the 1,000 Genomes Project, plans to fully sequence the genomes of approximately 1,200 people. The prospect of comparative analysis at the sequence level of a large number of samples across multiple populations may be achieved within the next five years. These data present unprecedented challenges in statistical analysis. For instance, analysis operates on millions of short nucleotide sequences, or reads—strings …


Trio Logic Regression - Detection Of Snp - Snp Interactions In Case-Parent Trios, Qing Li, Thomas A. Louis, M. Daniele Fallin, Ingo Ruczinski Jul 2009

Johns Hopkins University, Dept. of Biostatistics Working Papers

Statistical approaches to evaluate higher order SNP-SNP and SNP-environment interactions are critical in genetic association studies, as susceptibility to complex disease is likely to be related to the interaction of multiple SNPs and environmental factors. Logic regression (Kooperberg et al., 2001; Ruczinski et al., 2003) is one such approach, where interactions between SNPs and environmental variables are assessed in a regression framework, and interactions become part of the model search space. In this manuscript we extend the logic regression methodology, originally developed for cohort and case-control studies, for studies of trios with affected probands. Trio logic regression accounts for the …


Subset Quantile Normalization Using Negative Control Features, Zhijin Wu Jun 2009

Johns Hopkins University, Dept. of Biostatistics Working Papers

No abstract provided.


Frozen Robust Multi-Array Analysis (Frma), Matthew N. Mccall, Benjamin M. Bolstad, Rafael A. Irizarry May 2009

Johns Hopkins University, Dept. of Biostatistics Working Papers

Robust Multi-array Analysis (RMA) is the most widely used preprocessing algorithm for Affymetrix and Nimblegen gene-expression microarrays. RMA performs background correction, normalization, and summarization in a modular way. The last two steps require multiple arrays to be analyzed simultaneously. The ability to borrow information across samples provides RMA various advantages. For example, the summarization step fits a parametric model that accounts for probe-effects, assumed to be fixed across arrays, and improves outlier detection. Residuals, obtained from the fitted model, permit the creation of useful quality metrics. However, the dependence on multiple arrays has two drawbacks: (1) RMA cannot be …


Gene Set Enrichment Analysis Made Simple, Rafael A. Irizarry, Chi Wang, Yun Zhou, Terence P. Speed Apr 2009

Johns Hopkins University, Dept. of Biostatistics Working Papers

Among the many applications of microarray technology, one of the most popular is the identification of genes that are differentially expressed in two conditions. A common statistical approach is to quantify the interest of each gene with a p-value, adjust these p-values for multiple comparisons, choose an appropriate cut-off, and create a list of candidate genes. This approach has been criticized for ignoring biological knowledge regarding how genes work together. Recently a series of methods that do incorporate biological knowledge have been proposed. However, many of these methods seem overly complicated. Furthermore, the most popular method, Gene Set Enrichment Analysis …


Generalized Liquid Association, Yen-Yi Ho, Leslie Cope, Thomas A. Louis, Giovanni Parmigiani Apr 2009

Johns Hopkins University, Dept. of Biostatistics Working Papers

The analysis of interactions among a group of genes is fundamental to further our understanding of their biological interactions in a cell. Several studies suggested that the co-expression relationship of two genes can be modulated by a third controller gene. These controller genes and the corresponding modulated co-expressed gene pairs are the subjects of interest in this study. These "controller-modulated genes" three-way interactions are referred to as liquid association in the literature. Analysis of gene expression data has suggested that these interactions are present in many biological systems.

To quantify the magnitude of liquid association for a given gene …


Association Tests That Accommodate Genotyping Errors, Ingo Ruczinski, Qing Li, Benilton Carvalho, M. Daniele Fallin, Rafael A. Irizarry, Thomas A. Louis Jan 2009

Johns Hopkins University, Dept. of Biostatistics Working Papers

High-throughput SNP arrays provide estimates of genotypes for up to one million loci, often used in genome-wide association studies. While these estimates are typically very accurate, genotyping errors do occur, which can influence in particular the most extreme test statistics and p-values. Estimates for the genotype uncertainties are also available, although typically ignored. In this manuscript, we develop a framework to incorporate these genotype uncertainties in case-control studies for any genetic model. We verify that using the assumption of a “local alternative” in the score test is very reasonable for effect sizes typically seen in SNP association studies, and show …


Likelihood Estimation Of Conjugacy Relationships In Linear Models With Applications To High-Throughput Genomics, Brian S. Caffo, Liu Dongmei, Robert Scharpf, Giovanni Parmigiani Apr 2008

Johns Hopkins University, Dept. of Biostatistics Working Papers

In the simultaneous estimation of a large number of related quantities, multilevel models provide a formal mechanism for efficiently making use of the ensemble of information for deriving individual estimates. In this article we investigate the ability of the likelihood to identify the relationship between signal and noise in multilevel linear mixed models. Specifically, we consider the ability of the likelihood to diagnose conjugacy or independence between the signals and noises. Our work was motivated by the analysis of data from high-throughput experiments in genomics. The proposed model leads to a more flexible family. However, we further demonstrate that adequately …


Design And Analysis Issues In Genome-Wide Somatic Mutation Studies Of Cancer, Giovanni Parmigiani, Simina Boca, Jimmy Lin, Kenneth W. Kinzler, Victor E. Velculescu, Bert Vogelstein Jan 2008

Johns Hopkins University, Dept. of Biostatistics Working Papers

The availability of the human genome sequence and progress in sequencing and bioinformatic technologies have enabled genome-wide investigation of somatic mutations in human cancers. This article briefly reviews challenges arising in the statistical analysis of mutational data of this kind. A first challenge is that of designing studies that efficiently allocate sequencing resources. We show that this can be addressed by two-stage designs, and demonstrate via simulations that even relatively small studies can produce lists of candidate cancer genes that are highly informative for future research efforts. A second challenge is to distinguish mutated genes that are selected for …


A Bayesian Model For Cross-Study Differential Gene Expression, Robert B. Scharpf, Hakon Tjelmeland, Giovanni Parmigiani, Andrew B. Nobel Nov 2007

Johns Hopkins University, Dept. of Biostatistics Working Papers

In this paper we define a hierarchical Bayesian model for microarray expression data collected from several studies and use it to identify genes that show differential expression between two conditions. Key features include shrinkage across both genes and studies; flexible modeling that allows for interactions between platforms and the estimated effect, and for both concordant and discordant differential expression across studies. We evaluated the performance of our model in a comprehensive fashion, using both artificial data, and a "split-sample" validation approach that provides an agnostic assessment of the model's behavior not only under the null hypothesis but also under a …


Statistical Methods For The Analysis Of Cancer Genome Sequencing Data, Giovanni Parmigiani, J. Lin, Simina Boca, T. Sjoblom, K.W. Kinzler, V.E. Velculescu, B. Vogelstein Oct 2007

Johns Hopkins University, Dept. of Biostatistics Working Papers

The purpose of cancer genome sequencing studies is to determine the nature and types of alterations present in a typical cancer and to discover genes mutated at high frequencies. In this article we discuss statistical methods for the analysis of data generated in these studies. We place special emphasis on a two-stage study design introduced by Sjoblom et al. [1]. In this context, we describe statistical methods for constructing scores that can be used to prioritize candidate genes for further investigation and to assess the statistical significance of the candidates thus identified.


The Integrative Correlation Coefficient: A Measure Of Cross-Study Reproducibility For Gene Expression Array Data, Leslie M. Cope, Liz Garrett-Mayer, Edward Gabrielson, Giovanni Parmigiani May 2007

Johns Hopkins University, Dept. of Biostatistics Working Papers

Multi-study analysis adds value to microarray experiments. However, because of significant technical differences between microarray platforms, and because of differences in study design, it can be difficult to combine data. We have developed a statistical measure of reproducibility that can be applied to individual genes, measured in two different studies. This statistic, which we call the Integrative Correlation Coefficient or Correlation of Correlations, borrows strength across many genes to estimate the strength of the relationship between expression values in the two studies.
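
One way to read "Correlation of Correlations" is sketched below in Python. It is an illustrative interpretation of the abstract, not the authors' code: for a gene g, it is assumed that each study yields a profile of Pearson correlations between g and every other gene, and that the statistic is the Pearson correlation between the two profiles. The function names and the data layout (one expression vector per gene, genes in the same order in both studies) are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def integrative_correlation(study_a, study_b, g):
    """Correlation of correlations for the gene at index g.

    study_a, study_b: lists of per-gene expression vectors (assumed layout).
    Genes must be in the same order in both studies; the number of samples
    may differ between studies because correlations are computed within
    each study separately.
    """
    others = [h for h in range(len(study_a)) if h != g]
    # Within-study correlation profile of gene g against all other genes.
    profile_a = [pearson(study_a[g], study_a[h]) for h in others]
    profile_b = [pearson(study_b[g], study_b[h]) for h in others]
    # Agreement between the two profiles is the reproducibility score.
    return pearson(profile_a, profile_b)
```

Because only within-study correlations are compared, the score does not require the two platforms to report expression on the same scale, which is the sense in which it "borrows strength across many genes".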


A Hidden Markov Model For Joint Estimation Of Genotype And Copy Number In High-Throughput Snp Chips, Robert B. Scharpf, Giovanni Parmigiani, Jonathan Pevsner, Ingo Ruczinski Feb 2007

Johns Hopkins University, Dept. of Biostatistics Working Papers

Amplifications and deletions of chromosomal DNA, as well as copy-neutral loss of heterozygosity, have been associated with disease processes. High-throughput single nucleotide polymorphism (SNP) arrays are useful for making genome-wide estimates of copy number and genotype calls. Because neighboring SNPs in high throughput SNP arrays are likely to have dependent copy number and genotype due to the underlying haplotype structure and linkage disequilibrium, hidden Markov models (HMM) may be useful for improving genotype calls and copy number estimates that do not incorporate information from nearby SNPs. We improve previous approaches that utilize an HMM framework for inference in high throughput …


Multiple Diseases In Carrier Probability Estimation: Accounting For Surviving All Cancers Other Than Breast And Ovary In Brcapro, Hormuzd A. Katki, Amanda Blackford, Sining Chen, Giovanni Parmigiani Feb 2007

Johns Hopkins University, Dept. of Biostatistics Working Papers

Mendelian models can predict who carries an inherited deleterious mutation of known disease genes based on family history. For example, the BRCAPRO model is commonly used to identify families who carry mutations of BRCA1 and BRCA2, based on familial breast and ovarian cancers. These models incorporate the age of diagnosis of diseases in relatives and current age or age of death. We develop a rigorous foundation for handling multiple diseases with censoring. We prove that any disease unrelated to mutations can be excluded from the model, unless it is sufficiently common and dependent on a mutation-related disease time. Furthermore, if …


Trab: Testing Whether Mutation Frequencies Are Above An Unknown Background, Giovanni Parmigiani, Sining Chen, Victor E. Velculescu Jan 2007

Johns Hopkins University, Dept. of Biostatistics Working Papers

To rigorously determine whether a gene or a population of genes have alterations that are involved in carcinogenesis requires comparison of the prevalence of identified changes to the background mutation frequency present in tumor DNA. To facilitate this task, we develop a testing approach and the associated R library, called TRAB, that evaluates whether the frequency of somatic mutation is higher than an unknown, but estimable, background. We test the null hypothesis that the frequency belongs to background population of frequencies against the alternative hypothesis that the frequency is higher. Background mutation frequencies are themselves allowed to be variable. TRAB …


Optimized Cross-Study Analysis Of Microarray-Based Predictors, Xiaogang Zhong, Luigi Marchionni, Leslie Cope, Edwin S. Iversen, Elizabeth S. Garrett-Mayer, Edward Gabrielson, Giovanni Parmigiani Jan 2007

Johns Hopkins University, Dept. of Biostatistics Working Papers

Background: Microarray-based gene expression analysis is widely used in cancer research to discover molecular signatures for cancer classification and prediction. In addition to numerous independent profiling projects, a number of investigators have analyzed multiple published data sets for purposes of cross-study validation. However, the diverse microarray platforms and technical approaches make direct comparisons across studies difficult, and without means to identify aberrant data patterns, less than optimal. To address this issue, we previously developed an integrative correlation approach to systematically address agreement of gene expression measurements across studies, providing a basis for cross-study validation analysis. Here we generalize this methodology …


Use Of Hidden Markov Models For Qtl Mapping, Karl W. Broman Dec 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

An important aspect of the QTL mapping problem is the treatment of missing genotype data. If complete genotype data were available, QTL mapping would reduce to the problem of model selection in linear regression. However, in the consideration of loci in the intervals between the available genetic markers, genotype data is inherently missing. Even at the typed genetic markers, genotype data is seldom complete, as a result of failures in the genotyping assays or for the sake of economy (for example, in the case of selective genotyping, where only individuals with extreme phenotypes are genotyped). We discuss the use of …


Penalized Likelihood And Bayesian Methods For Sparse Contingency Tables: An Analysis Of Alternative Splicing In Full-Length Cdna Libraries, Corinne Dahinden, Giovanni Parmigiani, Mark C. Emerick, Peter Buhlmann Nov 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

We develop methods to perform model selection and parameter estimation in loglinear models for the analysis of sparse contingency tables to study the interaction of two or more factors. Typically, datasets arising from so-called full-length cDNA libraries, in the context of alternatively spliced genes, lead to such sparse contingency tables. Maximum Likelihood estimation of log-linear model coefficients fails to work because of zero cell entries. Therefore new methods are required to estimate the coefficients and to perform model selection. Our suggestions include computationally efficient penalization (Lasso-type) approaches as well as Bayesian methods using MCMC. We compare these procedures in a …


Estimating Genome-Wide Copy Number Using Allele Specific Mixture Models, Wenyi Wang, Benilton Carvalho, Nate Miller, Jonathan Pevsner, Aravinda Chakravarti, Rafael A. Irizarry Oct 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

Genomic changes such as copy number alterations are thought to be one of the major underlying causes of human phenotypic variation among normal and disease subjects [23,11,25,26,5,4,7,18]. These include chromosomal regions with so-called copy number alterations: instead of the expected two copies, a section of the chromosome for a particular individual may have zero copies (homozygous deletion), one copy (hemizygous deletions), or more than two copies (amplifications). The canonical example is Down syndrome which is caused by an extra copy of chromosome 21. Identification of such abnormalities in smaller regions has been of great interest, because it is believed to …


Fdr And Bayesian Multiple Comparisons Rules, Peter Muller, Giovanni Parmigiani, Kenneth Rice Jul 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

We discuss Bayesian approaches to multiple comparison problems, using a decision theoretic perspective to critically compare competing approaches. We set up decision problems that lead to the use of FDR-based rules and generalizations. Alternative definitions of the probability model and the utility function lead to different rules and problem-specific adjustments. Using a loss function that controls realized FDR we derive an optimal Bayes rule that is a variation of the Benjamini and Hochberg (1995) procedure. The cutoff is based on increments in ordered posterior probabilities instead of ordered p-values. Throughout the discussion we take a Bayesian perspective. In particular, …
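
For orientation, the classical p-value-based Benjamini and Hochberg (1995) step-up procedure that the abstract refers to can be sketched as follows. This is the standard frequentist rule, not the Bayes rule derived in the paper, which works with ordered posterior probabilities; the function name and the default level `alpha` are illustrative assumptions.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the sorted indices of hypotheses rejected by the BH step-up rule.

    Sort the m p-values, find the largest rank k such that
    p_(k) <= alpha * k / m, and reject the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank that passes; earlier failures do not stop the scan.
        if pvalues[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])
```

The scan keeps going after a p-value fails its threshold, so a small p-value late in the ordering can still pull earlier ones into the rejection set; that is what makes the rule "step-up".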


Exploration, Normalization, And Genotype Calls Of High Density Oligonucleotide Snp Array Data, Benilton Carvalho, Terence P. Speed, Rafael A. Irizarry Jul 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

In most microarray technologies, a number of critical steps are required to convert raw intensity measurements into the data relied upon by data analysts, biologists and clinicians. These data manipulations, referred to as preprocessing, can influence the quality of the ultimate measurements. In the last few years, the high-throughput measurement of gene expression has been the most popular application of microarray technology. For this application, various groups have demonstrated that the use of modern statistical methodology can substantially improve accuracy and precision of gene expression measurements, relative to ad-hoc procedures introduced by designers and manufacturers of the technology. Currently, other applications …


Multivariate Analysis And Visualization Of Splicing Correlations In Single-Gene Transcriptomes, Mark C. Emerick, Giovanni Parmigiani, William S. Agnew Jun 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

Through ‘combinatorial splicing’, RNA metabolism may create enormous structural diversity in the proteome. Functional interactions among multiple alternative domains can have a disproportionate impact on the phenotype, requiring integrated RNA-level regulation of molecular composition. Splicing correlations within molecules expressed from a single gene, where these effects would be greatest, provide valuable clues to functional relationships and targets for splicing regulation. We present tools to visualize complex splicing patterns in full-length cDNA libraries. Developmental changes in pair-wise correlations are presented vectorially in ‘clock plots’ and linkage grids. Higher-order correlations are assessed via a loglinear model and Monte Carlo analysis with an …


Poor Performance Of Bootstrap Confidence Intervals For The Location Of A Quantitative Trait Locus, Ani Manichaikul, Josee Dupuis, Saunak Sen, Karl W. Broman Mar 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

The aim of many genetic studies is to locate the genomic regions (called quantitative trait loci, QTLs) that contribute to variation in a quantitative trait (such as body weight). Confidence intervals for the locations of QTLs are particularly important for the design of further experiments to identify the gene or genes responsible for the effect. Likelihood support intervals are the most widely used method to obtain confidence intervals for QTL location, but the non-parametric bootstrap has also been recommended. Through extensive computer simulation, we show that bootstrap confidence intervals are poorly behaved and so should not be used in this …


Feature-Level Exploration Of The Choe Et Al. Affymetrix Genechip Control Dataset, Rafael A. Irizarry, Leslie Cope, Zhijin Wu Mar 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

We describe why the Choe et al. control dataset should not be used to assess GeneChip expression measures.