Open Access. Powered by Scholars. Published by Universities.®

Genetics and Genomics Commons

Articles 1 - 23 of 23

Full-Text Articles in Genetics and Genomics

Network-Constrained Regularization And Variable Selection For Analysis Of Genomic Data, Caiyan Li, Hongzhe Li Dec 2007

UPenn Biostatistics Working Papers

Graphs or networks are common ways of depicting information. In biology in particular, many different biological processes are represented by graphs, such as regulatory networks or metabolic pathways. This kind of a priori information gathered over many years of biomedical research is a useful supplement to the standard numerical genomic data such as microarray gene expression data. How to incorporate information encoded by the known biological networks or graphs into analysis of numerical data raises interesting statistical challenges. In this paper, we introduce a network-constrained regularization procedure for linear regression analysis in order to incorporate the information from these …


Vertex Clustering In Random Graphs Via Reversible Jump Markov Chain Monte Carlo, Stefano Monni, Hongzhe Li Dec 2007

UPenn Biostatistics Working Papers

Networks are a natural and effective tool to study relational data, in which observations are collected on pairs of units. The units are represented by nodes and their relations by edges. In biology, for example, proteins and their interactions, and, in social science, people and inter-personal relations may be the nodes and the edges of the network. In this paper we address the question of clustering vertices in networks, as a way to uncover homogeneity patterns in data that enjoy a network representation. We use a mixture model for random graphs and propose a reversible jump Markov chain Monte Carlo …


A Bayesian Model For Cross-Study Differential Gene Expression, Robert B. Scharpf, Hakon Tjelemeland, Giovanni Parmigiani, Andrew B. Nobel Nov 2007

Johns Hopkins University, Dept. of Biostatistics Working Papers

In this paper we define a hierarchical Bayesian model for microarray expression data collected from several studies and use it to identify genes that show differential expression between two conditions. Key features include shrinkage across both genes and studies; flexible modeling that allows for interactions between platforms and the estimated effect, and for both concordant and discordant differential expression across studies. We evaluated the performance of our model in a comprehensive fashion, using both artificial data, and a "split-sample" validation approach that provides an agnostic assessment of the model's behavior not only under the null hypothesis but also under a …


Assessing Population Level Genetic Instability Via Moving Average, Samuel Mcdaniel, Rebecca Betensky, Tianxi Cai Nov 2007

Harvard University Biostatistics Working Paper Series

No abstract provided.


Statistical Methods For The Analysis Of Cancer Genome Sequencing Data, Giovanni Parmigiani, J. Lin, Simina Boca, T. Sjoblom, K.W. Kinzler, V.E. Velculescu, B. Vogelstein Oct 2007

Johns Hopkins University, Dept. of Biostatistics Working Papers

The purpose of cancer genome sequencing studies is to determine the nature and types of alterations present in a typical cancer and to discover genes mutated at high frequencies. In this article we discuss statistical methods for the analysis of data generated in these studies. We place special emphasis on a two-stage study design introduced by Sjoblom et al.[1]. In this context, we describe statistical methods for constructing scores that can be used to prioritize candidate genes for further investigation and to assess the statistical significance of the candidates thus identified.


A Hidden Spatial-Temporal Markov Random Field Model For Network-Based Analysis Of Time Course Gene Expression Data, Zhi Wei, Hongzhe Li Oct 2007

UPenn Biostatistics Working Papers

Microarray time course (MTC) gene expression data are commonly collected to study the dynamic nature of biological processes. One important problem is to identify genes that show different expression profiles over time and pathways that are perturbed during a given biological process. While methods are available to identify the genes with differential expression levels over time, there is a lack of methods that can incorporate the pathway information in identifying the pathways being modified/activated during a biological process. In this paper, we develop a hidden spatial-temporal Markov random field (hstMRF)-based method for identifying genes and subnetworks that are related to …


Assessment Of A Cgh-Based Genetic Instability, David A. Engler, Yiping Shen, J F. Gusella, Rebecca A. Betensky Jul 2007

Harvard University Biostatistics Working Paper Series

No abstract provided.


Survival Analysis With Large Dimensional Covariates: An Application In Microarray Studies, David A. Engler, Yi Li Jul 2007

Harvard University Biostatistics Working Paper Series

Use of microarray technology often leads to high-dimensional and low-sample-size data settings. Over the past several years, a variety of novel approaches have been proposed for variable selection in this context. However, only a small number of these have been adapted for time-to-event data where censoring is present. Among standard variable selection methods shown both to have good predictive accuracy and to be computationally efficient is the elastic net penalization approach. In this paper, adaptation of the elastic net approach is presented for variable selection both under the Cox proportional hazards model and under an accelerated failure time …


The Integrative Correlation Coefficient: A Measure Of Cross-Study Reproducibility For Gene Expression Array Data, Leslie M. Cope, Liz Garrett-Mayer, Edward Gabrielson, Giovanni Parmigiani May 2007

Johns Hopkins University, Dept. of Biostatistics Working Papers

Multi-study analysis adds value to microarray experiments. However, because of significant technical differences between microarray platforms, and because of differences in study design, it can be difficult to combine data. We have developed a statistical measure of reproducibility that can be applied to individual genes, measured in two different studies. This statistic, which we call the Integrative Correlation Coefficient or Correlation of Correlations, borrows strength across many genes to estimate the strength of the relationship between expression values in the two studies.
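
As a rough illustration of the "correlation of correlations" idea described in this abstract, the Python sketch below assumes two expression matrices whose rows are matched by gene. For each gene it correlates that gene's profile of correlations with all other genes in one study against the same profile in the other study. The function name `integrative_correlation` and the choice to exclude the gene's correlation with itself are assumptions made for illustration; they are not details taken from the working paper.

```python
import numpy as np

def integrative_correlation(expr_a, expr_b):
    """Per-gene correlation of correlations between two studies (sketch).

    expr_a, expr_b: arrays of shape (genes, samples); rows must refer to
    the same genes in the same order. Sample counts may differ.
    """
    expr_a = np.asarray(expr_a, dtype=float)
    expr_b = np.asarray(expr_b, dtype=float)
    if expr_a.shape[0] != expr_b.shape[0]:
        raise ValueError("both studies must contain the same genes")
    n_genes = expr_a.shape[0]

    # Gene-by-gene Pearson correlation matrix within each study.
    corr_a = np.corrcoef(expr_a)
    corr_b = np.corrcoef(expr_b)

    scores = np.empty(n_genes)
    for g in range(n_genes):
        # Assumption: drop the gene's self-correlation (always 1).
        others = np.arange(n_genes) != g
        scores[g] = np.corrcoef(corr_a[g, others], corr_b[g, others])[0, 1]
    return scores
```

Under this reading, a score near 1 suggests that a gene relates to the other genes in a similar way in both studies, even when the two platforms report expression on different scales.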


What Is The Best Reference Rna? And Other Questions Regarding The Design And Analysis Of Two-Color Microarray Experiments, Kathleen F. Kerr, Kyle A. Serikawa, Caimiao Wei, Mette A. Peters, Roger E. Bumgarner Apr 2007

UW Biostatistics Working Paper Series

The reference design is a practical and popular choice for microarray studies using two-color platforms. In the reference design, the reference RNA uses half of all array resources, leading investigators to ask: What is the best reference RNA? We propose a novel method for evaluating reference RNAs and present the results of an experiment that was specially designed to evaluate three common choices of reference RNA. We found no compelling evidence in favor of any particular reference. In particular, a commercial reference showed no advantage in our data. Our experimental design also enabled a new way to test the effectiveness …


A Markov Random Field Model For Network-Based Analysis Of Genomic Data, Zhi Wei, Hongzhe Li Mar 2007

UPenn Biostatistics Working Papers

A central problem in genomic research is the identification of genes and pathways involved in diseases and other biological processes. The genes identified or the univariate test statistics are often linked to known biological pathways through gene set enrichment analysis in order to identify the pathways involved. However, most of the procedures for identifying differentially expressed genes do not utilize the known pathway information in the phase of identifying such genes. In this paper, we develop a Markov random field (MRF)-based method for identifying genes and subnetworks that are related to diseases. Such a procedure models the dependency of the …


Statistical Methods For Inference Of Genetic Networks And Regulatory Modules, Hongzhe Li Mar 2007

UPenn Biostatistics Working Papers

Large-scale microarray gene expression data, motif data derived from promoter sequences, genome-wide chromatin immunoprecipitation (ChIP-chip) data, DNA polymorphism data and epigenomic data provide the possibility of constructing genetic networks or biological pathways, especially regulatory networks. In this paper, we review some new statistical methods for inference of genetic networks and regulatory modules, including a threshold gradient descent procedure for inference of Gaussian graphical models, a sparse regression mixture modeling approach for inference of regulatory modules, and the varying coefficient model for identifying regulatory subnetworks by integrating microarray time-course gene expression data and motif or ChIP-chip data. We present the statistical …


Conservative Estimation Of Optimal Multiple Testing Procedures, James E. Signorovitch Mar 2007

Harvard University Biostatistics Working Paper Series

No abstract provided.


Statistical Evaluation Of Evidence For Clonal Allelic Alterations In Array-Cgh Experiments, Colin B. Begg, Kevin Eng, Adam Olshen, E S. Venkatraman Mar 2007

Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series

In recent years numerous investigators have conducted genetic studies of pairs of tumor specimens from the same patient to determine whether the tumors share a clonal origin. These studies have the potential to be of considerable clinical significance, especially in clinical settings where the distinction of a new primary cancer and metastatic spread of a previous cancer would lead to radically different indications for treatment. Studies of clonality have typically involved comparison of the patterns of somatic mutations in the tumors at candidate genetic loci to see if the patterns are sufficiently similar to indicate a clonal origin. More recently, …


Sequential Quantitative Trait Locus Mapping In Experimental Crosses, Jaya M. Satagopan, Saunak Sen, Gary A. Churchill Mar 2007

Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series

The etiology of complex diseases is heterogeneous. The presence of risk alleles in one or more genetic loci affects the function of a variety of intermediate biological pathways, resulting in the overt expression of disease. Hence, there is an increasing focus on identifying the genetic basis of disease by systematically studying phenotypic traits pertaining to the underlying biological functions. In this paper we focus on identifying genetic loci linked to quantitative phenotypic traits in experimental crosses. Such genetic mapping methods often use a one stage design by genotyping all the markers of interest on the available subjects. A genome scan …


A Hidden Markov Model For Joint Estimation Of Genotype And Copy Number In High-Throughput Snp Chips, Robert B. Scharpf, Giovanni Parmigiani, Jonathan Pevsner, Ingo Ruczinski Feb 2007

Johns Hopkins University, Dept. of Biostatistics Working Papers

Amplifications and deletions of chromosomal DNA, as well as copy-neutral loss of heterozygosity have been associated with disease processes. High-throughput single nucleotide polymorphism (SNP) arrays are useful for making genome-wide estimates of copy number and genotype calls. Because neighboring SNPs in high throughput SNP arrays are likely to have dependent copy number and genotype due to the underlying haplotype structure and linkage disequilibrium, hidden Markov models (HMM) may be useful for improving genotype calls and copy number estimates that do not incorporate information from nearby SNPs. We improve previous approaches that utilize a HMM framework for inference in high throughput …


Multiple Diseases In Carrier Probability Estimation: Accounting For Surviving All Cancers Other Than Breast And Ovary In Brcapro, Hormuzd A. Katki, Amanda Blackford, Sining Chen, Giovanni Parmigiani Feb 2007

Johns Hopkins University, Dept. of Biostatistics Working Papers

Mendelian models can predict who carries an inherited deleterious mutation of known disease genes based on family history. For example, the BRCAPRO model is commonly used to identify families who carry mutations of BRCA1 and BRCA2, based on familial breast and ovarian cancers. These models incorporate the age of diagnosis of diseases in relatives and current age or age of death. We develop a rigorous foundation for handling multiple diseases with censoring. We prove that any disease unrelated to mutations can be excluded from the model, unless it is sufficiently common and dependent on a mutation-related disease time. Furthermore, if …


Power Boosting In Genome-Wide Studies Via Methods For Multivariate Outcomes, Mary J. Emond Feb 2007

UW Biostatistics Working Paper Series

Whole-genome studies are becoming a mainstay of biomedical research. Examples include expression array experiments, comparative genomic hybridization analyses and large case-control studies for detecting polymorphism/disease associations. The tactic of applying a regression model to every locus to obtain test statistics is useful in such studies. However, this approach ignores potential correlation structure in the data that could be used to gain power, particularly when a Bonferroni correction is applied to adjust for multiple testing. In this article, we propose using regression techniques for misspecified multivariate outcomes to increase statistical power over independence-based modeling at each locus. Even when the outcome …


Data Quality Assessment Of Ungated Flow Cytometry Data In High, Nolwenn Le Meur, Anthony Rossini, Maura Gasparetto, Clay Smith, Ryan R. Brinkman, Robert Gentleman Feb 2007

Bioconductor Project Working Papers

Background: The recent development of semi-automated techniques for staining and analyzing flow cytometry samples has presented new challenges. Quality control and quality assessment are critical when developing new high throughput technologies and their associated information services. Our experience suggests that significant bottlenecks remain in the development of high throughput flow cytometry methods for data analysis and display. Especially, data quality control and quality assessment are crucial steps in processing and analyzing high throughput flow cytometry data.

Methods: We propose a variety of graphical exploratory data analytic tools for exploring ungated flow cytometry data. We have implemented a number of specialized …


Group Scad Regression Analysis For Microarray Time Course Gene Expression Data, Lifeng Wang, Guang Chen, Hongzhe Li Phd Jan 2007

UPenn Biostatistics Working Papers

Since many important biological systems or processes are dynamic systems, it is important to study the gene expression patterns over time in a genomic scale in order to capture the dynamic behavior of gene expression. Microarray technologies have made it possible to measure the gene expression levels of essentially all the genes during a given biological process. In order to determine the transcriptional factors involved in gene regulation during a given biological process, we propose to develop a functional response model with varying coefficients in order to model the transcriptional effects on gene expression levels and to develop a group …


Trab: Testing Whether Mutation Frequencies Are Above An Unknown Background, Giovanni Parmigiani, Sining Chen, Victor E. Velculescu Jan 2007

Johns Hopkins University, Dept. of Biostatistics Working Papers

To rigorously determine whether a gene or a population of genes has alterations that are involved in carcinogenesis requires comparison of the prevalence of identified changes to the background mutation frequency present in tumor DNA. To facilitate this task, we develop a testing approach and the associated R library, called TRAB, that evaluates whether the frequency of somatic mutation is higher than an unknown, but estimable, background. We test the null hypothesis that the frequency belongs to the background population of frequencies against the alternative hypothesis that the frequency is higher. Background mutation frequencies are themselves allowed to be variable. TRAB …


Optimized Cross-Study Analysis Of Microarray-Based Predictors, Xiaogang Zhong, Luigi Marchionni, Leslie Cope, Edwin S. Iversen, Elizabeth S. Garrett-Mayer, Edward Gabrielson, Giovanni Parmigiani Jan 2007

Johns Hopkins University, Dept. of Biostatistics Working Papers

Background: Microarray-based gene expression analysis is widely used in cancer research to discover molecular signatures for cancer classification and prediction. In addition to numerous independent profiling projects, a number of investigators have analyzed multiple published data sets for purposes of cross-study validation. However, the diverse microarray platforms and technical approaches make direct comparisons across studies difficult, and without means to identify aberrant data patterns, less than optimal. To address this issue, we previously developed an integrative correlation approach to systematically address agreement of gene expression measurements across studies, providing a basis for cross-study validation analysis. Here we generalize this methodology …


Improving Gsea For Analysis Of Biologic Pathways For Differential Gene Expression Across A Binary Phenotype, Irina Dinu, John D. Potter, Thomas Mueller, Qi Liu, Adeniyi J. Adewale, Gian S. Jhangri, Gunilla Einecke, Konrad S. Famulski, Philip Halloran, Yutaka Yasui Jan 2007

COBRA Preprint Series

Gene-set analysis evaluates the expression of biological pathways, or a priori defined gene sets, rather than that of single genes, in association with a binary phenotype, and is of great biologic interest in many DNA microarray studies. Gene Set Enrichment Analysis (GSEA) has been applied widely as a tool for gene-set analyses. We describe here some critical problems with GSEA and propose an alternative method by extending the single-gene analysis method, Significance Analysis of Microarray (SAM), to gene-set analyses (SAM-GS). Specifically, we illustrate, in a simulation study, that GSEA gives statistical significance to gene sets that have no gene associated …