Open Access. Powered by Scholars. Published by Universities.®

Genetics and Genomics Commons

COBRA

UPenn Biostatistics Working Papers

Articles 1 - 13 of 13

Full-Text Articles in Genetics and Genomics

Bayesian Methods For Network-Structured Genomics Data, Stefano Monni, Hongzhe Li Jan 2010

UPenn Biostatistics Working Papers

Graphs and networks are common ways of depicting information. In biology, many different processes are represented by graphs, such as regulatory networks, metabolic pathways and protein-protein interaction networks. This information provides a useful supplement to standard numerical genomic data such as microarray gene expression data. Effectively utilizing such information can lead to better identification of biologically relevant genomic features in the context of our prior biological knowledge. In this paper, we present a Bayesian variable selection procedure for network-structured covariates for both Gaussian linear and probit models. The key to our approach is the introduction of a Markov …


A Hidden Markov Random Field Model For Genome-Wide Association Studies, Hongzhe Li, Zhi Wei, J M. Maris Jan 2009

UPenn Biostatistics Working Papers

Genome-wide association studies (GWAS) are increasingly utilized for identifying novel susceptible genetic variants for complex traits, but there is little consensus on analysis methods for such data. The most commonly used methods include single SNP analysis or haplotype analysis with Bonferroni correction for multiple comparisons. Since the SNPs in typical GWAS are often in linkage disequilibrium (LD), at least locally, Bonferroni correction for multiple comparisons often leads to conservative error control and therefore lower statistical power. In this paper, we propose a hidden Markov random field model (HMRF) for GWAS analysis based on a weighted LD graph built from the prior …


A Network-Constrained Empirical Bayes Method For Analysis Of Genomic Data, Caiyan Li, Zhi Wei, Hongzhe Li Oct 2008

UPenn Biostatistics Working Papers

Empirical Bayes methods are widely used in the analysis of microarray gene expression data in order to identify the differentially expressed genes or genes that are associated with other general phenotypes. Available methods often assume that genes are independent. However, genes are expected to function interactively and to form molecular modules to affect the phenotypes. In order to account for regulatory dependency among genes, we propose in this paper a network-constrained empirical Bayes method for analyzing genomic data in the framework of general linear models, where the dependency of genes is modeled by a discrete Markov random field model defined …


U-Statistics-Based Tests For Multiple Genes In Genetic Association Studies, Zhi Wei, Mingyao Li Phd, Timothy Rebbeck, Hongzhe Li Apr 2008

UPenn Biostatistics Working Papers

Abstract: As our understanding of biological pathways and the genes that regulate these pathways increases, consideration of these biological pathways has become an increasingly important part of genetic and molecular epidemiology. Pathway-based genetic association studies often involve genotyping of variants in genes acting in certain biological pathways. Such pathway-based genetic association studies can potentially capture the highly heterogeneous nature of many complex traits, with multiple causative loci and multiple alleles at some of the causative loci. In this paper, we develop two nonparametric test statistics that consider simultaneously the effects of multiple markers. Our approach, which is based on data-adaptive …


Incorporation Of Genetic Pathway Information Into Analysis Of Multivariate Gene Expression Data, Zhi Wei, Jane E. Minturn, Eric Rappaport, Garrett Brodeur, Hongzhe Li Apr 2008

UPenn Biostatistics Working Papers

Abstract: Multivariate microarray gene expression data are commonly collected to study the genomic responses under ordered conditions such as over increasing/decreasing dose levels or over time during biological processes. One important question from such multivariate gene expression experiments is to identify genes that show different expression patterns over treatment dosages or over time and pathways that are perturbed during a given biological process. In this paper, we develop a hidden Markov random field model for multivariate expression data in order to identify genes and subnetworks that are related to biological processes, where the dependency of the differential expression patterns of …


Network-Constrained Regularization And Variable Selection For Analysis Of Genomic Data, Caiyan Li, Hongzhe Li Dec 2007

UPenn Biostatistics Working Papers

Graphs or networks are common ways of depicting information. In biology in particular, many different biological processes are represented by graphs, such as regulatory networks or metabolic pathways. This kind of a priori information gathered over many years of biomedical research is a useful supplement to the standard numerical genomic data such as microarray gene expression data. How to incorporate information encoded by the known biological networks or graphs into analysis of numerical data raises interesting statistical challenges. In this paper, we introduce a network-constrained regularization procedure for linear regression analysis in order to incorporate the information from these …


Vertex Clustering In Random Graphs Via Reversible Jump Markov Chain Monte Carlo, Stefano Monni, Hongzhe Li Dec 2007

UPenn Biostatistics Working Papers

Networks are a natural and effective tool to study relational data, in which observations are collected on pairs of units. The units are represented by nodes and their relations by edges. In biology, for example, proteins and their interactions, and, in social science, people and inter-personal relations may be the nodes and the edges of the network. In this paper we address the question of clustering vertices in networks, as a way to uncover homogeneity patterns in data that enjoy a network representation. We use a mixture model for random graphs and propose a reversible jump Markov chain Monte Carlo …


A Hidden Spatial-Temporal Markov Random Field Model For Network-Based Analysis Of Time Course Gene Expression Data, Zhi Wei, Hongzhe Li Oct 2007

UPenn Biostatistics Working Papers

Microarray time course (MTC) gene expression data are commonly collected to study the dynamic nature of biological processes. One important problem is to identify genes that show different expression profiles over time and pathways that are perturbed during a given biological process. While methods are available to identify the genes with differential expression levels over time, there is a lack of methods that can incorporate the pathway information in identifying the pathways being modified/activated during a biological process. In this paper, we develop a hidden spatial-temporal Markov random field (hstMRF)-based method for identifying genes and subnetworks that are related to …


A Markov Random Field Model For Network-Based Analysis Of Genomic Data, Zhi Wei, Hongzhe Li Mar 2007

UPenn Biostatistics Working Papers

A central problem in genomic research is the identification of genes and pathways involved in diseases and other biological processes. The genes identified or the univariate test statistics are often linked to known biological pathways through gene set enrichment analysis in order to identify the pathways involved. However, most of the procedures for identifying differentially expressed genes do not utilize the known pathway information in the phase of identifying such genes. In this paper, we develop a Markov random field (MRF)-based method for identifying genes and subnetworks that are related to diseases. Such a procedure models the dependency of the …


Statistical Methods For Inference Of Genetic Networks And Regulatory Modules, Hongzhe Li Mar 2007

UPenn Biostatistics Working Papers

Large-scale microarray gene expression data, motif data derived from promoter sequences, genome-wide chromatin immunoprecipitation (ChIP-chip) data, DNA polymorphism data and epigenomic data provide the possibility of constructing genetic networks or biological pathways, especially regulatory networks. In this paper, we review some new statistical methods for inference of genetic networks and regulatory modules, including a threshold gradient descent procedure for inference of Gaussian graphical models, a sparse regression mixture modeling approach for inference of regulatory modules, and the varying coefficient model for identifying regulatory subnetworks by integrating microarray time-course gene expression data and motif or ChIP-chip data. We present the statistical …


Group Scad Regression Analysis For Microarray Time Course Gene Expression Data, Lifeng Wang, Guang Chen, Hongzhe Li Phd Jan 2007

UPenn Biostatistics Working Papers

Since many important biological systems or processes are dynamic systems, it is important to study the gene expression patterns over time on a genomic scale in order to capture the dynamic behavior of gene expression. Microarray technologies have made it possible to measure the gene expression levels of essentially all the genes during a given biological process. In order to determine the transcriptional factors involved in gene regulation during a given biological process, we propose to develop a functional response model with varying coefficients in order to model the transcriptional effects on gene expression levels and to develop a group …


Group Additive Regression Models For Genomic Data Analysis, Yihui Luan, Hongzhe Li Aug 2006

UPenn Biostatistics Working Papers

One important problem in genomic research is to identify genomic features such as gene expression data or DNA single nucleotide polymorphisms (SNPs) that are related to clinical phenotypes. Often these genomic data can be naturally divided into biologically meaningful groups such as genes belonging to the same pathways or SNPs within genes. In this paper, we propose group additive regression models and a group gradient descent boosting procedure for identifying groups of genomic features that are related to clinical phenotypes. Our simulation results show that by dividing the variables into appropriate groups, we can obtain better identification of the group …


Nonparametric Pathway-Based Regression Models For Analysis Of Genomic Data, Zhi Wei, Hongzhe Li Feb 2006

UPenn Biostatistics Working Papers

High-throughput genomic data provide an opportunity for identifying pathways and genes that are related to various clinical phenotypes. Besides these genomic data, another valuable source of data is the biological knowledge about genes and pathways that might be related to the phenotypes of many complex diseases. Databases of such knowledge are often called the metadata. In microarray data analysis, such metadata are currently explored in post hoc ways by gene set enrichment analysis but have hardly been utilized in the modeling step. We propose to develop and evaluate a pathway-based gradient descent boosting procedure for nonparametric pathways-based regression (NPR) analysis to …