Open Access. Powered by Scholars. Published by Universities.®

Life Sciences Commons

Open Access. Powered by Scholars. Published by Universities.®

Bioinformatics

Selected Works

Computational Biology/Bioinformatics

Articles 1 - 6 of 6

Full-Text Articles in Life Sciences

Software To Accompany "A Kernel Machine Approach For Differential Expression Analysis Of Mass Spectrometry-Based Metabolomics Data", Debashis Ghosh Jan 2014

Software To Accompany "A Kernel Machine Approach For Differential Expression Analysis Of Mass Spectrometry-Based Metabolomics Data", Debashis Ghosh

Debashis Ghosh

No abstract provided.


Permutation-Based Pathway Testing Using The Super Learner Algorithm, Paul Chaffee, Alan E. Hubbard, Mark L. Van Der Laan Mar 2012

Permutation-Based Pathway Testing Using The Super Learner Algorithm, Paul Chaffee, Alan E. Hubbard, Mark L. Van Der Laan

Paul H. Chaffee

Many diseases and other important phenotypic outcomes are the result of a combination of factors. For example, expression levels of genes have been used as input to various statistical methods for predicting phenotypic outcomes. One particular popular variety is the so-called gene set enrichment analysis (GSEA). This paper discusses an augmentation to an existing strategy to estimate the significance of an associations between a disease outcome and a predetermined combination of biological factors, based on a specific data adaptive regression method (the "Super Learner," van der Laan et al., 2007). The procedure uses an aggressive search procedure, potentially resulting in …


Clustering With Exclusion Zones: Genomic Applications, Mark Segal, Yuanyuan Xiao, Fred Huffer Dec 2010

Clustering With Exclusion Zones: Genomic Applications, Mark Segal, Yuanyuan Xiao, Fred Huffer

Mark R Segal

Methods for formally evaluating the clustering of events in space or time, notably the scan statistic, have been richly developed and widely applied. In order to utilize the scan statistic and related approaches, it is necessary to know the extent of the spatial or temporal domains wherein the events arise. Implicit in their usage is that these domains have no “holes”—hereafter “exclusion zones”—regions in which events a priori cannot occur. However, in many contexts, this requirement is not met. When the exclusion zones are known, it is straightforward to correct the scan statistic for their occurrence by simply adjusting the …


Identification Of Yeast Transcriptional Regulation Networks Using Multivariate Random Forests, Yuanyuan Xiao, Mark Segal Dec 2008

Identification Of Yeast Transcriptional Regulation Networks Using Multivariate Random Forests, Yuanyuan Xiao, Mark Segal

Mark R Segal

The recent availability of whole-genome scale data sets that investigate complementary and diverse aspects of transcriptional regulation has spawned an increased need for new and effective computational approaches to analyze and integrate these large scale assays. Here, we propose a novel algorithm, based on random forest methodology, to relate gene expression (as derived from expression microarrays) to sequence features residing in gene promoters (as derived from DNA motif data) and transcription factor binding to gene promoters (as derived from tiling microarrays). We extend the random forest approach to model a multivariate response as represented, for example, by time-course gene expression …


Nonparametric Pathway-Based Regression Models For Analysis Of Genomic Data, Zhi Wei, Hongzhe Li Oct 2006

Nonparametric Pathway-Based Regression Models For Analysis Of Genomic Data, Zhi Wei, Hongzhe Li

Hongzhe Li

High-throughout genomic data provide an opportunity for identifying pathways and genes that are related to various clinical phenotypes. Besides these genomic data, another valuable source of data is the biological knowledge about genes and pathways that might be related to the phenotypes of many complex diseases. Databases of such knowledge are often called the metadata. In microarray data analysis, such metadata are currently explored in post hoc ways by gene set enrichment analysis but have hardly been utilized in the modeling step. We propose to develop and evaluate a pathway-based gradient descent boosting procedure for nonparametric pathways-based regression(NPR) analysis to …


Group Additive Regression Models For Genomic Data Analysis, Yihui Luan, Hongzhe Li Oct 2006

Group Additive Regression Models For Genomic Data Analysis, Yihui Luan, Hongzhe Li

Hongzhe Li

One important problem in genomic research is to identify genomic features such as gene expression data or DNA single nucleotide polymorphisms (SNPs) that are related to clinical phenotypes. Often these genomic data can be naturally divided into biologically meaningful groups such as genes belonging to the same pathways or SNPs within genes. In this paper, we propose group additive regression models and a group gradient descent boosting procedure for identifying groups of genomic features that are related to clinical phenotypes. Our simulation results show that by dividing the variables into appropriate groups, we can obtain better identification of the group …