Open Access. Powered by Scholars. Published by Universities.®

Life Sciences Commons

Articles 1 - 10 of 10

Full-Text Articles in Life Sciences

Statistical Approaches Of Gene Set Analysis With Quantitative Trait Loci For High-Throughput Genomic Studies., Samarendra Das Dec 2020

Electronic Theses and Dissertations

Recently, gene set analysis has become the first choice for gaining insights into the underlying complex biology of diseases through high-throughput genomic studies, such as Microarrays, bulk RNA-Sequencing, single cell RNA-Sequencing, etc. It also reduces the complexity of statistical analysis and enhances the explanatory power of the obtained results. Further, the statistical structure and steps common to these approaches have not yet been comprehensively discussed, which limits their utility. Hence, a comprehensive overview of the available gene set analysis approaches used for different high-throughput genomic studies is provided. The analysis of gene sets is usually carried out based on …


James-Stein Estimation And The Benjamini-Hochberg Procedure, Debashis Ghosh Jan 2012

Debashis Ghosh

For the problem of multiple testing, the Benjamini-Hochberg (B-H) procedure has become a very popular method in applications. Based on a spacings theory representation of the B-H procedure, we are able to motivate the use of shrinkage estimators for modifying the B-H procedure. Several generalizations in the paper are discussed, and the methodology is applied to real and simulated datasets.
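For readers unfamiliar with the procedure named in this abstract, the standard B-H step-up rule can be sketched as follows. This is a minimal illustration of the textbook procedure only, not the shrinkage-modified version the paper proposes; the function name `benjamini_hochberg` and the example p-values are assumptions made for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Textbook B-H step-up rule (illustrative sketch).

    Returns a list of booleans, True where the hypothesis is rejected.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Reject every hypothesis whose p-value is among the k_max smallest.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, chosen only to show the step-up behaviour.
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
decisions = benjamini_hochberg(example, alpha=0.05)
```

Because the rule is step-up, a p-value that misses its own threshold is still rejected whenever a larger p-value meets its threshold.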


Shrinkage In Adaptive Procedures For False Discovery Rate Estimation In Multiple Testing: Structure And Synthesis, Debashis Ghosh Jan 2012

Debashis Ghosh

There has been much interest in the study of adaptive estimation procedures for controlling the false discovery rate (FDR). In this article, we take the direct approach to estimation of FDR of Storey (2002) and show how it can be reexpressed as a particular type of shrinkage estimator. This representation leads to natural conditions on finite-sample FDR control for a general class of shrinkage estimators. In addition, many previous proposals from the literature can be unified under this framework for which finite-sample FDR results can be developed. Some asymptotic results are also provided.
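As background for the abstract above, the direct FDR estimate commonly attributed to Storey (2002) can be sketched as follows. This shows only the usual point estimate at a fixed p-value threshold, not the shrinkage reexpression developed in the article; the function name `storey_fdr_estimate`, the tuning value `lam=0.5`, and the example p-values are assumptions made for illustration.

```python
def storey_fdr_estimate(p_values, threshold, lam=0.5):
    """Direct FDR point estimate at a fixed rejection threshold (sketch).

    pi0_hat = #{p > lam} / (m * (1 - lam)) estimates the proportion of
    true nulls; the estimate is pi0_hat * m * threshold / max(R, 1),
    where R is the number of p-values at or below the threshold.
    """
    m = len(p_values)
    # Proportion of true null hypotheses, estimated from the large p-values.
    pi0_hat = sum(p > lam for p in p_values) / (m * (1.0 - lam))
    # Number of rejections at this threshold, floored at 1 to avoid 0/0.
    n_rejected = max(sum(p <= threshold for p in p_values), 1)
    return pi0_hat * m * threshold / n_rejected


# Hypothetical p-values: six small ones and four above lam.
example = [0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.6, 0.7, 0.8, 0.9]
fdr_hat = storey_fdr_estimate(example, threshold=0.05, lam=0.5)
```

In this sketch the estimate is left uncapped, so it can exceed 1 when few hypotheses are rejected; truncating at 1 is a common practical adjustment.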


Generalized Benjamini-Hochberg Procedures Using Spacings, Debashis Ghosh Jan 2011

Debashis Ghosh

For the problem of multiple testing, the Benjamini-Hochberg (B-H) procedure has become a very popular method in applications. We show how the B-H procedure can be interpreted as a test based on the spacings corresponding to the p-value distributions. Using this equivalence, we develop a class of generalized B-H procedures that maintain control of the false discovery rate in finite samples. We also consider the effect of correlation on the procedure; simulation studies are used to illustrate the methodology.


Software For Assumption Weighting For Meta-Analysis Of Genomic Data, Debashis Ghosh, Yihan Li Jan 2011

Debashis Ghosh

This is the software that accompanies Li and Ghosh, "Assumption weighting for incorporating heterogeneity into meta-analysis of genomic data."


Discrete Nonparametric Algorithms For Outlier Detection With Genomic Data, Debashis Ghosh Jan 2010

Debashis Ghosh

In high-throughput studies involving genetic data such as from gene expression microarrays, differential expression analysis between two or more experimental conditions has been a very common analytical task. Much of the resulting literature on multiple comparisons has paid relatively little attention to the choice of test statistic. In this article, we focus on the issue of choice of test statistic based on a special pattern of differential expression. The approach here is based on recasting multiple comparisons procedures for assessing outlying expression values. A major complication is that the resulting p-values are discrete; some theoretical properties of sequential testing …


Detecting Outlier Genes From High-Dimensional Data: A Fuzzy Approach, Debashis Ghosh Jan 2010

Debashis Ghosh

A recent finding in cancer research has been the characterization of previously undiscovered chromosomal abnormalities in several types of solid tumors. This was found based on analyses of high-throughput data from gene expression microarrays and motivated the development of so-called 'outlier' tests for differential expression. One statistical issue was the potential discreteness of the test statistics. Using ideas from fuzzy set theory, we develop fuzzy outlier detection algorithms that have links to ideas in multiple comparisons. Two- and K-sample extensions are considered. The methodology is illustrated by application to two microarray studies.


Discrete Nonparametric Algorithms For Outlier Detection With Genomic Data, Debashis Ghosh Jan 2009

Debashis Ghosh

In high-throughput studies involving genetic data such as from gene expression microarrays, differential expression analysis between two or more experimental conditions has been a very common analytical task. Much of the resulting literature on multiple comparisons has paid relatively little attention to the choice of test statistic. In this article, we focus on the issue of choice of test statistic based on a special pattern of differential expression. The approach here is based on recasting multiple comparisons procedures for assessing outlying expression values. A major complication is that the resulting p-values are discrete; some theoretical properties of sequential testing procedures …


Discrete Nonparametric Algorithms For Outlier Detection With Genomic Data, Debashis Ghosh Jan 2009

Debashis Ghosh

In high-throughput studies involving genetic data such as from gene expression microarrays, differential expression analysis between two or more experimental conditions has been a very common analytical task. Much of the resulting literature on multiple comparisons has paid relatively little attention to the choice of test statistic. In this article, we focus on the issue of choice of test statistic based on a special pattern of differential expression. The approach here is based on recasting multiple comparisons procedures for assessing outlying expression values. A major complication is that the resulting p-values are discrete; some theoretical properties of sequential testing procedures …


Identification Of Yeast Transcriptional Regulation Networks Using Multivariate Random Forests, Yuanyuan Xiao, Mark Segal Dec 2008

Mark R Segal

The recent availability of whole-genome scale data sets that investigate complementary and diverse aspects of transcriptional regulation has spawned an increased need for new and effective computational approaches to analyze and integrate these large scale assays. Here, we propose a novel algorithm, based on random forest methodology, to relate gene expression (as derived from expression microarrays) to sequence features residing in gene promoters (as derived from DNA motif data) and transcription factor binding to gene promoters (as derived from tiling microarrays). We extend the random forest approach to model a multivariate response as represented, for example, by time-course gene expression …