Open Access. Powered by Scholars. Published by Universities.®

Medical Biomathematics and Biometrics Commons

Full-Text Articles in Medical Biomathematics and Biometrics

James-Stein Estimation And The Benjamini-Hochberg Procedure, Debashis Ghosh Jan 2012

Debashis Ghosh

For the problem of multiple testing, the Benjamini-Hochberg (B-H) procedure has become a very popular method in applications. Based on a spacings theory representation of the B-H procedure, we are able to motivate the use of shrinkage estimators for modifying the B-H procedure. Several generalizations are discussed in the paper, and the methodology is applied to real and simulated datasets.
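
For readers unfamiliar with the procedure named in this abstract, the following is a minimal sketch of the standard Benjamini-Hochberg step-up rule only. It does not implement the spacings representation or the shrinkage modification proposed in the paper. The function name `benjamini_hochberg` and the default level `alpha=0.05` are illustrative assumptions, not taken from the source.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Standard B-H step-up rule (illustrative sketch).

    Returns a list of booleans, True where the hypothesis is rejected.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank

    # Reject the hypotheses with the k_max smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected
```

Because the rule is step-up, every hypothesis ranked at or below the largest passing rank is rejected, even if its own p-value misses its own threshold.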


Shrinkage In Adaptive Procedures For False Discovery Rate Estimation In Multiple Testing: Structure And Synthesis, Debashis Ghosh Jan 2012

Debashis Ghosh

There has been much interest in the study of adaptive estimation procedures for controlling the false discovery rate (FDR). In this article, we take the direct approach to estimation of FDR of Storey (2002) and show how it can be reexpressed as a particular type of shrinkage estimator. This representation leads to natural conditions on finite-sample FDR control for a general class of shrinkage estimators. In addition, many previous proposals from the literature can be unified under this framework, for which finite-sample FDR results can be developed. Some asymptotic results are also provided.
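
As background for the direct approach mentioned in this abstract, here is a minimal sketch of the commonly cited form of Storey's estimator: the null proportion is estimated as #{p_i > lambda} / (m (1 - lambda)), and the FDR at rejection threshold t as pi0_hat * m * t / max(#{p_i <= t}, 1). The shrinkage reformulation developed in the article is not reproduced here. The function name `storey_fdr_estimate`, the default `lam=0.5`, and the truncation of the estimate at 1 are illustrative assumptions.

```python
def storey_fdr_estimate(p_values, t, lam=0.5):
    """Direct FDR estimate at threshold t (illustrative sketch).

    Returns (pi0_hat, fdr_hat), where pi0_hat estimates the proportion
    of true null hypotheses using the tuning parameter lam.
    """
    m = len(p_values)
    if m == 0:
        raise ValueError("p_values must be non-empty")
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")

    # p-values above lam are assumed to come mostly from true nulls.
    n_above = sum(1 for p in p_values if p > lam)
    pi0_hat = n_above / (m * (1.0 - lam))

    # Number of rejections at threshold t; guard against division by zero.
    n_rejected = sum(1 for p in p_values if p <= t)
    fdr_hat = pi0_hat * m * t / max(n_rejected, 1)

    # Truncate at 1, since an FDR cannot exceed 1 (a convenience choice).
    return pi0_hat, min(fdr_hat, 1.0)
```

With `lam=0` the null proportion estimate equals 1, which gives the most conservative version of the estimate.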


Generalized Benjamini-Hochberg Procedures Using Spacings, Debashis Ghosh Jan 2011

Debashis Ghosh

For the problem of multiple testing, the Benjamini-Hochberg (B-H) procedure has become a very popular method in applications. We show how the B-H procedure can be interpreted as a test based on the spacings corresponding to the p-value distributions. Using this equivalence, we develop a class of generalized B-H procedures that maintain control of the false discovery rate in finite samples. We also consider the effect of correlation on the procedure; simulation studies are used to illustrate the methodology.


Software For Assumption Weighting For Meta-Analysis Of Genomic Data, Debashis Ghosh, Yihan Li Jan 2011

Debashis Ghosh

This is the software that accompanies Li and Ghosh, "Assumption weighting for incorporating heterogeneity into meta-analysis of genomic data."


Discrete Nonparametric Algorithms For Outlier Detection With Genomic Data, Debashis Ghosh Jan 2010

Debashis Ghosh

In high-throughput studies involving genetic data such as from gene expression microarrays, differential expression analysis between two or more experimental conditions has been a very common analytical task. Much of the resulting literature on multiple comparisons has paid relatively little attention to the choice of test statistic. In this article, we focus on the issue of choice of test statistic based on a special pattern of differential expression. The approach here is based on recasting multiple comparisons procedures for assessing outlying expression values. A major complication is that the resulting p-values are discrete; some theoretical properties of sequential testing …


Detecting Outlier Genes From High-Dimensional Data: A Fuzzy Approach, Debashis Ghosh Jan 2010

Debashis Ghosh

A recent finding in cancer research has been the characterization of previously undiscovered chromosomal abnormalities in several types of solid tumors. This was found based on analyses of high-throughput data from gene expression microarrays and motivated the development of so-called 'outlier' tests for differential expression. One statistical issue was the potential discreteness of the test statistics. Using ideas from fuzzy set theory, we develop fuzzy outlier detection algorithms that have links to ideas in multiple comparisons. Two- and K-sample extensions are considered. The methodology is illustrated by application to two microarray studies.


Uniqueprimer - A Web Utility For Design Of Specific PCR Primers And Probes, Torstein Tengs Jan 2009

Dr. Torstein Tengs

We have developed a web-based tool for design of specific PCR primers and probes. The program allows you to enter primer sequence information as well as an optional probe, and sequence similarity searches (MegaBLAST) will be performed to see if the sequences match the same sequence entry in the specified database. If primers (and probe) match, this will be reported. The program can handle overlapping amplicons, amplification from a single primer, ambiguous bases and other problematic cases.


Hierarchical Hidden Markov Model With Application To Joint Analysis Of ChIP-chip And ChIP-seq Data, Hyungwon Choi, Debashis Ghosh, Zhaohui S. Qin Jan 2009

Debashis Ghosh

Motivation: Identification of transcription factor binding sites (TFBS) is a fundamental problem in understanding the mechanism of gene regulation. The ChIP-chip technology has accelerated this effort by providing a simultaneous genome-wide map of TFBS in a high-throughput fashion. Recently, a sequencing-based ChIP-seq has appeared as a promising alternative that can identify targets with an improved sensitivity/specificity in high resolution. However, studies have suggested that distinct experimental platforms can be complementary in TFBS identification. The availability of data obtained from multiple platforms motivates a meta-analysis for improved identification of candidate motifs.

Results: In this work, we propose a hierarchical hidden Markov …


A Double-Layered Mixture Model For The Joint Analysis Of DNA Copy Number And Gene Expression Data, Debashis Ghosh Jan 2009

Debashis Ghosh

Copy number aberration is a common form of genomic instability in cancer. Gene expression is closely tied to cytogenetic events by the central dogma of molecular biology, and serves as a mediator of copy number changes in disease phenotypes. Accordingly, it is of interest to develop proper statistical methods for jointly analyzing copy number and gene expression data. This work describes a novel Bayesian inferential approach for a double-layered mixture model (DLMM) which directly models the stochastic nature of copy number data and identifies abnormally expressed genes due to aberrant copy number. Simulation studies were conducted to illustrate the robustness …


Discrete Nonparametric Algorithms For Outlier Detection With Genomic Data, Debashis Ghosh Jan 2009

Debashis Ghosh

In high-throughput studies involving genetic data such as from gene expression microarrays, differential expression analysis between two or more experimental conditions has been a very common analytical task. Much of the resulting literature on multiple comparisons has paid relatively little attention to the choice of test statistic. In this article, we focus on the issue of choice of test statistic based on a special pattern of differential expression. The approach here is based on recasting multiple comparisons procedures for assessing outlying expression values. A major complication is that the resulting p-values are discrete; some theoretical properties of sequential testing procedures …


Finding Recurrent Regions Of Copy Number Variation: A Review, Oscar M. Rueda, Ramon Diaz-Uriarte Nov 2008

Ramon Diaz-Uriarte

Copy number alterations (CNA) in genomic DNA are linked to a variety of human diseases. Although many methods have been developed to analyze data from a single subject, disease-critical genes are more likely to be found in regions that are common or recurrent among diseased subjects. Unfortunately, finding recurrent CNA regions remains a challenge. We review existing methods for the identification of recurrent CNA regions. Methods differ in their working definition of "recurrent region", the type of input data, the statistical and computational methods used to identify recurrence, and the biological considerations they incorporate (which play a role in the …


Multiple Testing Procedures Under Confounding, Debashis Ghosh Jan 2008

Debashis Ghosh

While multiple testing procedures have been the focus of much statistical research, an important facet of the problem is how to deal with possible confounding. Procedures have been developed by authors in genetics and statistics. In this chapter, we relate these proposals. We propose two new multiple testing approaches within this framework. The first combines sensitivity analysis methods with false discovery rate estimation procedures. The second involves construction of shrinkage estimators that utilize the mixture model for multiple testing. The procedures are illustrated with applications to a gene expression profiling experiment in prostate cancer.