Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Articles 1 - 18 of 18
Full-Text Articles in Physical Sciences and Mathematics
Models For HSV Shedding Must Account For Two Levels Of Overdispersion, Amalia Magaret
UW Biostatistics Working Paper Series
We have frequently implemented crossover studies to evaluate new therapeutic interventions for genital herpes simplex virus infection. The outcome measured to assess the efficacy of interventions on herpes disease severity is the viral shedding rate, defined as the frequency of detection of HSV on the genital skin and mucosa. We performed a simulation study to ascertain whether our standard model, which we have used previously, was appropriately considering all the necessary features of the shedding data to provide correct inference. We simulated shedding data under our standard, validated assumptions and assessed the ability of 5 different models to reproduce the …
Meta-Analysis Of Genome-Wide Association Studies With Correlated Individuals: Application To The Hispanic Community Health Study/Study Of Latinos (HCHS/SOL), Tamar Sofer, John R. Shaffer, Misa Graff, Qibin Qi, Adrienne M. Stilp, Stephanie M. Gogarten, Kari E. North, Carmen R. Isasi, Cathy C. Laurie, Adam A. Szpiro
UW Biostatistics Working Paper Series
Investigators often meta-analyze multiple genome-wide association studies (GWASs) to increase the power to detect associations of single nucleotide polymorphisms (SNPs) with a trait. Meta-analysis is also performed within a single cohort that is stratified by, e.g., sex or ancestry group. Having correlated individuals among the strata may complicate meta-analyses, limit power, and inflate Type 1 error. For example, in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), sources of correlation include genetic relatedness, shared household, and shared community. We propose a novel mixed-effect model for meta-analysis, “MetaCor”, which accounts for correlation between stratum-specific effect estimates. Simulations show that MetaCor controls …
Testing Gene-Environment Interactions In The Presence Of Measurement Error, Chongzhi Di, Li Hsu, Charles Kooperberg, Alex Reiner, Ross Prentice
UW Biostatistics Working Paper Series
Complex diseases result from an interplay between genetic and environmental risk factors, and it is of great interest to study the gene-environment interaction (GxE) to understand the etiology of complex diseases. Recent developments in the field of genetics allow one to study GxE systematically. However, one difficulty with GxE arises from the fact that environmental exposures are often measured with error. In this paper, we focus on testing GxE when the environmental exposure E is subject to measurement error. Surprisingly, in contrast to the well-established result that the naive test ignoring measurement error is valid in testing the main effects, we find that …
What Is The Best Reference RNA? And Other Questions Regarding The Design And Analysis Of Two-Color Microarray Experiments, Kathleen F. Kerr, Kyle A. Serikawa, Caimiao Wei, Mette A. Peters, Roger E. Bumgarner
UW Biostatistics Working Paper Series
The reference design is a practical and popular choice for microarray studies using two-color platforms. In the reference design, the reference RNA uses half of all array resources, leading investigators to ask: What is the best reference RNA? We propose a novel method for evaluating reference RNAs and present the results of an experiment that was specially designed to evaluate three common choices of reference RNA. We found no compelling evidence in favor of any particular reference. In particular, a commercial reference showed no advantage in our data. Our experimental design also enabled a new way to test the effectiveness …
Power Boosting In Genome-Wide Studies Via Methods For Multivariate Outcomes, Mary J. Emond
UW Biostatistics Working Paper Series
Whole-genome studies are becoming a mainstay of biomedical research. Examples include expression array experiments, comparative genomic hybridization analyses and large case-control studies for detecting polymorphism/disease associations. The tactic of applying a regression model to every locus to obtain test statistics is useful in such studies. However, this approach ignores potential correlation structure in the data that could be used to gain power, particularly when a Bonferroni correction is applied to adjust for multiple testing. In this article, we propose using regression techniques for misspecified multivariate outcomes to increase statistical power over independence-based modeling at each locus. Even when the outcome …
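The Bonferroni correction mentioned in this abstract can be sketched in a few lines. This is only an illustration of the baseline per-locus adjustment, not of the multivariate approach the paper proposes; the function name `bonferroni_reject` and the p-values are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one boolean per test: True where the test is rejected
    at family-wise level alpha under the Bonferroni correction."""
    m = len(p_values)
    if m == 0:
        return []
    # Each individual test is held to the stricter level alpha / m.
    threshold = alpha / m
    return [p <= threshold for p in p_values]


# Hypothetical per-locus p-values (illustrative only, not from the paper).
p_values = [0.00001, 0.004, 0.03, 0.20, 0.76]
rejected = bonferroni_reject(p_values, alpha=0.05)
```

With thousands of loci the per-test threshold becomes very small, which is why the abstract stresses the value of recovering power from correlation structure.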
Genome Scanning Methods For Comparing Sequences Between Groups, With Application To HIV Vaccine Trials, Peter B. Gilbert, Chunyuan Wu, David V. Jobes
UW Biostatistics Working Paper Series
Consider a placebo-controlled preventive HIV vaccine efficacy trial. An HIV amino acid sequence is measured from each volunteer who acquires HIV, and these sequences are aligned together with the reference HIV sequence represented in the vaccine. We develop genome scanning methods to identify HIV positions at which the amino acids in sequences from infected vaccine recipients tend to be more divergent from the corresponding reference amino acid than the amino acids in sequences from infected placebo recipients. We consider five two-sample test statistics, based on Euclidean, Mahalanobis, and Kullback-Leibler divergence measures. Weights are incorporated to reflect biological information contained in …
2^k Factorials In Blocks Of Size 2, With Application To Two-Color Microarray Experiments, Kathleen F. Kerr
UW Biostatistics Working Paper Series
When a two-level design must be run in blocks of size two, there is a unique blocking scheme that enables estimation of all the main effects. Unfortunately this design does not enable estimation of any two-factor interactions. When the experimental goal is to estimate all main effects and two-factor interactions, it is necessary to combine replicates of the experiment that use different blocking schemes. In this paper we identify such designs for up to eight factors that enable estimation of all main effects and two-factor interactions with the fewest number of replications. In addition, we give a construction for general …
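The confounding pattern described in this abstract can be checked numerically. The sketch below assumes that the blocking scheme in question pairs each run with its mirror image (all factor levels flipped); the truncated abstract does not spell this out, so treat that as an assumption. All function names are hypothetical.

```python
from itertools import product


def foldover_blocks(k):
    """Pair each run of a 2^k design with its mirror image
    (every factor level flipped), giving blocks of size two."""
    blocks, seen = [], set()
    for run in product((-1, 1), repeat=k):
        if run in seen:
            continue
        mirror = tuple(-x for x in run)
        seen.update((run, mirror))
        blocks.append((run, mirror))
    return blocks


def contrast(run, factors):
    """Contrast coefficient of an effect for one run: the product
    of the levels of the factors involved in the effect."""
    value = 1
    for f in factors:
        value *= run[f]
    return value


def confounded_with_blocks(blocks, factors):
    """An effect is confounded with blocks when its contrast is
    constant within every block."""
    return all(contrast(a, factors) == contrast(b, factors) for a, b in blocks)


k = 3
blocks = foldover_blocks(k)
# Main effects change sign within each block, so they are not confounded.
main_effects_ok = all(not confounded_with_blocks(blocks, (i,)) for i in range(k))
# Two-factor interactions are constant within each block, so they are confounded.
two_factor_lost = all(
    confounded_with_blocks(blocks, (i, j))
    for i in range(k)
    for j in range(i + 1, k)
)
```

Under this pairing, any effect involving an even number of factors has the same contrast value on both runs of a block, which is consistent with the abstract's statement that no two-factor interaction is estimable from this design alone.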
Bayesian Analysis Of Cell-Cycle Gene Expression Data, Chuan Zhou, Jon Wakefield, Linda Breeden
UW Biostatistics Working Paper Series
The study of the cell-cycle is important in order to aid in our understanding of the basic mechanisms of life, yet progress has been slow due to the complexity of the process and our lack of ability to study it at high resolution. Recent advances in microarray technology have enabled scientists to study gene expression at the genome scale at a manageable cost, and there has been an increasing effort to identify cell-cycle regulated genes. In this chapter, we discuss the analysis of cell-cycle gene expression data, focusing on model-based Bayesian approaches. The majority of the models we describe …
Optimal Feature Selection For Nearest Centroid Classifiers, With Applications To Gene Expression Microarrays, Alan R. Dabney, John D. Storey
UW Biostatistics Working Paper Series
Nearest centroid classifiers have recently been successfully employed in high-dimensional applications. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is typically carried out by computing univariate statistics for each feature individually, without consideration for how a subset of features performs as a whole. For subsets of a given size, we characterize the optimal choice of features, corresponding to those yielding the smallest misclassification rate. Furthermore, we propose an algorithm for estimating this optimal subset in practice. Finally, we investigate the applicability of shrinkage ideas to nearest centroid classifiers. We use gene-expression microarrays for …
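For readers unfamiliar with the setup, the following sketch shows a plain nearest centroid classifier combined with the kind of univariate feature ranking the abstract describes as typical. It is not the optimal-subset algorithm the paper proposes; the ranking statistic (absolute difference of class means), the function names, and the toy data are all assumptions made for illustration.

```python
def centroids(X, y):
    """Per-class mean vector. X is a list of feature lists, y a list of labels."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def rank_features(cents, class_a, class_b):
    """Univariate ranking: order features by the absolute difference
    between the two class centroids, largest first."""
    diffs = [abs(a - b) for a, b in zip(cents[class_a], cents[class_b])]
    return sorted(range(len(diffs)), key=lambda j: -diffs[j])


def predict(x, cents, features):
    """Assign x to the class whose centroid is closest in squared
    Euclidean distance, using only the selected features."""
    def dist(c):
        return sum((x[j] - cents[c][j]) ** 2 for j in features)
    return min(cents, key=dist)


# Toy data: feature 0 separates the classes, features 1 and 2 are noise.
X = [[1.0, 5.0, 0.2], [1.2, 4.8, 0.9], [3.0, 5.1, 0.1], [3.2, 4.9, 0.8]]
y = ["a", "a", "b", "b"]
cents = centroids(X, y)
top = rank_features(cents, "a", "b")[:1]
label_new = predict([2.9, 5.0, 0.5], cents, top)
```

Ranking features one at a time, as done here, ignores how a subset performs jointly, which is the limitation the abstract sets out to address.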
A New Approach To Intensity-Dependent Normalization Of Two-Channel Microarrays, Alan R. Dabney, John D. Storey
UW Biostatistics Working Paper Series
A two-channel microarray measures the relative expression levels of thousands of genes from a pair of biological samples. In order to reliably compare gene expression levels between and within arrays, it is necessary to remove systematic errors that distort the biological signal of interest. The standard for accomplishing this is smoothing "MA-plots" to remove intensity-dependent dye bias and array-specific effects. However, MA methods require strong assumptions. We review these assumptions and derive several practical scenarios in which they fail. The "dye-swap" normalization method has been much less frequently used because it requires two arrays per pair of samples. We show …
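The "MA-plot" coordinates referred to in this abstract follow a standard definition: M is the log-ratio of the two channel intensities and A is their average log-intensity. The sketch below computes them for hypothetical red and green intensities; the smoothing step and the dye-swap method discussed in the paper are not shown.

```python
import math


def ma_transform(red, green):
    """Return (M, A) lists for paired two-channel intensities:
    M = log2(R) - log2(G), A = (log2(R) + log2(G)) / 2."""
    m_values, a_values = [], []
    for r, g in zip(red, green):
        log_r, log_g = math.log2(r), math.log2(g)
        m_values.append(log_r - log_g)
        a_values.append((log_r + log_g) / 2)
    return m_values, a_values


# Hypothetical spot intensities (illustrative only).
red = [1024, 512]
green = [256, 512]
M, A = ma_transform(red, green)
```

Plotting M against A and fitting a smooth curve is what lets intensity-dependent dye bias be estimated and subtracted, under the assumptions the abstract goes on to question.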
The Optimal Discovery Procedure: A New Approach To Simultaneous Significance Testing, John D. Storey
UW Biostatistics Working Paper Series
Significance testing is one of the main objectives of statistics. The Neyman-Pearson lemma provides a simple rule for optimally testing a single hypothesis when the null and alternative distributions are known. This result has played a major role in the development of significance testing strategies that are used in practice. Most of the work extending single testing strategies to multiple tests has focused on formulating and estimating new types of significance measures, such as the false discovery rate. These methods tend to be based on p-values that are calculated from each test individually, ignoring information from the other tests. As …
The Optimal Discovery Procedure For Large-Scale Significance Testing, With Applications To Comparative Microarray Experiments, John D. Storey, James Y. Dai, Jeffrey T. Leek
UW Biostatistics Working Paper Series
As much of the focus of genetics and molecular biology has shifted toward the systems level, it has become increasingly important to accurately extract biologically relevant signal from thousands of related measurements. The common property among these high-dimensional biological studies is that the measured features have a rich and largely unknown underlying structure. One example of much recent interest is identifying differentially expressed genes in comparative microarray experiments. We propose a new approach aimed at optimally performing many hypothesis tests in a high-dimensional study. This approach estimates the Optimal Discovery Procedure (ODP), which has recently been introduced and theoretically shown …
The Clustering Of Regression Models Method With Applications In Gene Expression Data, Li-Xuan Qin, Steven G. Self
UW Biostatistics Working Paper Series
Identification of differentially expressed genes and clustering of genes are two important and complementary objectives addressed with gene expression data. For the differential expression question, many "per-gene" analytic methods have been proposed. These methods can generally be characterized as using a regression function to independently model the observations for each gene; various adjustments for multiplicity are then used to interpret the statistical significance of these per-gene regression models over the collection of genes analyzed. Motivated by this common structure of per-gene models, we propose a new model-based clustering method -- the clustering of regression models method, which groups genes that …
Significance Analysis Of Time Course Microarray Experiments, John D. Storey, Wenzhong Xiao, Jeffrey T. Leek, Ronald G. Tompkins, Ron W. Davis
UW Biostatistics Working Paper Series
Characterizing the genome-wide dynamic regulation of gene expression is important and will be of much interest in the future. However, there is currently no established method for identifying differentially expressed genes in a time course study. Here we propose a significance method for analyzing time course microarray studies that can be applied to the typical types of comparisons and sampling schemes. This method is applied to two studies on humans. In one study, genes are identified that show differential expression over time in response to in vivo endotoxin administration. Using our method, 7409 genes are called significant at a 1% …
Calibrating Observed Differential Gene Expression For The Multiplicity Of Genes On The Array, Yingye Zheng, Margaret S. Pepe
UW Biostatistics Working Paper Series
In a gene expression array study, the expression levels of thousands of genes are monitored simultaneously across various biological conditions on a small set of subjects. One goal of such studies is to explore a large pool of genes in order to select a subset of genes that appear to be differently expressed for further investigation. Of particular interest here is how to select the top k genes once genes are ranked based on their evidence for differential expression in two tissue types. We consider statistical methods that provide a more rigorous and intuitively appealing selection process for k. We …
Simple Parallel Statistical Computing In R, Anthony Rossini, Luke Tierney, Na Li
UW Biostatistics Working Paper Series
Theoretically, many modern statistical procedures are trivial to parallelize. However, practical deployment of a parallelized implementation which is robust and reliably runs on different computational cluster configurations and environments is far from trivial. We present a framework for the R statistical computing language that provides a simple yet powerful programming interface to a computational cluster. This interface allows the development of R functions that distribute independent computations across the nodes of the computational cluster. The resulting framework allows statisticians to obtain significant speed-ups for some computations at little additional development cost. The particular implementation can be deployed in heterogeneous computing …
Literate Statistical Practice, Anthony Rossini, Friedrich Leisch
UW Biostatistics Working Paper Series
Literate Statistical Practice (LSP, Rossini, 2001) describes an approach for creating self-documenting statistical results. It applies literate programming (Knuth, 1992) and related techniques in a natural fashion to the practice of statistics. In particular, documentation, specification, and descriptions of results are written concurrently with writing and evaluation of statistical programs. We discuss how and where LSP can be integrated into practice and illustrate this with an example derived from an actual statistical consulting project. The approach is simplified through the use of a comprehensive, open source toolset incorporating Noweb, Emacs Speaks Statistics (ESS), Sweave (Ramsey, 1994; Rossini, et al, 2002; …
Selecting Differentially Expressed Genes From Microarray Experiments, Margaret S. Pepe, Gary M. Longton, Garnet L. Anderson, Michel Schummer
UW Biostatistics Working Paper Series
High throughput technologies, such as gene expression arrays and protein mass spectrometry, allow one to simultaneously evaluate thousands of potential biomarkers that distinguish different tissue types. Of particular interest here is cancer versus normal organ tissues. We consider statistical methods to rank genes (or proteins) with regard to differential expression between tissues. Various statistical measures are considered and we argue that two measures related to the Receiver Operating Characteristic Curve are particularly suitable for this purpose. We also propose that sampling variability in the gene rankings be quantified and suggest using the “selection probability function”, the probability distribution of rankings …