Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 19 of 19

Full-Text Articles in Physical Sciences and Mathematics

Power In Pairs: Assessing The Statistical Value Of Paired Samples In Tests For Differential Expression, John R. Stevens, Jennifer S. Herrick, Roger K. Wolff, Martha L. Slattery Dec 2018

Mathematics and Statistics Faculty Publications

Background: When genomics researchers design a high-throughput study to test for differential expression, some biological systems and research questions provide opportunities to use paired samples from subjects, and researchers can plan for a certain proportion of subjects to have paired samples. We consider the effect of this paired samples proportion on the statistical power of the study, using characteristics of both count (RNA-Seq) and continuous (microarray) expression data from a colorectal cancer study.

Results: We demonstrate that a higher proportion of subjects with paired samples yields higher statistical power, for various total numbers of samples, and for various strengths of …
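The power benefit of pairing described in this abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' procedure: it compares only the two extremes (all subjects paired versus none), uses normally distributed values rather than RNA-Seq counts, and approximates the t critical value with a fixed cutoff. The function name `estimate_power` and every parameter value are hypothetical choices made for illustration.

```python
import random
import statistics
from math import sqrt


def estimate_power(paired, n=20, effect=0.5, subject_sd=1.0, noise_sd=0.5,
                   n_sims=2000, crit=2.1, seed=1):
    """Monte Carlo estimate of power for one gene.

    Assumptions (illustrative only): normal expression values, a shared
    per-subject effect when samples are paired, and a fixed cutoff `crit`
    that roughly approximates the two-sided 5% t critical value.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        if paired:
            # Both samples come from the same subject, so the subject
            # effect cancels in the within-pair difference.
            diffs = []
            for _ in range(n):
                subject = rng.gauss(0, subject_sd)
                normal_tissue = subject + rng.gauss(0, noise_sd)
                tumor_tissue = subject + effect + rng.gauss(0, noise_sd)
                diffs.append(tumor_tissue - normal_tissue)
            t = statistics.mean(diffs) / (statistics.stdev(diffs) / sqrt(n))
        else:
            # Two independent groups: subject variability stays in the error.
            a = [rng.gauss(0, subject_sd) + rng.gauss(0, noise_sd)
                 for _ in range(n)]
            b = [rng.gauss(0, subject_sd) + effect + rng.gauss(0, noise_sd)
                 for _ in range(n)]
            se = sqrt(statistics.variance(a) / n + statistics.variance(b) / n)
            t = (statistics.mean(b) - statistics.mean(a)) / se
        if abs(t) > crit:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    print("paired power:  ", estimate_power(paired=True))
    print("unpaired power:", estimate_power(paired=False))
```

Under these assumptions the paired design rejects the null far more often, because between-subject variability drops out of the within-pair differences. Intermediate proportions of paired subjects, as studied in the paper, would need a model that handles both kinds of samples at once.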


Confident Difference Criterion: A New Bayesian Differentially Expressed Gene Selection Algorithm With Applications, Fang Yu, Ming-Hui Chen, Lynn Kuo, Heather Talbott, John S. Davis Aug 2015

Journal Articles: Biostatistics

BACKGROUND: Recently, the Bayesian method has become more popular for analyzing high-dimensional gene expression data, as it allows us to borrow information across different genes and provides powerful estimators for evaluating gene expression levels. It is crucial to develop a simple but efficient gene selection algorithm for detecting differentially expressed (DE) genes based on the Bayesian estimators.

RESULTS: In this paper, by extending the two-criterion idea of Chen et al. (Chen M-H, Ibrahim JG, Chi Y-Y. A new class of mixture models for differential gene expression in DNA microarray data. J Stat Plan Inference. 2008;138:387-404), we propose two new gene …


Evaluation Of Some Statistical Methods For The Identification Of Differentially Expressed Genes, Andrew L. Haddon Mar 2015

FIU Electronic Theses and Dissertations

Microarray platforms have been around for many years, and while new technologies are on the rise in laboratories, microarrays are still prevalent. For the analysis of microarray data to identify differentially expressed (DE) genes, many methods have been proposed and modified for improvement. However, the most popular methods, such as Significance Analysis of Microarrays (SAM), samroc, fold change, and rank product, are far from perfect. Which method is most powerful depends on the characteristics of the sample and the distribution of the gene expressions. The most practiced method is …


Survival Analysis With High-Dimensional Covariates: An Application In Microarray Studies, David Engler, Yi Li Feb 2009

Faculty Publications

Use of microarray technology often leads to high-dimensional and low-sample size (HDLSS) data settings. A variety of approaches have been proposed for variable selection in this context. However, only a small number of these have been adapted for time-to-event data where censoring is present. Among standard variable selection methods shown both to have good predictive accuracy and to be computationally efficient is the elastic net penalization approach. In this paper, adaptations of the elastic net approach are presented for variable selection both under the Cox proportional hazards model and under an accelerated failure time (AFT) model. Assessment of the two …


Detecting Differentially Expressed Genes While Controlling The False Discovery Rate For Microarray Data, Shuo Jiao Jan 2009

Department of Statistics: Dissertations, Theses, and Student Work

Microarray is an important technology that enables people to investigate the expression levels of thousands of genes at the same time. One common goal of microarray data analysis is to detect differentially expressed genes while controlling the false discovery rate. This dissertation consists of four papers written to address this goal. The dissertation is organized as follows: In Chapter 1, a brief introduction to the Affymetrix GeneChip microarray technology is provided. The concept of differentially expressed genes and the definition of the false discovery rate are also introduced. In Chapter 2, a literature review of the related works on this …
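For readers unfamiliar with false discovery rate control, the sketch below shows the standard Benjamini-Hochberg step-up procedure applied to per-gene p-values. It is a generic textbook illustration and is not claimed to be one of the methods developed in the dissertation; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Standard Benjamini-Hochberg step-up rule: sort the m p-values, find the
    largest rank k with p_(k) <= k * q / m, and reject the k smallest.
    """
    m = len(pvalues)
    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])

    # Largest rank whose p-value falls under its step-up threshold.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k_max = rank

    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values for ten genes.
    example = [0.041, 0.001, 0.212, 0.008, 0.039,
               0.06, 0.074, 0.205, 0.042, 0.216]
    for p, r in zip(example, benjamini_hochberg(example, q=0.05)):
        print(f"p = {p:<6} differentially expressed: {r}")
```

The step-up rule rejects every hypothesis up to the largest qualifying rank, even if some smaller-ranked p-values miss their own thresholds, which is what distinguishes it from a simple per-test cutoff.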


Focus On RNA Isolation: Obtaining RNA For MicroRNA (miRNA) Expression Profiling Analyses Of Neural Tissue, Wang-Xia Wang, Bernard R. Wilfred, Donald A. Baldwin, R. Benjamin Isett, Na Ren, Arnold J. Stromberg, Peter T. Nelson Nov 2008

Sanders-Brown Center on Aging Faculty Publications

MicroRNAs (miRNAs) are present in all known plant and animal tissues and appear to be somewhat concentrated in the mammalian nervous system. Many different miRNA expression profiling platforms have been described. However, relatively little research has been published to establish the importance of 'upstream' variables in RNA isolation for neural miRNA expression profiling. We tested whether apparent changes in miRNA expression profiles may be associated with tissue processing, RNA isolation techniques, or different cell types in the sample. RNA isolation was performed on a single brain sample using eight different RNA isolation methods, and results were correlated using a conventional …


The Expression Of MicroRNA miR-107 Decreases Early In Alzheimer's Disease And May Accelerate Disease Progression Through Regulation Of β-Site Amyloid Precursor Protein-Cleaving Enzyme 1, Wang-Xia Wang, Bernard W. Rajeev, Arnold J. Stromberg, Na Ren, Guiliang Tang, Qingwei Huang, Isidore Rigoutsos, Peter T. Nelson Jan 2008

Sanders-Brown Center on Aging Faculty Publications

MicroRNAs (miRNAs) are small regulatory RNAs that participate in posttranscriptional gene regulation in a sequence-specific manner. However, little is understood about the role(s) of miRNAs in Alzheimer's disease (AD). We used miRNA expression microarrays on RNA extracted from human brain tissue from the University of Kentucky Alzheimer's Disease Center Brain Bank with near-optimal clinicopathological correlation. Cases were separated into four groups: elderly nondemented with negligible AD-type pathology, nondemented with incipient AD pathology, mild cognitive impairment (MCI) with moderate AD pathology, and AD. Among the AD-related miRNA expression changes, miR-107 was exceptional because miR-107 levels decreased significantly even in patients with …


Molecular Targets Of 2,3,7,8-Tetrachlorodibenzo-p-Dioxin (TCDD) Within The Zebrafish Ovary: Insights Into TCDD-Induced Endocrine Disruption And Reproductive Toxicity, Tisha C. King Heiden, Craig Struble, Matthew L. Rise, Martin J. Hessner, Reinhold J. Hutz, Michael J. Carvan III Jan 2008

Mathematics, Statistics and Computer Science Faculty Research and Publications

TCDD is a reproductive toxicant and endocrine disruptor, yet the mechanisms by which it causes these reproductive alterations are not fully understood. In order to provide additional insight into the molecular mechanisms that underlie TCDD's reproductive toxicity, we assessed TCDD-induced transcriptional changes in the ovary as they relate to previously described impacts on serum estradiol concentrations and altered follicular development in zebrafish. In silico computational approaches were used to correlate candidate regulatory motifs with observed changes in gene expression. Our data suggest that TCDD inhibits follicle maturation via attenuated gonadotropin responsiveness and/or depressed estradiol biosynthesis, and that interference of estrogen-regulated …


2^K Factorials In Blocks Of Size 2, With Application To Two-Color Microarray Experiments, Kathleen F. Kerr Mar 2006

UW Biostatistics Working Paper Series

When a two-level design must be run in blocks of size two, there is a unique blocking scheme that enables estimation of all the main effects. Unfortunately this design does not enable estimation of any two-factor interactions. When the experimental goal is to estimate all main effects and two-factor interactions, it is necessary to combine replicates of the experiment that use different blocking schemes. In this paper we identify such designs for up to eight factors that enable estimation of all main effects and two-factor interactions with the fewest number of replications. In addition, we give a construction for general …


Yeast Through The Ages: A Statistical Analysis Of Genetic Changes In Aging Yeast, Alison Wise '05, Johanna S. Hardin, Laura Hoopes Jan 2006

Pomona Faculty Publications and Research

Microarray technology allows for the expression levels of thousands of genes in a cell to be measured simultaneously. The technology provides great potential in the fields of biology and medicine, as the analysis of data obtained from microarray experiments gives insight into the roles of specific genes and the associated changes across experimental conditions (e.g., aging, mutation, radiation therapy, drug dosage). The application of statistical tools to microarray data can help make sense of the experiment and thereby advance genetic, biological, and medical research. Likewise, microarrays provide an exciting means through which to explore statistical techniques.


Optimal Feature Selection For Nearest Centroid Classifiers, With Applications To Gene Expression Microarrays, Alan R. Dabney, John D. Storey Nov 2005

UW Biostatistics Working Paper Series

Nearest centroid classifiers have recently been successfully employed in high-dimensional applications. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is typically carried out by computing univariate statistics for each feature individually, without consideration for how a subset of features performs as a whole. For subsets of a given size, we characterize the optimal choice of features, corresponding to those yielding the smallest misclassification rate. Furthermore, we propose an algorithm for estimating this optimal subset in practice. Finally, we investigate the applicability of shrinkage ideas to nearest centroid classifiers. We use gene-expression microarrays for …


A Platform-Independent Software Suite For Statistical Analysis Of High Dimensional Biology Data, David B. Allison, Jacob P. L. Brand, Jode W. Edwards, Gary L. Gadbury, Kyoungmi Kim, Tapan Mehta, Grier P. Page, Amit Patki, Vinodh Srinivasasainagendra, Prinal Trivedi, Jelai Wang, Stanislav O. Zakharkin Jan 2005

Mathematics and Statistics Faculty Research & Creative Works

Many efforts in microarray data analysis are focused on providing tools and methods for the qualitative analysis of microarray data. HDBStat! (High-Dimensional Biology-Statistics) is a software package designed for the analysis of high-dimensional biology data such as microarray data. It was initially developed for the analysis of microarray gene expression data, but it can also be used for some applications in proteomics and other aspects of genomics. HDBStat! provides statisticians and biologists with a flexible and easy-to-use interface to analyze complex microarray data using a variety of methods for data preprocessing, quality control analysis, and hypothesis testing.


Multiple Testing Procedures: R Multtest Package And Applications To Genomics, Katherine S. Pollard, Sandrine Dudoit, Mark J. Van Der Laan Dec 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

The Bioconductor R package multtest implements widely applicable resampling-based single-step and stepwise multiple testing procedures (MTP) for controlling a broad class of Type I error rates, in testing problems involving general data generating distributions (with arbitrary dependence structures among variables), null hypotheses, and test statistics. The current version of multtest provides MTPs for tests concerning means, differences in means, and regression parameters in linear and Cox proportional hazards models. Procedures are provided to control Type I error rates defined as tail probabilities for arbitrary functions of the numbers of false positives and rejected hypotheses. These error rates include tail probabilities …


Nonparametric Methods For Analyzing Replication Origins In Genomewide Data, Debashis Ghosh Jun 2004

The University of Michigan Department of Biostatistics Working Paper Series

Due to the advent of high-throughput genomic technology, it has become possible to globally monitor cellular activities on a genomewide basis. With these new methods, scientists can begin to address important biological questions. One such question involves the identification of replication origins, which are regions in chromosomes where DNA replication is initiated. In addition, one hypothesis regarding replication origins is that their locations are non-random throughout the genome. In this article, we develop methods for identification of and cluster inference regarding replication origins involving genomewide expression data. We compare several nonparametric regression methods for the identification of replication origin locations. …


A Statistical Method For Constructing Transcriptional Regulatory Networks Using Gene Expression And Sequence Data, Biao Xing, Mark J. Van Der Laan Mar 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

Transcriptional regulation is one of the most important means of gene regulation. Uncovering the transcriptional regulatory network helps us to understand complex cellular processes. In this paper, we describe a comprehensive statistical approach for constructing the transcriptional regulatory network using data on gene expression, promoter sequences, and transcription factor binding sites. Our simulation studies show that the overall and false positive error rates in the estimated transcriptional regulatory network are expected to be small if the systematic noise in the constructed feature matrix is small. Our analysis based on 658 microarray experiments on yeast gene expression programs and 46 transcription …


Evaluation Of Multiple Models To Distinguish Closely Related Forms Of Disease Using DNA Microarray Data: An Application To Multiple Myeloma, Johanna S. Hardin, Michael Waddell, C. David Page, Fenghuang Zhan, Bart Barlogie, John Shaughnessy, John J. Crowley Jan 2004

Pomona Faculty Publications and Research

Motivation: Standard laboratory classification of the plasma cell dyscrasia monoclonal gammopathy of undetermined significance (MGUS) and the overt plasma cell neoplasm multiple myeloma (MM) is quite accurate, yet, for the most part, biologically uninformative. Most, if not all, cancers are caused by inherited or acquired genetic mutations that manifest themselves in altered gene expression patterns in the clonally related cancer cells. Microarray technology allows for qualitative and quantitative measurements of the expression levels of thousands of genes simultaneously, and it has now been used both to classify cancers that are morphologically indistinguishable and to predict response to therapy. It is …


Loss-Based Estimation With Cross-Validation: Applications To Microarray Data Analysis And Motif Finding, Sandrine Dudoit, Mark J. Van Der Laan, Sunduz Keles, Annette M. Molinaro, Sandra E. Sinisi, Siew Leng Teng Dec 2003

U.C. Berkeley Division of Biostatistics Working Paper Series

Current statistical inference problems in genomic data analysis involve parameter estimation for high-dimensional multivariate distributions, with typically unknown and intricate correlation patterns among variables. Addressing these inference questions satisfactorily requires: (i) an intensive and thorough search of the parameter space to generate good candidate estimators, (ii) an approach for selecting an optimal estimator among these candidates, and (iii) a method for reliably assessing the performance of the resulting estimator. We propose a unified loss-based methodology for estimator construction, selection, and performance assessment with cross-validation. In this approach, the parameter of interest is defined as the risk minimizer for a suitable …


Design Considerations For Efficient And Effective Microarray Studies, M. Kathleen Kerr Jun 2003

UW Biostatistics Working Paper Series

This paper describes the theoretical and practical issues in experimental design for gene expression microarrays. Specifically, this paper (1) discusses the basic principles of design (randomization, replication, and blocking) as they pertain to microarrays, and (2) provides some general guidelines for statisticians designing microarray studies.


Multiple Hypothesis Testing In Microarray Experiments, Sandrine Dudoit, Juliet Popper Shaffer, Jennifer C. Boldrick Aug 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

DNA microarrays are a new and promising biotechnology which allows the monitoring of expression levels in cells for thousands of genes simultaneously. An important and common question in microarray experiments is the identification of differentially expressed genes, i.e., genes whose expression levels are associated with a response or covariate of interest. The biological question of differential expression can be restated as a problem in multiple hypothesis testing: the simultaneous test for each gene of the null hypothesis of no association between the expression levels and the responses or covariates. As a typical microarray experiment measures expression levels for thousands of …