Open Access. Powered by Scholars. Published by Universities.®
Articles 121 - 143 of 143
Full-Text Articles in Microarrays
Semiparametric Methods For Identification Of Tumor Progression Genes From Microarray Data, Debashis Ghosh, Arul Chinnaiyan
The University of Michigan Department of Biostatistics Working Paper Series
The use of microarray data has become quite commonplace in medical and scientific experiments. We focus here on microarray data generated from cancer studies. It is potentially important for the discovery of biomarkers to identify genes whose expression levels correlate with tumor progression. In this article, we develop statistical procedures for the identification of such genes, which we term tumor progression genes. Two methods are considered in this paper. The first is use of a proportional odds procedure, combined with false discovery rate estimation techniques to adjust for the multiple testing problem. The second method is based on order-restricted estimation …
A Model Based Background Adjustment For Oligonucleotide Expression Arrays, Zhijin Wu, Rafael A. Irizarry, Robert Gentleman, Francisco Martinez Murillo, Forrest Spencer
Johns Hopkins University, Dept. of Biostatistics Working Papers
High density oligonucleotide expression arrays are widely used in many areas of biomedical research. Affymetrix GeneChip arrays are the most popular. In the Affymetrix system, a fair amount of further pre-processing and data reduction occurs following the image processing step. Statistical procedures developed by academic groups have been successful at improving the default algorithms provided by the Affymetrix system. In this paper we present a solution to one of the pre-processing steps, background adjustment, based on a formal statistical framework. Our solution greatly improves the performance of the technology in various practical applications.
Affymetrix GeneChip arrays use short oligonucleotides to …
Classification Using Generalized Partial Least Squares, Beiying Ding, Robert Gentleman
Bioconductor Project Working Papers
The advances in computational biology have made simultaneous monitoring of thousands of features possible. The high-throughput technologies not only bring about a much richer information context in which to study various aspects of gene functions but they also present the challenge of analyzing data with a large number of covariates and few samples. As an integral part of machine learning, classification of samples into two or more categories is almost always of interest to scientists. In this paper, we address the question of classification in this setting by extending partial least squares (PLS), a popular dimension reduction tool in chemometrics, in …
Covariate Adjustment In The Analysis Of Microarray Data From Clinical Studies, Debashis Ghosh, Arul Chinnaiyan
The University of Michigan Department of Biostatistics Working Paper Series
There is tremendous scientific interest in the analysis of gene expression data in clinical settings, such as oncology. In this paper, we describe the importance of adjusting for confounders and other prognostic factors in order to select differentially expressed genes for follow-up validation studies. We develop two approaches to the analysis of microarray data in nonrandomized clinical settings. The first is an extension of the current significance analysis of microarray procedures, where other covariates are taken into account. The second is a novel covariate-adjusted regression modelling based on the receiver operating characteristic curve for the analysis of gene expression …
Regulatory Motif Finding By Logic Regression, Sunduz Keles, Mark J. Van Der Laan, Chris Vulpe
U.C. Berkeley Division of Biostatistics Working Paper Series
Multiple transcription factors coordinately control transcriptional regulation of genes in eukaryotes. Although multiple computational methods consider the identification of individual transcription factor binding sites (TFBSs), very few focus on the interactions between these sites. We consider finding transcription factor binding sites and their context-specific interactions using microarray gene expression data. We devise a hybrid approach called LogicMotif, composed of a TFBS identification method combined with the new regression methodology, logic regression, of Ruczinski et al. (2003). LogicMotif has two steps: First, potential binding sites are identified from transcription control regions of genes of interest. Various available methods can be …
A Statistical Method For Constructing Transcriptional Regulatory Networks Using Gene Expression And Sequence Data, Biao Xing, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
Transcriptional regulation is one of the most important means of gene regulation. Uncovering transcriptional regulatory networks helps us to understand complex cellular processes. In this paper, we describe a comprehensive statistical approach for constructing the transcriptional regulatory network using data on gene expression, promoter sequence, and transcription factor binding sites. Our simulation studies show that the overall and false positive error rates in the estimated transcriptional regulatory network are expected to be small if the systematic noise in the constructed feature matrix is small. Our analysis based on 658 microarray experiments on yeast gene expression programs and 46 transcription …
Error Models For Microarray Intensities, Wolfgang Huber, Anja Von Heydebreck, Martin Vingron
Bioconductor Project Working Papers
We derive the additive-multiplicative error model for microarray intensities, and describe two applications. For the detection of differentially expressed genes, we obtain a statistic whose variance is approximately independent of the mean intensity. For the post hoc calibration (normalization) of data with respect to experimental factors, we describe a method for parameter estimation.
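The abstract above does not give the form of the variance-stabilized statistic, so the following is only a sketch under stated assumptions. It assumes intensities of the form Y = mu * exp(eta) + eps with independent Gaussian eta and eps (background offset omitted), so that Var(Y) is approximately (cv_mult * mu)^2 + sd_add^2. For a quadratic variance function of that kind, the delta method yields an arsinh transform with roughly constant variance. The names `stabilize`, `simulate`, `sd_before_after`, `sd_add`, and `cv_mult` are hypothetical and chosen for illustration.

```python
import math
import random
import statistics


def stabilize(y, sd_add, cv_mult):
    """Variance-stabilizing transform for Var(Y) ~ (cv_mult * u)^2 + sd_add^2.

    Derived by integrating 1 / sqrt(variance function); assumed form,
    not taken from the abstract.
    """
    return math.asinh(cv_mult * y / sd_add) / cv_mult


def simulate(mu, n, sd_add, cv_mult, rng):
    """Draw n intensities Y = mu * exp(eta) + eps (assumed error model)."""
    return [
        mu * math.exp(rng.gauss(0.0, cv_mult)) + rng.gauss(0.0, sd_add)
        for _ in range(n)
    ]


def sd_before_after(mu, n=20000, sd_add=10.0, cv_mult=0.1, seed=0):
    """Return (SD of raw intensities, SD of transformed intensities)."""
    rng = random.Random(seed)
    y = simulate(mu, n, sd_add, cv_mult, rng)
    h = [stabilize(v, sd_add, cv_mult) for v in y]
    return statistics.stdev(y), statistics.stdev(h)


if __name__ == "__main__":
    # Raw SD grows with the mean; transformed SD should stay near 1.
    for mu in (0.0, 100.0, 10000.0):
        raw_sd, stab_sd = sd_before_after(mu)
        print(f"mu={mu:>8.0f}  raw SD={raw_sd:9.2f}  transformed SD={stab_sd:5.2f}")
```

With these illustrative noise parameters, the raw standard deviation grows with the mean intensity, while the transformed values have a standard deviation that stays close to 1 across the range.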
Mixture Models For Assessing Differential Expression In Complex Tissues Using Microarray Data, Debashis Ghosh
The University of Michigan Department of Biostatistics Working Paper Series
The use of DNA microarrays has become quite popular in many scientific and medical disciplines, such as in cancer research. One common goal of these studies is to determine which genes are differentially expressed between cancer and healthy tissue, or more generally, between two experimental conditions. A major complication in the molecular profiling of tumors using gene expression data is that the data represent a combination of tumor and normal cells. Much of the methodology developed for assessing differential expression with microarray data has assumed that tissue samples are homogeneous. In this article, we outline a general framework for determining …
Optimal Sample Size For Multiple Testing: The Case Of Gene Expression Microarrays, Peter Muller, Giovanni Parmigiani, Christian Robert, Judith Rousseau
Johns Hopkins University, Dept. of Biostatistics Working Papers
We consider the choice of an optimal sample size for multiple comparison problems. The motivating application is the choice of the number of microarray experiments to be carried out when learning about differential gene expression. However, the approach is valid in any application that involves multiple comparisons in a large number of hypothesis tests. We discuss two decision problems in the context of this setup: the sample size selection and the decision about the multiple comparisons. We adopt a decision theoretic approach, using loss functions that combine the competing goals of discovering as many differentially expressed genes as possible, while keeping …
Calibrating Observed Differential Gene Expression For The Multiplicity Of Genes On The Array, Yingye Zheng, Margaret S. Pepe
UW Biostatistics Working Paper Series
In a gene expression array study, the expression levels of thousands of genes are monitored simultaneously across various biological conditions on a small set of subjects. One goal of such studies is to explore a large pool of genes in order to select a subset of genes that appear to be differently expressed for further investigation. Of particular interest here is how to select the top k genes once genes are ranked based on their evidence for differential expression in two tissue types. We consider statistical methods that provide a more rigorous and intuitively appealing selection process for k. We …
Evaluation Of Multiple Models To Distinguish Closely Related Forms Of Disease Using DNA Microarray Data: An Application To Multiple Myeloma, Johanna S. Hardin, Michael Waddell, C. David Page, Fenghuang Zhan, Bart Barlogie, John Shaughnessy, John J. Crowley
Pomona Faculty Publications and Research
Motivation: Standard laboratory classification of the plasma cell dyscrasia monoclonal gammopathy of undetermined significance (MGUS) and the overt plasma cell neoplasm multiple myeloma (MM) is quite accurate, yet, for the most part, biologically uninformative. Most, if not all, cancers are caused by inherited or acquired genetic mutations that manifest themselves in altered gene expression patterns in the clonally related cancer cells. Microarray technology allows for qualitative and quantitative measurements of the expression levels of thousands of genes simultaneously, and it has now been used both to classify cancers that are morphologically indistinguishable and to predict response to therapy. It is …
Loss-Based Estimation With Cross-Validation: Applications To Microarray Data Analysis And Motif Finding, Sandrine Dudoit, Mark J. Van Der Laan, Sunduz Keles, Annette M. Molinaro, Sandra E. Sinisi, Siew Leng Teng
U.C. Berkeley Division of Biostatistics Working Paper Series
Current statistical inference problems in genomic data analysis involve parameter estimation for high-dimensional multivariate distributions, with typically unknown and intricate correlation patterns among variables. Addressing these inference questions satisfactorily requires: (i) an intensive and thorough search of the parameter space to generate good candidate estimators, (ii) an approach for selecting an optimal estimator among these candidates, and (iii) a method for reliably assessing the performance of the resulting estimator. We propose a unified loss-based methodology for estimator construction, selection, and performance assessment with cross-validation. In this approach, the parameter of interest is defined as the risk minimizer for a suitable …
Stochastic Models Based On Molecular Hybridization Theory For Short Oligonucleotide Microarrays, Zhijin Wu, Richard Leblanc, Rafael A. Irizarry
Johns Hopkins University, Dept. of Biostatistics Working Papers
High density oligonucleotide expression arrays are a widely used tool for the measurement of gene expression on a large scale. Affymetrix GeneChip arrays appear to dominate this market. These arrays use short oligonucleotides to probe for genes in an RNA sample. Due to optical noise, non-specific hybridization, probe-specific effects, and measurement error, ad-hoc measures of expression that summarize probe intensities can lead to imprecise and inaccurate results. Various researchers have demonstrated that expression measures based on simple statistical models can provide great improvements over the ad-hoc procedure offered by Affymetrix. Recently, physical models based on molecular hybridization theory have been …
Design Considerations For Efficient And Effective Microarray Studies, M. Kathleen Kerr
UW Biostatistics Working Paper Series
This paper describes the theoretical and practical issues in experimental design for gene expression microarrays. Specifically, this paper (1) discusses the basic principles of design (randomization, replication, and blocking) as they pertain to microarrays, and (2) provides some general guidelines for statisticians designing microarray studies.
Cluster Stability Scores For Microarray Data In Cancer Studies, Mark Smolkin, Debashis Ghosh
The University of Michigan Department of Biostatistics Working Paper Series
A potential benefit of profiling of tissue samples using microarrays is the generation of molecular fingerprints that will define subtypes of disease. Hierarchical clustering has been the primary analytical tool used to define disease subtypes from microarray experiments in cancer settings. Assessing cluster reliability poses a major complication in analyzing output from these procedures. While much work has been done on assessing the global question of number of clusters in a dataset, relatively little research exists on assessing stability of individual clusters. …
Linear Models For Microarray Data Analysis: Hidden Similarities And Differences, M. Kathleen Kerr
UW Biostatistics Working Paper Series
In the past several years many linear models have been proposed for analyzing two-color microarray data. As presented in the literature, many of these models appear dramatically different. However, many of these models are reformulations of the same basic approach to analyzing microarray data. This paper demonstrates the equivalence of some of these models. Attention is directed at choices in microarray data analysis that have a larger impact on the results than the choice of linear model.
Selecting Differentially Expressed Genes From Microarray Experiments, Margaret S. Pepe, Gary M. Longton, Garnet L. Anderson, Michel Schummer
UW Biostatistics Working Paper Series
High-throughput technologies, such as gene expression arrays and protein mass spectrometry, allow one to simultaneously evaluate thousands of potential biomarkers that distinguish different tissue types. Of particular interest here is cancer versus normal organ tissues. We consider statistical methods to rank genes (or proteins) with regard to differential expression between tissues. Various statistical measures are considered and we argue that two measures related to the Receiver Operating Characteristic Curve are particularly suitable for this purpose. We also propose that sampling variability in the gene rankings be quantified and suggest using the “selection probability function”, the probability distribution of rankings …
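The abstract above does not say which two ROC-related measures the authors favor. As a minimal sketch of the general idea of ranking genes by an ROC-based measure, the example below uses the empirical area under the ROC curve (the Mann-Whitney probability that a cancer sample exceeds a normal sample). That choice is an assumption, and the function names and toy data are hypothetical.

```python
def empirical_auc(cases, controls):
    """Estimate P(case > control) + 0.5 * P(tie) over all case/control pairs."""
    score = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                score += 1.0
            elif x == y:
                score += 0.5
    return score / (len(cases) * len(controls))


def rank_genes_by_auc(expression, is_case):
    """Rank genes by empirical AUC, highest first.

    expression: dict mapping gene name -> list of expression values,
                one per sample (hypothetical layout).
    is_case:    list of booleans, True for cancer samples.
    """
    scores = {}
    for gene, values in expression.items():
        cases = [v for v, c in zip(values, is_case) if c]
        controls = [v for v, c in zip(values, is_case) if not c]
        scores[gene] = empirical_auc(cases, controls)
    ranking = sorted(scores, key=lambda g: scores[g], reverse=True)
    return ranking, scores


if __name__ == "__main__":
    # Toy data: three cancer samples followed by three normal samples.
    toy = {
        "g1": [5, 6, 7, 1, 2, 3],
        "g2": [1, 2, 3, 1, 2, 3],
        "g3": [1, 5, 2, 3, 4, 0],
    }
    labels = [True, True, True, False, False, False]
    ranking, scores = rank_genes_by_auc(toy, labels)
    for gene in ranking:
        print(gene, round(scores[gene], 3))
```

This sketch only ranks genes by over-expression in cancer tissue. Quantifying the sampling variability of the ranking, as the abstract proposes, would need an additional resampling step that is not shown here.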
Multiple Hypothesis Testing In Microarray Experiments, Sandrine Dudoit, Juliet Popper Shaffer, Jennifer C. Boldrick
U.C. Berkeley Division of Biostatistics Working Paper Series
DNA microarrays are a new and promising biotechnology which allows the monitoring of expression levels in cells for thousands of genes simultaneously. An important and common question in microarray experiments is the identification of differentially expressed genes, i.e., genes whose expression levels are associated with a response or covariate of interest. The biological question of differential expression can be restated as a problem in multiple hypothesis testing: the simultaneous test for each gene of the null hypothesis of no association between the expression levels and the responses or covariates. As a typical microarray experiment measures expression levels for thousands of …
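The truncated abstract above does not say which error rate or procedure the paper adopts. As one standard illustration of testing one null hypothesis per gene, the sketch below computes Benjamini-Hochberg adjusted p-values, which control the false discovery rate under independence or positive dependence of the tests. The function names are hypothetical, and the per-gene p-values are assumed to come from some earlier test.

```python
def bh_adjust(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def rejected(p_values, alpha=0.05):
    """Flag genes whose adjusted p-value is at most alpha."""
    return [q <= alpha for q in bh_adjust(p_values)]


if __name__ == "__main__":
    # Hypothetical per-gene p-values.
    p = [0.01, 0.04, 0.03, 0.005]
    print(bh_adjust(p))
    print(rejected(p, alpha=0.03))
```

Working with adjusted p-values rather than a single cutoff lets the same output be reused for any target level `alpha`.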
Comparative Genomic Hybridization Array Analysis, Annette M. Molinaro, Mark J. Van Der Laan, Dan H. Moore
U.C. Berkeley Division of Biostatistics Working Paper Series
At the present time, there is increasing evidence that cancer may be regulated by the number of copies of genes in tumor cells. Through microarray technology it is now possible to measure the number of copies of thousands of genes and gene segments in samples of chromosomal DNA. Microarray comparative genomic hybridization (array CGH) provides the opportunity to both measure DNA sequence copy number gains and losses and map these aberrations to the genomic sequence. Gains can signify the over-expression of oncogenes, genes which stimulate cell growth and have become hyperactive, while losses can signify under-expression of tumor suppressor genes, …
A Method To Identify Significant Clusters In Gene Expression Data, Katherine S. Pollard, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
Clustering algorithms have been widely applied to gene expression data. For both hierarchical and partitioning clustering algorithms, selecting the number of significant clusters is an important problem and many methods have been proposed. Existing methods for selecting the number of clusters tend to find only the global patterns in the data (e.g., the over- and under-expressed genes). We have noted the need for a better method in the gene expression context, where small, biologically meaningful clusters can be difficult to identify. In this paper, we define a new criterion, Mean Split Silhouette (MSS), which is a measure of cluster …
Statistical Issues In The Clustering Of Gene Expression Data, Darlene R. Goldstein, Debashis Ghosh, Erin M. Conlon
Erin M. Conlon
This paper illustrates some of the problems which can occur in any data set when clustering samples of gene expression profiles. These include a possible high degree of dependence of results on choice of clustering algorithm, further dependence of results on the choices of genes and samples to be included in the clustering (for example, whether or not to include control samples), and difficulty in assessing the validity of the grouping. We also demonstrate the use of Cox regression as a tool to identify genes influencing survival.
Identification Of Regulatory Elements Using A Feature Selection Method, Sunduz Keles, Mark J. Van Der Laan, Michael B. Eisen
U.C. Berkeley Division of Biostatistics Working Paper Series
Many methods have been described to identify regulatory motifs in the transcription control regions of genes that exhibit similar patterns of gene expression across a variety of experimental conditions. Here we focus on a single experimental condition, and utilize gene expression data to identify sequence motifs associated with genes that are activated under this experimental condition. We use a linear model with two-way interactions to model gene expression as a function of sequence features (words) present in presumptive transcription control regions. The most relevant features are selected by a feature selection method called stepwise selection with Monte Carlo cross …
Statistical Inference For Simultaneous Clustering Of Gene Expression Data, Katherine S. Pollard, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
Current methods for analysis of gene expression data are mostly based on clustering and classification of either genes or samples. We offer support for the idea that more complex patterns can be identified in the data if genes and samples are considered simultaneously. We formalize the approach and propose a statistical framework for two-way clustering. A simultaneous clustering parameter is defined as a function of the true data generating distribution, and an estimate is obtained by applying this function to the empirical distribution. We illustrate that a wide range of clustering procedures, including generalized hierarchical methods, can be defined as …