Open Access. Powered by Scholars. Published by Universities.®

Microarrays Commons

Articles 1 - 15 of 15

Full-Text Articles in Microarrays

Unified Methods For Feature Selection In Large-Scale Genomic Studies With Censored Survival Outcomes, Lauren Spirko-Burns, Karthik Devarajan Mar 2019

COBRA Preprint Series

One of the major goals in large-scale genomic studies is to identify genes with a prognostic impact on time-to-event outcomes which provide insight into the disease's process. With rapid developments in high-throughput genomic technologies in the past two decades, the scientific community is able to monitor the expression levels of tens of thousands of genes and proteins resulting in enormous data sets where the number of genomic features is far greater than the number of subjects. Methods based on univariate Cox regression are often used to select genomic features related to survival outcome; however, the Cox model assumes proportional hazards …


Hpcnmf: A High-Performance Toolbox For Non-Negative Matrix Factorization, Karthik Devarajan, Guoli Wang Feb 2016

COBRA Preprint Series

Non-negative matrix factorization (NMF) is a widely used machine learning algorithm for dimension reduction of large-scale data. It has found successful applications in a variety of fields such as computational biology, neuroscience, natural language processing, information retrieval, image processing and speech recognition. In bioinformatics, for example, it has been used to extract patterns and profiles from genomic and text-mining data as well as in protein sequence and structure analysis. While the scientific performance of NMF is very promising in dealing with high dimensional data sets and complex data structures, its computational cost is high and sometimes could be critical for …
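For readers unfamiliar with NMF, the following is a minimal sketch of the classical multiplicative-update scheme for the Frobenius-norm objective. It is not the toolbox described in this abstract; the function name `nmf_multiplicative`, its parameters, and the random initialization are illustrative assumptions.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (m x n) as W @ H with W, H >= 0.

    Illustrative sketch only: standard multiplicative updates for the
    squared Frobenius error, with a small eps to avoid division by zero.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    # Random strictly positive starting factors (an assumption of this sketch).
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    errors = []
    for _ in range(n_iter):
        # Update H with W held fixed, then W with the new H held fixed.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # Track the reconstruction error after each full sweep.
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors
```

Because every update multiplies by a non-negative ratio, the factors stay non-negative throughout. Each sweep costs several dense matrix products, which is one way to see why the computational cost grows quickly on large data sets.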


Differential Patterns Of Interaction And Gaussian Graphical Models, Masanao Yajima, Donatello Telesca, Yuan Ji, Peter Muller Apr 2012

COBRA Preprint Series

We propose a methodological framework to assess heterogeneous patterns of association amongst components of a random vector expressed as a Gaussian directed acyclic graph. The proposed framework is likely to be useful when primary interest focuses on potential contrasts characterizing the association structure between known subgroups of a given sample. We provide inferential frameworks as well as an efficient computational algorithm to fit such a model and illustrate its validity through a simulation. We apply the model to Reverse Phase Protein Array data on Acute Myeloid Leukemia patients to show the contrast of association structure between refractory patients and relapsed …


A Bayesian Model Averaging Approach For Observational Gene Expression Studies, Xi Kathy Zhou, Fei Liu, Andrew J. Dannenberg Jun 2011

COBRA Preprint Series

Identifying differentially expressed (DE) genes associated with a sample characteristic is the primary objective of many microarray studies. As more and more studies are carried out with observational rather than well controlled experimental samples, it becomes important to evaluate and properly control the impact of sample heterogeneity on DE gene finding. Typical methods for identifying DE genes require ranking all the genes according to a pre-selected statistic based on a single model for two or more group comparisons, with or without adjustment for other covariates. Such single model approaches unavoidably result in model misspecification, which can lead to increased error …


Minimum Description Length Measures Of Evidence For Enrichment, Zhenyu Yang, David R. Bickel Dec 2010

COBRA Preprint Series

In order to functionally interpret differentially expressed genes or other discovered features, researchers seek to detect enrichment in the form of overrepresentation of discovered features associated with a biological process. Most enrichment methods treat the p-value as the measure of evidence using a statistical test such as the binomial test, Fisher's exact test or the hypergeometric test. However, the p-value is not interpretable as a measure of evidence apart from adjustments in light of the sample size. As a measure of evidence supporting one hypothesis over the other, the Bayes factor (BF) overcomes this drawback of the p-value but lacks …
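As background for the conventional approach this abstract contrasts itself with, here is a minimal sketch of a one-sided hypergeometric enrichment p-value. It illustrates the p-value baseline only, not the measure of evidence the paper proposes; the function and argument names are hypothetical.

```python
from math import comb

def hypergeom_enrichment_pvalue(N, K, n, k):
    """Probability of observing at least k category members by chance.

    N: total number of features (e.g. genes on the array)
    K: features annotated to the biological process
    n: features declared "discovered" (e.g. differentially expressed)
    k: discovered features that are also annotated to the process
    """
    total = comb(N, n)
    upper = min(K, n)
    # Sum the hypergeometric probabilities over the upper tail k..min(K, n).
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total
```

A small value indicates that the overlap is larger than expected under random sampling without replacement. As the abstract notes, such a p-value is not directly interpretable as a measure of evidence unless it is adjusted in light of the sample size.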


The Strength Of Statistical Evidence For Composite Hypotheses: Inference To The Best Explanation, David R. Bickel Jun 2010

COBRA Preprint Series

A general function to quantify the weight of evidence in a sample of data for one hypothesis over another is derived from the law of likelihood and from a statistical formalization of inference to the best explanation. For a fixed parameter of interest, the resulting weight of evidence that favors one composite hypothesis over another is the likelihood ratio using the parameter value consistent with each hypothesis that maximizes the likelihood function over the parameter of interest. Since the weight of evidence is generally only known up to a nuisance parameter, it is approximated by replacing the likelihood function with …


Shrinkage Estimation Of Expression Fold Change As An Alternative To Testing Hypotheses Of Equivalent Expression, Zahra Montazeri, Corey M. Yanofsky, David R. Bickel Aug 2009

COBRA Preprint Series

Research on analyzing microarray data has focused on the problem of identifying differentially expressed genes to the neglect of the problem of how to integrate evidence that a gene is differentially expressed with information on the extent of its differential expression. Consequently, researchers currently prioritize genes for further study either on the basis of volcano plots or, more commonly, according to simple estimates of the fold change after filtering the genes with an arbitrary statistical significance threshold. While the subjective and informal nature of the former practice precludes quantification of its reliability, the latter practice is equivalent to using a …


Validation Of Differential Gene Expression Algorithms: Application Comparing Fold Change Estimation To Hypothesis Testing, David R. Bickel, Corey M. Yanofsky Feb 2009

COBRA Preprint Series

Sustained research on the problem of determining which genes are differentially expressed on the basis of microarray data has yielded a plethora of statistical algorithms, each justified by theory, simulation, or ad hoc validation and yet differing in practical results from equally justified algorithms. The widespread confusion on which method to use in practice has been exacerbated by the finding that simply ranking genes by their fold changes sometimes outperforms popular statistical tests.

Algorithms may be compared by quantifying each method's error in predicting expression ratios, whether such ratios are defined across microarray channels or between two independent groups. For …


The Strength Of Statistical Evidence For Composite Hypotheses With An Application To Multiple Comparisons, David R. Bickel Nov 2008

COBRA Preprint Series

The strength of the statistical evidence in a sample of data that favors one composite hypothesis over another may be quantified by the likelihood ratio using the parameter value consistent with each hypothesis that maximizes the likelihood function. Unlike the p-value and the Bayes factor, this measure of evidence is coherent in the sense that it cannot support a hypothesis over any hypothesis that it entails. Further, when comparing the hypothesis that the parameter lies outside a non-trivial interval to the hypotheses that it lies within the interval, the proposed measure of evidence almost always asymptotically favors the correct hypothesis …
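The likelihood-ratio measure described above can be illustrated with a toy case. The sketch below assumes a normal mean with known standard deviation, which is an assumption made here for illustration and is not taken from the paper. It compares the hypothesis that the mean lies outside the interval [-delta, delta] with the hypothesis that it lies inside, maximizing the likelihood under each. All names are hypothetical.

```python
def log_evidence_outside_vs_inside(xbar, n, sigma, delta):
    """Log of the maximized-likelihood ratio: outside vs inside [-delta, delta].

    Toy model (an assumption of this sketch): n observations from a normal
    distribution with unknown mean, known sigma, and sample mean xbar.
    """
    def loglik(theta):
        # Normal log-likelihood in theta, up to an additive constant
        # that cancels in the ratio.
        return -n * (xbar - theta) ** 2 / (2 * sigma ** 2)

    # Inside the interval, the maximizer is xbar clipped to the interval.
    theta_inside = max(-delta, min(delta, xbar))
    # Outside the interval, the supremum is attained at xbar when it lies
    # outside, and otherwise approached at the nearest boundary point.
    if abs(xbar) > delta:
        theta_outside = xbar
    else:
        theta_outside = delta if xbar >= 0 else -delta
    return loglik(theta_outside) - loglik(theta_inside)
```

A positive value favors the "outside" hypothesis and a negative value favors the "inside" hypothesis. In this toy model the magnitude grows with n whenever the sample mean stays a fixed distance from the boundary.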


A Bayesian Hierarchical Model For Spot Fluorescence In Microarrays, Federico Mattia Stefanini Mar 2007

COBRA Preprint Series

Microarray experiments are characterized by the presence of many sources of experimental bias and a remarkably large technical variability. The assessment of differential expression for genes transcribed into a small number of mRNA copies heavily depends on the proper quantification of background fluorescence within spot. The rough model 'observed = hybridization plus background' fluorescence is at first reformulated at spot level, then it is embedded into a Bayesian hierarchical model suited for fitting control spots. The novelties of the approach include the background correction performed on the latent mean of replicated spots, and an explicit model for outlying observations at …


Exploration Of Distributional Models For A Novel Intensity-Dependent Normalization, Nicola Lama, Patrizia Boracchi, Elia Mario Biganzoli Oct 2006

COBRA Preprint Series

Currently used gene intensity-dependent normalization methods, based on regression smoothing techniques, usually approach the two problems of location bias detrending and data re-scaling without taking into account the censoring characteristic of certain gene expressions produced by experiment measurement constraints or by previous normalization steps. Moreover, the bias vs variance balance control of normalization procedures is not often discussed but left to the user's experience. Here an approximate maximum likelihood procedure to fit a model smoothing the dependences of log-fold gene expression differences on average gene intensities is presented. Central tendency and scaling factor were modeled by means of B-splines smoothing …


A Flexible Statistical Method For Detecting Genomic Copy-Number Changes Using Hidden Markov Models With Reversible Jump MCMC, Oscar M. Rueda, Ramon Diaz-Uriarte Aug 2006

COBRA Preprint Series

We have developed a statistical method for the analysis of array-based CGH data to detect genomic DNA copy number changes. Our method allows us to answer the biologically relevant questions (what is the probability that a given gene or region has increased or decreased copy number changes) in a clear and simple way, within a rigorous statistical framework. We use a non-homogeneous Hidden Markov Model that incorporates distance between genes, a crucial requirement to analyze data from platforms where distances between probes are highly variable. As the true number of hidden states (states of copy number changes) is not …


Survival Analysis Of Longitudinal Microarrays, Natasa Rajicic, Dianne M. Finkelstein, David A. Schoenfeld Jul 2006

COBRA Preprint Series

Motivation: The development of methods for linking gene expressions to various clinical and phenotypic characteristics is an active area of genomic research. Scientists hope that such analysis may, for example, describe relationships between gene function and clinical events such as death or recovery. Methods are available for relating gene expression to measurements that are categorized or continuous, but there is less work in relating expressions to an observed event time such as time to death, response, or relapse. When gene expressions are measured over time, there are methods for differentiating temporal patterns. However, no methods have yet been proposed for …


New Spiked-In Probe Sets For The Affymetrix HGU-133A Latin Square Experiment, Monnie McGee, Zhongxue Chen Jun 2006

COBRA Preprint Series

The Affymetrix HGU-133A spike-in data set has been used for determining the sensitivity and specificity of various methods for the analysis of microarray data. We show that there are 22 additional probe sets that detect spike-in RNAs that should be considered as spike-in probe sets. We assign each proposed spiked-in probe set to a concentration group within the Latin Square design, and examine the effects of the additional spiked-in probe sets on assessing the accuracy of analysis methods currently in use. We show that several popular preprocessing methods are more sensitive and specific when the new spike-ins …


New Statistical Paradigms Leading To Web-Based Tools For Clinical/Translational Science, Knut M. Wittkowski May 2005

COBRA Preprint Series

As the field of functional genetics and genomics is beginning to mature, we become confronted with new challenges. The constant drop in price for sequencing and gene expression profiling as well as the increasing number of genetic and genomic variables that can be measured makes it feasible to address more complex questions. The success with rare diseases caused by single loci or genes has provided us with a proof-of-concept that new therapies can be developed based on functional genomics and genetics.

Common diseases, however, typically involve genetic epistasis, genomic pathways, and proteomic pattern. Moreover, to better understand the underlying biological …