Open Access. Powered by Scholars. Published by Universities.®
- Discipline
- Keyword
- Genetics (2)
- Conditional independence; Microarray data; Probability of expression; Probit models; Reciprocal graphs; Reversible jumps MCMC (1)
- Copy number; Batch effects; Robust; Multilevel model; High-throughput; Oligonucleotide array (1)
- Epigenetics; DNA methylation; Microarray (1)
- High dimensional regression; Interval estimation; Oracle property; Regularized estimation; Resampling methods (1)
Articles 1 - 11 of 11
Full-Text Articles in Computational Biology
Minimum Description Length Measures Of Evidence For Enrichment, Zhenyu Yang, David R. Bickel
COBRA Preprint Series
In order to functionally interpret differentially expressed genes or other discovered features, researchers seek to detect enrichment in the form of overrepresentation of discovered features associated with a biological process. Most enrichment methods treat the p-value as the measure of evidence using a statistical test such as the binomial test, Fisher's exact test or the hypergeometric test. However, the p-value is not interpretable as a measure of evidence apart from adjustments in light of the sample size. As a measure of evidence supporting one hypothesis over the other, the Bayes factor (BF) overcomes this drawback of the p-value but lacks …
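For orientation, the conventional p-value approach mentioned in this abstract can be sketched as a one-sided hypergeometric test. This is not the authors' minimum description length method; the function name, argument names, and counts below are illustrative assumptions.

```python
from math import comb


def enrichment_p_value(population, annotated, discovered, overlap):
    """Upper-tail hypergeometric p-value P(X >= overlap).

    population: total number of features tested (hypothetical name)
    annotated:  features associated with the biological process
    discovered: features called as discoveries
    overlap:    discovered features that are also annotated
    """
    if not (0 <= annotated <= population and 0 <= discovered <= population):
        raise ValueError("annotated and discovered must lie in [0, population]")
    upper = min(annotated, discovered)
    total = comb(population, discovered)
    # Sum the hypergeometric probabilities for overlaps at least as large
    # as the observed one; comb() returns 0 for impossible configurations.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, discovered - i)
        for i in range(max(overlap, 0), upper + 1)
    )
    return tail / total


# Illustrative counts only: 20 features, 5 annotated, 5 discovered, 3 shared.
p = enrichment_p_value(20, 5, 5, 3)
print(f"one-sided enrichment p-value: {p:.4f}")
```

A small p-value here indicates overrepresentation under the null of random draws. As the abstract notes, that value is not by itself interpretable as evidence without accounting for sample size.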
Using The R Package Crlmm For Genotyping And Copy Number Estimation, Robert B. Scharpf, Rafael Irizarry, Walter Ritchie, Benilton Carvalho, Ingo Ruczinski
Johns Hopkins University, Dept. of Biostatistics Working Papers
Genotyping platforms such as Affymetrix can be used to assess genotype-phenotype as well as copy number-phenotype associations at millions of markers. While genotyping algorithms are largely concordant when assessed on HapMap samples, tools to assess copy number changes are more variable and often discordant. One explanation for the discordance is that copy number estimates are susceptible to systematic differences between groups of samples that were processed at different times or by different labs. Analysis algorithms that do not adjust for batch effects are prone to spurious measures of association. The R package crlmm implements a multilevel model that adjusts for …
A Perturbation Method For Inference On Regularized Regression Estimates, Jessica Minnier, Lu Tian, Tianxi Cai
Harvard University Biostatistics Working Paper Series
No abstract provided.
A Decision-Theory Approach To Interpretable Set Analysis For High-Dimensional Data, Simina Maria Boca, Hector C. Bravo, Brian Caffo, Jeffrey T. Leek, Giovanni Parmigiani
Johns Hopkins University, Dept. of Biostatistics Working Papers
A ubiquitous problem in high-dimensional analysis is the identification of pre-defined sets that are enriched for features showing an association of interest. In this situation, inference is performed on sets, not individual features. We propose an approach which focuses on estimating the fraction of non-null features in a set. We search for unions of disjoint sets (atoms), using as the loss function a weighted average of the number of false and missed discoveries. We prove that the solution is equivalent to thresholding the atomic false discovery rate and that our approach results in a more interpretable set analysis.
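The equivalence stated in this abstract can be illustrated with a toy sketch. It assumes the expected loss takes the form w × (false discoveries) + (1 − w) × (missed discoveries) and that each atom's fraction of non-null features is already known; the atom names, sizes, fractions, and function names are hypothetical, and the paper's own estimator is not reproduced here.

```python
from itertools import combinations


def expected_loss(selected, atoms, w):
    """Weighted average of expected false and missed discoveries."""
    loss = 0.0
    for name, (size, frac_non_null) in atoms.items():
        if name in selected:
            # Null features inside a selected atom are false discoveries.
            loss += w * size * (1.0 - frac_non_null)
        else:
            # Non-null features inside an unselected atom are missed.
            loss += (1.0 - w) * size * frac_non_null
    return loss


def select_by_threshold(atoms, w):
    """Keep atoms whose false discovery rate (1 - fraction) is below 1 - w."""
    return {name for name, (_, frac) in atoms.items() if 1.0 - frac < 1.0 - w}


def select_by_search(atoms, w):
    """Exhaustively minimise the expected loss over all unions of atoms."""
    names = list(atoms)
    candidates = (
        set(c) for r in range(len(names) + 1) for c in combinations(names, r)
    )
    return min(candidates, key=lambda s: expected_loss(s, atoms, w))


# Hypothetical atoms: name -> (number of features, fraction non-null).
atoms = {"a": (40, 0.90), "b": (25, 0.35), "c": (10, 0.60), "d": (60, 0.05)}
print(select_by_threshold(atoms, w=0.5) == select_by_search(atoms, w=0.5))
```

Because the loss is additive over atoms, each atom can be decided on its own, which is why the exhaustive search and the threshold rule agree in this sketch.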
The Strength Of Statistical Evidence For Composite Hypotheses: Inference To The Best Explanation, David R. Bickel
COBRA Preprint Series
A general function to quantify the weight of evidence in a sample of data for one hypothesis over another is derived from the law of likelihood and from a statistical formalization of inference to the best explanation. For a fixed parameter of interest, the resulting weight of evidence that favors one composite hypothesis over another is the likelihood ratio using the parameter value consistent with each hypothesis that maximizes the likelihood function over the parameter of interest. Since the weight of evidence is generally only known up to a nuisance parameter, it is approximated by replacing the likelihood function with …
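As a minimal illustration of the likelihood ratio described in this abstract, the sketch below assumes a binomial model with no nuisance parameter and hypotheses given as closed intervals for the success probability. The function names and the data are illustrative assumptions; the approximation used when a nuisance parameter is present is not shown.

```python
from math import log


def binomial_loglik(p, k, n):
    """Binomial log-likelihood of k successes in n trials (constant dropped)."""
    def term(count, prob):
        if count == 0:
            return 0.0
        if prob <= 0.0:
            return float("-inf")
        return count * log(prob)
    return term(k, p) + term(n - k, 1.0 - p)


def max_loglik(k, n, lo, hi):
    """Maximise the log-likelihood over p in [lo, hi].

    The binomial log-likelihood is concave in p, so the constrained
    maximiser is the unconstrained estimate k / n clipped to the interval.
    """
    p_hat = min(max(k / n, lo), hi)
    return binomial_loglik(p_hat, k, n)


def log_weight_of_evidence(k, n, h1, h2):
    """Log likelihood ratio favouring composite hypothesis h1 over h2."""
    return max_loglik(k, n, *h1) - max_loglik(k, n, *h2)


# Illustrative data: 7 successes in 10 trials, p >= 0.5 versus p <= 0.5.
print(log_weight_of_evidence(7, 10, (0.5, 1.0), (0.0, 0.5)))
```

A positive value favours the first hypothesis, a negative value favours the second, and zero means the data do not discriminate between them.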
Powerful SNP Set Analysis For Case-Control Genome Wide Association Studies, Michael C. Wu, Peter Kraft, Michael P. Epstein, Deanne M. Taylor, Stephen J. Chanock, David J. Hunter, Xihong Lin
Harvard University Biostatistics Working Paper Series
No abstract provided.
Permutation-Based Pathway Testing Using The Super Learner Algorithm, Paul Chaffee, Alan E. Hubbard, Mark L. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
Many diseases and other important phenotypic outcomes are the result of a combination of factors. For example, expression levels of genes have been used as input to various statistical methods for predicting phenotypic outcomes. One particularly popular variety is the so-called gene set enrichment analysis (GSEA). This paper discusses an augmentation to an existing strategy to estimate the significance of an association between a disease outcome and a predetermined combination of biological factors, based on a specific data adaptive regression method (the "Super Learner," van der Laan et al., 2007). The procedure uses an aggressive search procedure, potentially resulting in …
Accurate Genome-Scale Percentage DNA Methylation Estimates From Microarray Data, Martin J. Aryee, Zhijin Wu, Christine Ladd-Acosta, Brian Herb, Andrew P. Feinberg, Srinivasan Yegnasubramanian, Rafael A. Irizarry
Johns Hopkins University, Dept. of Biostatistics Working Papers
DNA methylation is a key regulator of gene function in a multitude of both normal and abnormal biological processes, but tools to elucidate its roles on a genome-wide scale are still in their infancy. Methylation sensitive restriction enzymes and microarrays provide a potential high-throughput, low-cost platform to allow methylation profiling. However, accurate absolute methylation estimates have been elusive due to systematic errors and unwanted variability. Previous microarray pre-processing procedures, mostly developed for expression arrays, fail to adequately normalize methylation-related data since they rely on key assumptions that are violated in the case of DNA methylation. We develop a normalization strategy …
Modeling Dependent Gene Expression, Donatello Telesca, Peter Muller, Giovanni Parmigiani, Ralph S. Freedman
Harvard University Biostatistics Working Paper Series
No abstract provided.
Wavelet Based Functional Models For Transcriptome Analysis With Tiling Arrays, Lieven Clement, Kristof Debeuf, Ciprian Crainiceanu, Olivier Thas, Marnik Vuylsteke, Rafael Irizarry
Johns Hopkins University, Dept. of Biostatistics Working Papers
For a better understanding of the biology of an organism, a complete description is needed of all regions of the genome that are actively transcribed. Tiling arrays can be used for this purpose. Such arrays allow the discovery of novel transcripts and the assessment of differential expression between two or more experimental conditions such as genotype, treatment, tissue, etc. Many of the initial methodological efforts were designed for transcript discovery, while more recent developments also focus on differential expression. To our knowledge no methods for tiling arrays are described in the literature that can both assess transcript discovery and identify …
Bayesian Methods For Network-Structured Genomics Data, Stefano Monni, Hongzhe Li
UPenn Biostatistics Working Papers
Graphs and networks are common ways of depicting information. In biology, many different processes are represented by graphs, such as regulatory networks, metabolic pathways and protein-protein interaction networks. This information provides a useful supplement to standard numerical genomic data such as microarray gene expression data. Effectively utilizing such information can lead to a better identification of biologically relevant genomic features in the context of our prior biological knowledge. In this paper, we present a Bayesian variable selection procedure for network-structured covariates for both Gaussian linear and probit models. The key to our approach is the introduction of a Markov …