- Discipline
- Bioinformatics (113)
- Computational Biology (113)
- Physical Sciences and Mathematics (92)
- Statistics and Probability (88)
- Genetics (73)
- Statistical Methodology (47)
- Statistical Theory (47)
- Microarrays (46)
- Multivariate Analysis (28)
- Statistical Models (27)
- Biostatistics (23)
- Medicine and Health Sciences (18)
- Survival Analysis (14)
- Applied Mathematics (13)
- Numerical Analysis and Computation (13)
- Categorical Data Analysis (12)
- Public Health (12)
- Epidemiology (10)
- Longitudinal Data Analysis and Time Series (8)
- Diseases (5)
- Genomics (5)
- Laboratory and Basic Science Research (5)
- Clinical Epidemiology (4)
- Design of Experiments and Sample Surveys (4)
- Disease Modeling (4)
- Biochemistry, Biophysics, and Structural Biology (3)
- Biometry (3)
- Clinical Trials (3)
- Keyword
- Genetics (35)
- Gene expression (10)
- Bioinformatics (5)
- Microarray (5)
- Model selection (4)
- Bootstrap (3)
- Classification (3)
- Cluster analysis (3)
- Comparative genomic hybridization (3)
- Cross-validation (3)
- Density estimation (3)
- Mixture models (3)
- Multiple comparison (3)
- Multiple comparisons (3)
- Prediction (3)
- Survival analysis (3)
- Cancer genomics (2)
- Censored data (2)
- Clustering (2)
- Compendium (2)
- Counting process (2)
- Differential expression (2)
- False discovery rate (2)
- Family-wise error rate control (2)
- High-dimensional data (2)
- High-throughput "omics" (2)
- Hypothesis testing (2)
- Linkage mapping (2)
- Loss function (2)
- Mixture model (2)
Articles 1 - 30 of 166
Full-Text Articles in Genetics and Genomics
Unified Methods For Feature Selection In Large-Scale Genomic Studies With Censored Survival Outcomes, Lauren Spirko-Burns, Karthik Devarajan
COBRA Preprint Series
One of the major goals in large-scale genomic studies is to identify genes with a prognostic impact on time-to-event outcomes which provide insight into the disease's process. With rapid developments in high-throughput genomic technologies in the past two decades, the scientific community is able to monitor the expression levels of tens of thousands of genes and proteins resulting in enormous data sets where the number of genomic features is far greater than the number of subjects. Methods based on univariate Cox regression are often used to select genomic features related to survival outcome; however, the Cox model assumes proportional hazards …
Supervised Dimension Reduction For Large-Scale "Omics" Data With Censored Survival Outcomes Under Possible Non-Proportional Hazards, Lauren Spirko-Burns, Karthik Devarajan
COBRA Preprint Series
The past two decades have witnessed significant advances in high-throughput "omics" technologies such as genomics, proteomics, metabolomics, transcriptomics and radiomics. These technologies have enabled simultaneous measurement of the expression levels of tens of thousands of features from individual patient samples and have generated enormous amounts of data that require analysis and interpretation. One specific area of interest has been in studying the relationship between these features and patient outcomes, such as overall and recurrence-free survival, with the goal of developing a predictive "omics" profile. Large-scale studies often suffer from the presence of a large fraction of censored observations and potential …
Estimating The Probability Of Clonal Relatedness Of Pairs Of Tumors In Cancer Patients, Audrey Mauguen, Venkatraman E. Seshan, Irina Ostrovnaya, Colin B. Begg
Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series
Next generation sequencing panels are being used increasingly in cancer research to study tumor evolution. A specific statistical challenge is to compare the mutational profiles in different tumors from a patient to determine the strength of evidence that the tumors are clonally related, i.e. derived from a single, founder clonal cell. The presence of identical mutations in each tumor provides evidence of clonal relatedness, although the strength of evidence from a match is related to how commonly the mutation is seen in the tumor type under investigation. This evidence must be weighed against the evidence in favor of independent tumors …
Conditional Screening For Ultra-High Dimensional Covariates With Survival Outcomes, Hyokyoung Grace Hong, Jian Kang, Yi Li
The University of Michigan Department of Biostatistics Working Paper Series
Identifying important biomarkers that are predictive of cancer patients' prognosis is key to gaining better insight into the biological influences on the disease and has become a critical component of precision medicine. The emergence of large-scale biomedical survival studies, which typically involve an excessive number of biomarkers, has created high demand for efficient screening tools for selecting predictive biomarkers. The vast number of biomarkers defies existing variable selection methods via regularization. The recently developed variable screening methods, though powerful in many practical settings, fail to incorporate prior information on the importance of each biomarker and are less powerful in …
Hpcnmf: A High-Performance Toolbox For Non-Negative Matrix Factorization, Karthik Devarajan, Guoli Wang
COBRA Preprint Series
Non-negative matrix factorization (NMF) is a widely used machine learning algorithm for dimension reduction of large-scale data. It has found successful applications in a variety of fields such as computational biology, neuroscience, natural language processing, information retrieval, image processing and speech recognition. In bioinformatics, for example, it has been used to extract patterns and profiles from genomic and text-mining data as well as in protein sequence and structure analysis. While the scientific performance of NMF is very promising in dealing with high dimensional data sets and complex data structures, its computational cost is high and sometimes could be critical for …
Models For HSV Shedding Must Account For Two Levels Of Overdispersion, Amalia Magaret
UW Biostatistics Working Paper Series
We have frequently implemented crossover studies to evaluate new therapeutic interventions for genital herpes simplex virus infection. The outcome measured to assess the efficacy of interventions on herpes disease severity is the viral shedding rate, defined as the frequency of detection of HSV on the genital skin and mucosa. We performed a simulation study to ascertain whether our standard model, which we have used previously, was appropriately considering all the necessary features of the shedding data to provide correct inference. We simulated shedding data under our standard, validated assumptions and assessed the ability of 5 different models to reproduce the …
Meta-Analysis Of Genome-Wide Association Studies With Correlated Individuals: Application To The Hispanic Community Health Study/Study Of Latinos (HCHS/SOL), Tamar Sofer, John R. Shaffer, Misa Graff, Qibin Qi, Adrienne M. Stilp, Stephanie M. Gogarten, Kari E. North, Carmen R. Isasi, Cathy C. Laurie, Adam A. Szpiro
UW Biostatistics Working Paper Series
Investigators often meta-analyze multiple genome-wide association studies (GWASs) to increase the power to detect associations of single nucleotide polymorphisms (SNPs) with a trait. Meta-analysis is also performed within a single cohort that is stratified by, e.g., sex or ancestry group. Having correlated individuals among the strata may complicate meta-analyses, limit power, and inflate Type 1 error. For example, in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), sources of correlation include genetic relatedness, shared household, and shared community. We propose a novel mixed-effect model for meta-analysis, "MetaCor", which accounts for correlation between stratum-specific effect estimates. Simulations show that MetaCor controls …
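To illustrate why correlation between stratum-specific estimates matters, the sketch below pools two estimates of a common effect by generalized least squares, assuming their variances and covariance are known. This is a generic textbook estimator, not the MetaCor mixed-effect model (whose details are truncated above), and the function name `pool_two_correlated` is hypothetical. With zero covariance it reduces to ordinary inverse-variance weighting.

```python
def pool_two_correlated(b1, b2, v1, v2, cov):
    """Generalized least squares pooling of two estimates of one effect.

    b1, b2: stratum-specific effect estimates
    v1, v2: their variances; cov: their covariance (assumed known here)
    Returns (pooled estimate, variance of the pooled estimate).
    """
    det = v1 * v2 - cov ** 2
    if v1 <= 0 or v2 <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # 1' S^{-1} 1 = (v1 + v2 - 2 cov) / det(S); positive when S is positive definite
    denom = v1 + v2 - 2.0 * cov
    # GLS weights are proportional to S^{-1} 1 = (v2 - cov, v1 - cov) / det(S)
    estimate = ((v2 - cov) * b1 + (v1 - cov) * b2) / denom
    # Var(estimate) = 1 / (1' S^{-1} 1)
    variance = det / denom
    return estimate, variance
```

For example, with `b1 = 0.2`, `b2 = 0.4` and unit variances, the pooled variance is 0.5 when `cov = 0.0` but 0.75 when `cov = 0.5`: treating correlated strata as independent understates the uncertainty, which is one route to the inflated Type 1 error mentioned in the abstract.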
Testing Gene-Environment Interactions In The Presence Of Measurement Error, Chongzhi Di, Li Hsu, Charles Kooperberg, Alex Reiner, Ross Prentice
UW Biostatistics Working Paper Series
Complex diseases result from an interplay between genetic and environmental risk factors, and it is of great interest to study the gene-environment interaction (GxE) to understand the etiology of complex diseases. Recent developments in the genetics field allow one to study GxE systematically. However, one difficulty with GxE arises from the fact that environmental exposures are often measured with error. In this paper, we focus on testing GxE when the environmental exposure E is subject to measurement error. Surprisingly, in contrast to the well-established results that the naive test ignoring measurement error is valid in testing the main effects, we find that …
Computational Model For Survey And Trend Analysis Of Patients With Endometriosis: A Decision Aid Tool For EBM, Salvo Reina, Vito Reina, Franco Ameglio, Mauro Costa, Alessandro Fasciani
COBRA Preprint Series
Endometriosis is attracting increasing worldwide attention due to its medical complexity and social impact. The European community has identified it as a “social disease”. A large amount of information comes from scientists, yet several aspects of this pathology and its staging criteria need to be clearly defined in a suitable number of individuals. In fact, available studies on endometriosis are not easily comparable due to a lack of standardized criteria for collecting patient information and scarce definitions of symptoms. Currently, only retrospective surgical staging is used to measure pathology intensity, while Evidence Based Medicine (EBM) requires shareable methods and correct …
Set-Based Tests For Genetic Association In Longitudinal Studies, Zihuai He, Min Zhang, Seunggeun Lee, Jennifer A. Smith, Xiuqing Guo, Walter Palmas, Sharon L.R. Kardia, Ana V. Diez Roux, Bhramar Mukherjee
The University of Michigan Department of Biostatistics Working Paper Series
Genetic association studies with longitudinal markers of chronic diseases (e.g., blood pressure, body mass index) provide a valuable opportunity to explore how genetic variants affect traits over time by utilizing the full trajectory of longitudinal outcomes. Since these traits are likely influenced by the joint effect of multiple variants in a gene, a joint analysis of these variants considering linkage disequilibrium (LD) may help to explain additional phenotypic variation. In this article, we propose a longitudinal genetic random field model (LGRF) to test the association between a phenotype measured repeatedly during the course of an observational study and a set …
Why Odds Ratio Estimates Of GWAS Are Almost Always Close To 1.0, Yutaka Yasui
COBRA Preprint Series
“Missing heritability” in genome-wide association studies (GWAS) refers to the seeming inability of GWAS data to capture the great majority of genetic causes of a disease in comparison to the known degree of heritability for the disease, in spite of GWAS’ genome-wide measures of genetic variations. This paper presents a simple mathematical explanation for this phenomenon, assuming that the heritability information exists in GWAS data. Specifically, it focuses on the fact that the great majority of association measures (in the form of odds ratios) from GWAS are consistently close to the value that indicates no association, explains why this occurs, …
Sparse Integrative Clustering Of Multiple Omics Data Sets, Ronglai Shen, Sijian Wang, Qianxing Mo
Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series
High resolution microarrays and second-generation sequencing platforms are powerful tools to investigate genome-wide alterations in DNA copy number, methylation, and gene expression associated with a disease. An integrated genomic profiling approach measuring multiple omics data types simultaneously in the same set of biological samples would render an integrated data resolution that would not be available with any single data type. In a previous publication (Shen et al., 2009), we proposed a latent variable regression with a lasso constraint (Tibshirani, 1996) for joint modeling of multiple omics data types to identify common latent variables that can be used to cluster patient …
Modeling Protein Expression And Protein Signaling Pathways, Donatello Telesca, Peter Muller, Steven Kornblau, Marc Suchard, Yuan Ji
COBRA Preprint Series
High-throughput functional proteomic technologies provide a way to quantify the expression of proteins of interest. Statistical inference centers on identifying the activation state of proteins and their patterns of molecular interaction formalized as dependence structure. Inference on dependence structure is particularly important when proteins are selected because they are part of a common molecular pathway. In that case inference on dependence structure reveals properties of the underlying pathway. We propose a probability model that represents molecular interactions at the level of hidden binary latent variables that can be interpreted as indicators for active versus inactive states of the proteins. The …
Estimation Of A Non-Parametric Variable Importance Measure Of A Continuous Exposure, Antoine Chambaz, Pierre Neuvial, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
We define a new measure of variable importance of an exposure on a continuous outcome, accounting for potential confounders. The exposure features a reference level x0 with positive mass and a continuum of other levels. For the purpose of estimating it, we fully develop the semi-parametric estimation methodology called targeted minimum loss estimation methodology (TMLE) [van der Laan & Rubin, 2006; van der Laan & Rose, 2011]. We cover the whole spectrum of its theoretical study (convergence of the iterative procedure which is at the core of the TMLE methodology; consistency and asymptotic normality of the estimator), practical implementation, simulation …
GC-Content Normalization For RNA-Seq Data, Davide Risso, Katja Schwartz, Gavin Sherlock, Sandrine Dudoit
U.C. Berkeley Division of Biostatistics Working Paper Series
Background: Transcriptome sequencing (RNA-Seq) has become the assay of choice for high-throughput studies of gene expression. However, as is the case with microarrays, major technology-related artifacts and biases affect the resulting expression measures. Normalization is therefore essential to ensure accurate inference of expression levels and subsequent analyses thereof.
Results: We focus on biases related to GC-content and demonstrate the existence of strong sample-specific GC-content effects on RNA-Seq read counts, which can substantially bias differential expression analysis. We propose three simple within-lane gene-level GC-content normalization approaches and assess their performance on two different RNA-Seq datasets, involving different species and experimental designs. …
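As a rough illustration of what a within-lane, gene-level GC-content adjustment can look like, the sketch below sorts genes by GC-content, splits them into bins, and rescales the counts in each bin so that every bin has the same median count. The binning-and-scaling scheme, the function name `gc_bin_normalize`, and the default of four bins are assumptions made for illustration; this is not presented as any of the three approaches proposed in the paper.

```python
from statistics import median


def gc_bin_normalize(counts, gc, n_bins=4):
    """Rescale per-gene read counts so each GC-content bin shares one median.

    counts: read counts for the genes of one lane
    gc:     GC-content of each gene (same order as counts)
    Returns (normalized counts, list of bins as lists of gene indices).
    """
    # Order genes by GC-content and cut them into bins of (nearly) equal size.
    order = sorted(range(len(counts)), key=lambda i: gc[i])
    size = -(-len(order) // n_bins)  # ceiling division
    bins = [order[s:s + size] for s in range(0, len(order), size)]

    overall = median(counts)
    out = [float(c) for c in counts]
    for b in bins:
        bin_med = median(counts[i] for i in b)
        if bin_med > 0:
            # Scale the whole bin so its median matches the lane-wide median.
            for i in b:
                out[i] = counts[i] * overall / bin_med
    return out, bins
```

Scaling whole bins keeps the ordering of genes within a bin unchanged while removing the systematic difference in count level between low- and high-GC genes.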
Multiple Testing Of Local Maxima For Detection Of Peaks In ChIP-Seq Data, Armin Schwartzman, Andrew Jaffe, Yulia Gavrilov, Clifford A. Meyer
Harvard University Biostatistics Working Paper Series
No abstract provided.
A Unified Approach To Non-Negative Matrix Factorization And Probabilistic Latent Semantic Indexing, Karthik Devarajan, Guoli Wang, Nader Ebrahimi
COBRA Preprint Series
Non-negative matrix factorization (NMF) by the multiplicative updates algorithm is a powerful machine learning method for decomposing a high-dimensional nonnegative matrix V into two matrices, W and H, each with nonnegative entries, V ≈ WH. NMF has been shown to have a unique parts-based, sparse representation of the data. The nonnegativity constraints in NMF allow only additive combinations of the data, which enables it to learn parts that have distinct physical representations in reality. In the last few years, NMF has been successfully applied in a variety of areas such as natural language processing, information retrieval, image processing, speech recognition …
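The factorization V ≈ WH by multiplicative updates can be sketched in a few lines. This minimal version assumes the squared Euclidean (Frobenius) loss and the widely used Lee and Seung update rules; it is not the unified NMF/PLSI approach of the paper. Matrices are plain lists of rows to keep the example dependency-free, and the helper names are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frob_error(v, w, h):
    """Squared Frobenius norm ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iters=200, seed=0, eps=1e-12):
    """Factor nonnegative V (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random start; multiplicative updates preserve nonnegativity.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)] for i in range(n)]
    return w, h
```

Because each update multiplies the current entry by a ratio of nonnegative quantities, W and H never become negative, and for this loss the reconstruction error does not increase from one iteration to the next.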
A Bayesian Model Averaging Approach For Observational Gene Expression Studies, Xi Kathy Zhou, Fei Liu, Andrew J. Dannenberg
COBRA Preprint Series
Identifying differentially expressed (DE) genes associated with a sample characteristic is the primary objective of many microarray studies. As more and more studies are carried out with observational rather than well controlled experimental samples, it becomes important to evaluate and properly control the impact of sample heterogeneity on DE gene finding. Typical methods for identifying DE genes require ranking all the genes according to a pre-selected statistic based on a single model for two or more group comparisons, with or without adjustment for other covariates. Such single model approaches unavoidably result in model misspecification, which can lead to increased error …
Component Extraction Of Complex Biomedical Signal And Performance Analysis Based On Different Algorithm, Hemant Pasusangai Kasturiwale
Johns Hopkins University, Dept. of Biostatistics Working Papers
Biomedical signals can arise from one or many sources, including the heart, brain and endocrine systems. Multiple sources pose a challenge to researchers, since the recorded signals may be contaminated with artifacts and noise. Biomedical time-series signals include the electroencephalogram (EEG), electrocardiogram (ECG), etc. The morphology of the cardiac signal is very important in most ECG-based diagnostics. Diagnosis of a patient based on visual observation of recorded ECG, EEG, etc. may not be accurate. To achieve a better understanding, PCA (Principal Component Analysis) and ICA algorithms help in analyzing ECG signals. The immense scope in the field of biomedical-signal processing Independent Component Analysis ( …
Removing Technical Variability In RNA-Seq Data Using Conditional Quantile Normalization, Kasper D. Hansen, Rafael A. Irizarry, Zhijin Wu
Johns Hopkins University, Dept. of Biostatistics Working Papers
The ability to measure gene expression on a genome-wide scale is one of the most promising accomplishments in molecular biology. Microarrays, the technology that first permitted this, were riddled with problems due to unwanted sources of variability. Many of these problems are now mitigated, after a decade’s worth of statistical methodology development. The recently developed RNA sequencing (RNA-seq) technology has generated much excitement in part due to claims of reduced variability in comparison to microarrays. However, we show that RNA-seq data exhibit unwanted and obscuring variability similar to what was first observed in microarrays. In particular, we find GC-content has a …
Statistical Properties Of The Integrative Correlation Coefficient: A Measure Of Cross-Study Gene Reproducibility, Leslie Cope, Giovanni Parmigiani
Harvard University Biostatistics Working Paper Series
No abstract provided.
A Generalized Approach For Testing The Association Of A Set Of Predictors With An Outcome: A Gene Based Test, Benjamin A. Goldstein, Alan E. Hubbard, Lisa F. Barcellos
U.C. Berkeley Division of Biostatistics Working Paper Series
In many analyses, one has data on one level but desires to draw inference on another level. For example, in genetic association studies, one observes units of DNA referred to as SNPs, but wants to determine whether genes that are comprised of SNPs are associated with disease. While there are some available approaches for addressing this issue, they usually involve making parametric assumptions and are not easily generalizable. A statistical test is proposed for testing the association of a set of variables with an outcome of interest. No assumptions are made about the functional form relating the variables to the …
Minimum Description Length Measures Of Evidence For Enrichment, Zhenyu Yang, David R. Bickel
COBRA Preprint Series
In order to functionally interpret differentially expressed genes or other discovered features, researchers seek to detect enrichment in the form of overrepresentation of discovered features associated with a biological process. Most enrichment methods treat the p-value as the measure of evidence using a statistical test such as the binomial test, Fisher's exact test or the hypergeometric test. However, the p-value is not interpretable as a measure of evidence apart from adjustments in light of the sample size. As a measure of evidence supporting one hypothesis over the other, the Bayes factor (BF) overcomes this drawback of the p-value but lacks …
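For reference, the conventional one-sided hypergeometric p-value that the abstract contrasts with its proposed approach (this is not the minimum description length measure itself) is the probability of seeing at least `overlap` members of a biological process among `discovered` features drawn from `total` features, of which `in_set` belong to the process. The function and argument names are illustrative.

```python
from math import comb


def hypergeom_enrichment_p(total, in_set, discovered, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(total, in_set, discovered).

    total:      number of features on the platform
    in_set:     features annotated to the biological process
    discovered: features called significant (e.g. differentially expressed)
    overlap:    discovered features that are also in the process
    """
    denom = comb(total, discovered)
    upper = min(in_set, discovered)
    # Sum the hypergeometric probabilities over the upper tail.
    tail = sum(comb(in_set, x) * comb(total - in_set, discovered - x)
               for x in range(overlap, upper + 1))
    return tail / denom
```

For example, with 20 features, 5 in the process, 4 discovered and 3 of those in the process, the p-value is 155/4845 ≈ 0.032. As the abstract argues, this number alone does not say how strongly the data favor enrichment over no enrichment, which is the gap the paper's evidence measures address.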
A Bayesian Shared Component Model For Genetic Association Studies, Juan J. Abellan, Carlos Abellan, Juan R. Gonzalez
COBRA Preprint Series
We present a novel approach to association studies between single nucleotide polymorphisms (SNPs) and disease. We propose a Bayesian shared component model to tease out the genotype information that is common to cases and controls from that which is specific to cases only. This allows detection of the SNPs that show the strongest association with the disease. The model can be applied to case-control studies with more than one disease. In fact, we illustrate the use of this model with a dataset of 23,418 SNPs from a case-control study by the Wellcome Trust Case Control Consortium (2007) …
Minimum Description Length And Empirical Bayes Methods Of Identifying SNPs Associated With Disease, Ye Yang, David R. Bickel
COBRA Preprint Series
The goal of determining which of hundreds of thousands of SNPs are associated with disease poses one of the most challenging multiple testing problems. Using the empirical Bayes approach, the local false discovery rate (LFDR) estimated using popular semiparametric models has enjoyed success in simultaneous inference. However, the estimated LFDR can be biased because the semiparametric approach tends to overestimate the proportion of the non-associated single nucleotide polymorphisms (SNPs). One of the negative consequences is that, like conventional p-values, such LFDR estimates cannot quantify the amount of information in the data that favors the null hypothesis of no disease-association.
We …
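In the standard two-groups model, the local false discovery rate mentioned in the abstract is the posterior probability that a SNP is not associated given its test statistic z: LFDR(z) = π0 f0(z) / [π0 f0(z) + (1 − π0) f1(z)]. The toy sketch below assumes a known N(0, 1) null density and an N(3, 1) alternative density; both are illustrative assumptions rather than estimates, and the code is not the empirical Bayes or MDL method of the paper. It shows why overestimating the null proportion π0, the bias the abstract describes, pushes LFDR values upward.

```python
from statistics import NormalDist


def local_fdr(z, pi0, alt_mean=3.0):
    """Two-groups local FDR with assumed N(0,1) null and N(alt_mean,1) alternative."""
    f0 = NormalDist(0.0, 1.0).pdf(z)       # density of z for non-associated SNPs
    f1 = NormalDist(alt_mean, 1.0).pdf(z)  # density of z for associated SNPs (assumed)
    return pi0 * f0 / (pi0 * f0 + (1.0 - pi0) * f1)
```

The expression is increasing in π0, so for the same statistic `local_fdr(3.0, 0.99)` exceeds `local_fdr(3.0, 0.95)`: an inflated estimate of the proportion of non-associated SNPs makes every SNP look more likely to be null.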
Using The R Package crlmm For Genotyping And Copy Number Estimation, Robert B. Scharpf, Rafael Irizarry, Walter Ritchie, Benilton Carvalho, Ingo Ruczinski
Johns Hopkins University, Dept. of Biostatistics Working Papers
Genotyping platforms such as Affymetrix can be used to assess genotype-phenotype as well as copy number-phenotype associations at millions of markers. While genotyping algorithms are largely concordant when assessed on HapMap samples, tools to assess copy number changes are more variable and often discordant. One explanation for the discordance is that copy number estimates are susceptible to systematic differences between groups of samples that were processed at different times or by different labs. Analysis algorithms that do not adjust for batch effects are prone to spurious measures of association. The R package crlmm implements a multilevel model that adjusts for …
A Perturbation Method For Inference On Regularized Regression Estimates, Jessica Minnier, Lu Tian, Tianxi Cai
Harvard University Biostatistics Working Paper Series
No abstract provided.
A Decision-Theory Approach To Interpretable Set Analysis For High-Dimensional Data, Simina Maria Boca, Hector C. Bravo, Brian Caffo, Jeffrey T. Leek, Giovanni Parmigiani
Johns Hopkins University, Dept. of Biostatistics Working Papers
A ubiquitous problem in high-dimensional analysis is the identification of pre-defined sets that are enriched for features showing an association of interest. In this situation, inference is performed on sets, not individual features. We propose an approach which focuses on estimating the fraction of non-null features in a set. We search for unions of disjoint sets (atoms), using as the loss function a weighted average of the number of false and missed discoveries. We prove that the solution is equivalent to thresholding the atomic false discovery rate and that our approach results in a more interpretable set analysis.
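One way to see why the optimal union reduces to a threshold, under assumptions made here for illustration: suppose the loss is λ·(expected false discoveries) + (1 − λ)·(expected missed discoveries), the atoms are disjoint, and each atom's non-null fraction π is known. The loss is then additive over atoms, and including an atom of m features is worthwhile exactly when λ·m·(1 − π) < (1 − λ)·m·π, that is, when its atomic FDR 1 − π falls below 1 − λ. The function `select_atoms` and its argument layout are hypothetical; the paper's exact loss and estimation of π are not reproduced.

```python
def select_atoms(atoms, lam):
    """Choose the union of atoms minimizing an assumed weighted loss.

    atoms: dict mapping atom name -> (number of features, non-null fraction pi)
    lam:   weight on false discoveries; (1 - lam) weights missed discoveries
    Returns the sorted names of the selected atoms.
    """
    selected = []
    for name, (size, pi) in atoms.items():
        atomic_fdr = 1.0 - pi
        cost_include = lam * size * atomic_fdr  # expected false discoveries if included
        cost_exclude = (1.0 - lam) * size * pi  # expected missed discoveries if excluded
        # Atoms are disjoint, so each decision can be made separately;
        # the size cancels, leaving the rule atomic_fdr < 1 - lam.
        if cost_include < cost_exclude:
            selected.append(name)
    return sorted(selected)
```

With atoms `{"a": (10, 0.9), "b": (20, 0.3), "c": (5, 0.6)}`, setting `lam = 0.5` selects the atoms whose atomic FDR is below 0.5, namely `["a", "c"]`; raising `lam` to 0.8 penalizes false discoveries more heavily and keeps only `["a"]`.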
The Strength Of Statistical Evidence For Composite Hypotheses: Inference To The Best Explanation, David R. Bickel
COBRA Preprint Series
A general function to quantify the weight of evidence in a sample of data for one hypothesis over another is derived from the law of likelihood and from a statistical formalization of inference to the best explanation. For a fixed parameter of interest, the resulting weight of evidence that favors one composite hypothesis over another is the likelihood ratio using the parameter value consistent with each hypothesis that maximizes the likelihood function over the parameter of interest. Since the weight of evidence is generally only known up to a nuisance parameter, it is approximated by replacing the likelihood function with …
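In the simplest case, with no nuisance parameter, the weight of evidence described above is the likelihood ratio evaluated at the best-supported parameter value inside each composite hypothesis. The sketch below assumes a binomial model and composite hypotheses given as closed intervals for the success probability strictly inside (0, 1); the model, the intervals and the function names are illustrative assumptions, and the nuisance-parameter approximation the abstract goes on to describe is not covered.

```python
import math


def binom_loglik(p, k, n):
    """Binomial log-likelihood, dropping the constant log C(n, k)."""
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def best_in_interval(k, n, lo, hi):
    """Maximizer of the binomial likelihood over [lo, hi].

    The likelihood is unimodal in p with mode k/n, so the maximum over
    an interval is attained at k/n clipped to that interval.
    """
    if not 0.0 < lo <= hi < 1.0:
        raise ValueError("interval must lie strictly inside (0, 1)")
    return min(max(k / n, lo), hi)


def weight_of_evidence(k, n, h1, h0):
    """Evidence for composite h1 over composite h0 (each an interval for p)."""
    p1 = best_in_interval(k, n, *h1)
    p0 = best_in_interval(k, n, *h0)
    return math.exp(binom_loglik(p1, k, n) - binom_loglik(p0, k, n))
```

For 7 successes in 10 trials, comparing h1 = [0.5, 0.99] with h0 = [0.01, 0.5] gives the ratio of the likelihood at p = 0.7 to that at p = 0.5, about 2.28; with 5 successes both hypotheses are best represented by p = 0.5 and the weight of evidence is exactly 1.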
Powerful SNP Set Analysis For Case-Control Genome Wide Association Studies, Michael C. Wu, Peter Kraft, Michael P. Epstein, Deanne M. Taylor, Stephen J. Chanock, David J. Hunter, Xihong Lin
Harvard University Biostatistics Working Paper Series
No abstract provided.