Articles 1 - 30 of 35
Full-Text Articles in Genetics and Genomics
Estimation Of A Non-Parametric Variable Importance Measure Of A Continuous Exposure, Antoine Chambaz, Pierre Neuvial, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
We define a new measure of variable importance of an exposure on a continuous outcome, accounting for potential confounders. The exposure features a reference level x0 with positive mass and a continuum of other levels. To estimate it, we fully develop the semi-parametric methodology known as targeted minimum loss estimation (TMLE) [van der Laan & Rubin, 2006; van der Laan & Rose, 2011]. We cover the whole spectrum of its theoretical study (convergence of the iterative procedure at the core of the TMLE methodology; consistency and asymptotic normality of the estimator), practical implementation, simulation …
A Generalized Approach For Testing The Association Of A Set Of Predictors With An Outcome: A Gene Based Test, Benjamin A. Goldstein, Alan E. Hubbard, Lisa F. Barcellos
U.C. Berkeley Division of Biostatistics Working Paper Series
In many analyses, one has data on one level but desires to draw inference on another level. For example, in genetic association studies, one observes units of DNA referred to as SNPs, but wants to determine whether genes that are comprised of SNPs are associated with disease. While there are some available approaches for addressing this issue, they usually involve making parametric assumptions and are not easily generalizable. A statistical test is proposed for testing the association of a set of variables with an outcome of interest. No assumptions are made about the functional form relating the variables to the …
Minimum Description Length Measures Of Evidence For Enrichment, Zhenyu Yang, David R. Bickel
COBRA Preprint Series
In order to functionally interpret differentially expressed genes or other discovered features, researchers seek to detect enrichment in the form of overrepresentation of discovered features associated with a biological process. Most enrichment methods treat the p-value as the measure of evidence using a statistical test such as the binomial test, Fisher's exact test or the hypergeometric test. However, the p-value is not interpretable as a measure of evidence apart from adjustments in light of the sample size. As a measure of evidence supporting one hypothesis over the other, the Bayes factor (BF) overcomes this drawback of the p-value but lacks …
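For readers unfamiliar with the hypergeometric test mentioned in this abstract, the following is a minimal sketch of an over-representation p-value. It is not the authors' method; the function name and argument names are hypothetical, chosen only for illustration.

```python
from math import comb


def enrichment_p_value(population, annotated, selected, overlap):
    """One-sided hypergeometric p-value for over-representation.

    population: total number of features (e.g. genes) considered
    annotated:  features associated with the biological process
    selected:   discovered features (e.g. differentially expressed genes)
    overlap:    discovered features that are also annotated
    Returns P(X >= overlap) when `selected` features are drawn
    without replacement from the population.
    """
    upper = min(annotated, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(population, selected)


# Hypothetical numbers: 10 genes, 5 annotated, 5 selected, all 5 overlap.
print(enrichment_p_value(10, 5, 5, 5))
```

A small p-value here indicates more overlap than expected by chance; as the abstract notes, that number alone is not a sample-size-free measure of evidence.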
A Bayesian Shared Component Model For Genetic Association Studies, Juan J. Abellan, Carlos Abellan, Juan R. Gonzalez
COBRA Preprint Series
We present a novel approach to address genome association studies between single nucleotide polymorphisms (SNPs) and disease. We propose a Bayesian shared component model to tease out the genotype information that is common to cases and controls from the one that is specific to cases only. This allows us to detect the SNPs that show the strongest association with the disease. The model can be applied to case-control studies with more than one disease. In fact, we illustrate the use of this model with a dataset of 23,418 SNPs from a case-control study by The Wellcome Trust Case Control Consortium (2007) …
Minimum Description Length And Empirical Bayes Methods Of Identifying SNPs Associated With Disease, Ye Yang, David R. Bickel
COBRA Preprint Series
The goal of determining which of hundreds of thousands of SNPs are associated with disease poses one of the most challenging multiple testing problems. Using the empirical Bayes approach, the local false discovery rate (LFDR) estimated using popular semiparametric models has enjoyed success in simultaneous inference. However, the estimated LFDR can be biased because the semiparametric approach tends to overestimate the proportion of the non-associated single nucleotide polymorphisms (SNPs). One of the negative consequences is that, like conventional p-values, such LFDR estimates cannot quantify the amount of information in the data that favors the null hypothesis of no disease-association.
We …
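To make the local false discovery rate concrete, here is a minimal sketch under a fully parametric two-group model. This is an assumption made for illustration only: the abstract refers to semiparametric estimates, and the function names, the standard normal null and the normal alternative are all hypothetical choices, not the authors' model.

```python
from math import exp, pi, sqrt


def normal_pdf(z, mean=0.0, sd=1.0):
    """Density of a normal distribution evaluated at z."""
    return exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * sqrt(2.0 * pi))


def local_fdr(z, pi0, alt_mean, alt_sd=1.0):
    """LFDR(z) = pi0 * f0(z) / (pi0 * f0(z) + (1 - pi0) * f1(z)).

    pi0 is the assumed proportion of non-associated SNPs,
    f0 the N(0, 1) null density, f1 the assumed N(alt_mean, alt_sd) density.
    """
    null_part = pi0 * normal_pdf(z)
    alt_part = (1.0 - pi0) * normal_pdf(z, alt_mean, alt_sd)
    return null_part / (null_part + alt_part)


# Hypothetical setting: 90% null SNPs, associated SNPs centred at z = 3.
for z in (0.0, 3.0):
    print(z, local_fdr(z, pi0=0.9, alt_mean=3.0))
```

In this sketch, overstating pi0 pushes every LFDR value upward, which is the direction of bias the abstract describes.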
Powerful SNP Set Analysis For Case-Control Genome Wide Association Studies, Michael C. Wu, Peter Kraft, Michael P. Epstein, Deanne M. Taylor, Stephen J. Chanock, David J. Hunter, Xihong Lin
Harvard University Biostatistics Working Paper Series
No abstract provided.
Trio Logic Regression - Detection Of SNP-SNP Interactions In Case-Parent Trios, Qing Li, Thomas A. Louis, M. Daniele Fallin, Ingo Ruczinski
Johns Hopkins University, Dept. of Biostatistics Working Papers
Statistical approaches to evaluate higher order SNP-SNP and SNP-environment interactions are critical in genetic association studies, as susceptibility to complex disease is likely to be related to the interaction of multiple SNPs and environmental factors. Logic regression (Kooperberg et al., 2001; Ruczinski et al., 2003) is one such approach, where interactions between SNPs and environmental variables are assessed in a regression framework, and interactions become part of the model search space. In this manuscript we extend the logic regression methodology, originally developed for cohort and case-control studies, to studies of trios with affected probands. Trio logic regression accounts for the …
Fitting ACE Structural Equation Models To Case-Control Family Data, Kristin N. Javaras, James I. Hudson, Nan M. Laird
COBRA Preprint Series
Investigators interested in whether a disease aggregates in families often collect case-control family data, which consist of disease status and covariate information for families selected via case or control probands. Here, we focus on the use of case-control family data to investigate the relative contributions to the disease of additive genetic effects (A), shared family environment (C), and unique environment (E). To this end, we describe an ACE model for binary family data and then introduce an approach to fitting the model to case-control family data. The structural equation model, which has been described previously, combines a general-family extension of …
Association Tests That Accommodate Genotyping Errors, Ingo Ruczinski, Qing Li, Benilton Carvalho, M. Daniele Fallin, Rafael A. Irizarry, Thomas A. Louis
Johns Hopkins University, Dept. of Biostatistics Working Papers
High-throughput SNP arrays provide estimates of genotypes for up to one million loci, often used in genome-wide association studies. While these estimates are typically very accurate, genotyping errors do occur, which can influence in particular the most extreme test statistics and p-values. Estimates for the genotype uncertainties are also available, although typically ignored. In this manuscript, we develop a framework to incorporate these genotype uncertainties in case-control studies for any genetic model. We verify that using the assumption of a “local alternative” in the score test is very reasonable for effect sizes typically seen in SNP association studies, and show …
Sparse Linear Discriminant Analysis For Simultaneous Testing For The Significance Of A Gene Set/Pathway And Gene Selection, Michael C. Wu, Lingson Zhang, Zhaoxi Wang, David C. Christiani, Xihong Lin
Harvard University Biostatistics Working Paper Series
No abstract provided.
A Hidden Markov Random Field Model For Genome-Wide Association Studies, Hongzhe Li, Zhi Wei, J M. Maris
UPenn Biostatistics Working Papers
Genome-wide association studies (GWAS) are increasingly utilized for identifying novel susceptible genetic variants for complex traits, but there is little consensus on analysis methods for such data. Most commonly used methods include single SNP analysis or haplotype analysis with Bonferroni correction for multiple comparisons. Since the SNPs in typical GWAS are often in linkage disequilibrium (LD), at least locally, Bonferroni correction of multiple comparisons often leads to conservative error control and therefore lower statistical power. In this paper, we propose a hidden Markov random field model (HMRF) for GWAS analysis based on a weighted LD graph built from the prior …
Estimation And Testing For The Effect Of A Genetic Pathway On A Disease Outcome Using Logistic Kernel Machine Regression Via Logistic Mixed Models, Dawei Liu, Debashis Ghosh, Xihong Lin
Harvard University Biostatistics Working Paper Series
No abstract provided.
A Powerful And Flexible Multilocus Association Test For Quantitative Traits, Lydia Coulter Kwee, Dawei Liu, Xihong Lin, Debashis Ghosh, Michael P. Epstein
Harvard University Biostatistics Working Paper Series
No abstract provided.
U-Statistics-Based Tests For Multiple Genes In Genetic Association Studies, Zhi Wei, Mingyao Li Phd, Timothy Rebbeck, Hongzhe Li
UPenn Biostatistics Working Papers
As our understanding of biological pathways and the genes that regulate these pathways increases, consideration of these biological pathways has become an increasingly important part of genetic and molecular epidemiology. Pathway-based genetic association studies often involve genotyping of variants in genes acting in certain biological pathways. Such pathway-based genetic association studies can potentially capture the highly heterogeneous nature of many complex traits, with multiple causative loci and multiple alleles at some of the causative loci. In this paper, we develop two nonparametric test statistics that consider simultaneously the effects of multiple markers. Our approach, which is based on data-adaptive …
Assessing Population Level Genetic Instability Via Moving Average, Samuel Mcdaniel, Rebecca Betensky, Tianxi Cai
Harvard University Biostatistics Working Paper Series
No abstract provided.
A Hidden Spatial-Temporal Markov Random Field Model For Network-Based Analysis Of Time Course Gene Expression Data, Zhi Wei, Hongzhe Li
UPenn Biostatistics Working Papers
Microarray time course (MTC) gene expression data are commonly collected to study the dynamic nature of biological processes. One important problem is to identify genes that show different expression profiles over time and pathways that are perturbed during a given biological process. While methods are available to identify the genes with differential expression levels over time, there is a lack of methods that can incorporate the pathway information in identifying the pathways being modified/activated during a biological process. In this paper, we develop a hidden spatial-temporal Markov random field (hstMRF)-based method for identifying genes and subnetworks that are related to …
Assessment Of A CGH-Based Genetic Instability, David A. Engler, Yiping Shen, J F. Gusella, Rebecca A. Betensky
Harvard University Biostatistics Working Paper Series
No abstract provided.
Survival Analysis With Large Dimensional Covariates: An Application In Microarray Studies, David A. Engler, Yi Li
Harvard University Biostatistics Working Paper Series
Use of microarray technology often leads to high-dimensional, low-sample-size data settings. Over the past several years, a variety of novel approaches have been proposed for variable selection in this context. However, only a small number of these have been adapted for time-to-event data where censoring is present. Among standard variable selection methods shown both to have good predictive accuracy and to be computationally efficient is the elastic net penalization approach. In this paper, adaptation of the elastic net approach is presented for variable selection both under the Cox proportional hazards model and under an accelerated failure time …
Statistical Evaluation Of Evidence For Clonal Allelic Alterations In Array-CGH Experiments, Colin B. Begg, Kevin Eng, Adam Olshen, E S. Venkatraman
Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series
In recent years numerous investigators have conducted genetic studies of pairs of tumor specimens from the same patient to determine whether the tumors share a clonal origin. These studies have the potential to be of considerable clinical significance, especially in clinical settings where the distinction of a new primary cancer and metastatic spread of a previous cancer would lead to radically different indications for treatment. Studies of clonality have typically involved comparison of the patterns of somatic mutations in the tumors at candidate genetic loci to see if the patterns are sufficiently similar to indicate a clonal origin. More recently, …
Sequential Quantitative Trait Locus Mapping In Experimental Crosses, Jaya M. Satagopan, Saunak Sen, Gary A. Churchill
Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series
The etiology of complex diseases is heterogeneous. The presence of risk alleles in one or more genetic loci affects the function of a variety of intermediate biological pathways, resulting in the overt expression of disease. Hence, there is an increasing focus on identifying the genetic basis of disease by systematically studying phenotypic traits pertaining to the underlying biological functions. In this paper we focus on identifying genetic loci linked to quantitative phenotypic traits in experimental crosses. Such genetic mapping methods often use a one stage design by genotyping all the markers of interest on the available subjects. A genome scan …
Multiple Diseases In Carrier Probability Estimation: Accounting For Surviving All Cancers Other Than Breast And Ovary In BRCAPRO, Hormuzd A. Katki, Amanda Blackford, Sining Chen, Giovanni Parmigiani
Johns Hopkins University, Dept. of Biostatistics Working Papers
Mendelian models can predict who carries an inherited deleterious mutation of known disease genes based on family history. For example, the BRCAPRO model is commonly used to identify families who carry mutations of BRCA1 and BRCA2, based on familial breast and ovarian cancers. These models incorporate the age of diagnosis of diseases in relatives and current age or age of death. We develop a rigorous foundation for handling multiple diseases with censoring. We prove that any disease unrelated to mutations can be excluded from the model, unless it is sufficiently common and dependent on a mutation-related disease time. Furthermore, if …
Power Boosting In Genome-Wide Studies Via Methods For Multivariate Outcomes, Mary J. Emond
UW Biostatistics Working Paper Series
Whole-genome studies are becoming a mainstay of biomedical research. Examples include expression array experiments, comparative genomic hybridization analyses and large case-control studies for detecting polymorphism/disease associations. The tactic of applying a regression model to every locus to obtain test statistics is useful in such studies. However, this approach ignores potential correlation structure in the data that could be used to gain power, particularly when a Bonferroni correction is applied to adjust for multiple testing. In this article, we propose using regression techniques for misspecified multivariate outcomes to increase statistical power over independence-based modeling at each locus. Even when the outcome …
Use Of Hidden Markov Models For QTL Mapping, Karl W. Broman
Johns Hopkins University, Dept. of Biostatistics Working Papers
An important aspect of the QTL mapping problem is the treatment of missing genotype data. If complete genotype data were available, QTL mapping would reduce to the problem of model selection in linear regression. However, in the consideration of loci in the intervals between the available genetic markers, genotype data is inherently missing. Even at the typed genetic markers, genotype data is seldom complete, as a result of failures in the genotyping assays or for the sake of economy (for example, in the case of selective genotyping, where only individuals with extreme phenotypes are genotyped). We discuss the use of …
Semiparametric Regression Of Multi-Dimensional Genetic Pathway Data: Least Squares Kernel Machines And Linear Mixed Models, Dawei Liu, Xihong Lin, Debashis Ghosh
Harvard University Biostatistics Working Paper Series
No abstract provided.
A Unifying Approach For Haplotype Analysis Of Quantitative Traits In Family-Based Association Studies: Testing And Estimating Gene-Environment Interactions With Complex Exposure Variables, Stijn Vansteelandt, Christoph Lange
COBRA Preprint Series
We propose robust and efficient tests and estimators for gene-environment/gene-drug interactions in family-based association studies. The methodology is designed for studies in which haplotypes, quantitative phenotypes and complex exposure/treatment variables are analyzed. Using causal inference methodology, we derive family-based association tests and estimators for the genetic main effects and the interactions. The tests and estimators are robust against population admixture and stratification without requiring adjustment for confounding variables. We illustrate the practical relevance of our approach by an application to a COPD study. The data analysis suggests a gene-environment interaction between a SNP in the Serpine gene and smok- …
Structural Inference In Transition Measurement Error Models For Longitudinal Data, Wenqin Pan, Xihong Lin, Donglin Zeng
Harvard University Biostatistics Working Paper Series
No abstract provided.
Nonparametric Regression Using Local Kernel Estimating Equations For Correlated Failure Time Data, Zhangsheng Yu, Xihong Lin
Harvard University Biostatistics Working Paper Series
No abstract provided.
Causal Inference In Hybrid Intervention Trials Involving Treatment Choice, Qi Long, Rod Little, Xihong Lin
Harvard University Biostatistics Working Paper Series
No abstract provided.
A Comparison Of Methods For Estimating The Causal Effect Of A Treatment In Randomized Clinical Trials Subject To Noncompliance, Rod Little, Qi Long, Xihong Lin
Harvard University Biostatistics Working Paper Series
No abstract provided.
A Faster Circular Binary Segmentation Algorithm For The Analysis Of Array CGH Data, E S. Venkatraman, Adam Olshen
Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series
Motivation: Array CGH technologies enable the simultaneous measurement of DNA copy number for thousands of sites on a genome. We developed the circular binary segmentation (CBS) algorithm to divide the genome into regions of equal copy number (Olshen et al., 2004). The algorithm tests for change-points using a maximal t-statistic with a permutation reference distribution to obtain the corresponding p-value. The number of computations required for the maximal test statistic is O(N^2), where N is the number of markers. This makes the full permutation approach computationally prohibitive for the newer arrays that contain tens of thousands of markers and highlights the …
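The O(N^2) cost mentioned in this abstract can be seen in a small sketch that scans every arc (i, j] of the marker sequence and compares its mean with the mean of the remaining markers. This is a simplified illustration, not the authors' implementation: it uses a z-type statistic without a variance estimate rather than the exact t-statistic, and all function names are hypothetical.

```python
import random
from itertools import accumulate
from math import sqrt


def max_arc_statistic(x):
    """Largest standardized mean difference between an arc and its complement.

    The double loop visits every pair i < j, hence O(N^2) work per call.
    """
    n = len(x)
    prefix = [0.0] + list(accumulate(x))  # prefix[j] = x[0] + ... + x[j-1]
    total = prefix[n]
    best = 0.0
    for i in range(n):
        for j in range(i + 1, n + 1):
            k = j - i
            if k == n:  # the complement would be empty
                continue
            arc_sum = prefix[j] - prefix[i]
            inside = arc_sum / k
            outside = (total - arc_sum) / (n - k)
            z = abs(inside - outside) / sqrt(1.0 / k + 1.0 / (n - k))
            best = max(best, z)
    return best


def permutation_p_value(x, n_perm=200, seed=0):
    """Permutation reference distribution for the maximal statistic."""
    rng = random.Random(seed)
    observed = max_arc_statistic(x)
    shuffled = list(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_arc_statistic(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical copy-number profile with one elevated region.
profile = [0.0] * 10 + [5.0] * 10 + [0.0] * 10
print(max_arc_statistic(profile), permutation_p_value(profile))
```

Each permutation repeats the full quadratic scan, so the total cost grows as the number of permutations times N^2, which is what becomes prohibitive for arrays with tens of thousands of markers.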