Open Access. Powered by Scholars. Published by Universities.®

Genetics and Genomics Commons

Full-Text Articles in Genetics and Genomics

Trio Logic Regression - Detection Of SNP-SNP Interactions In Case-Parent Trios, Qing Li, Thomas A. Louis, M. Daniele Fallin, Ingo Ruczinski Jul 2009

Johns Hopkins University, Dept. of Biostatistics Working Papers

Statistical approaches to evaluate higher order SNP-SNP and SNP-environment interactions are critical in genetic association studies, as susceptibility to complex disease is likely to be related to the interaction of multiple SNPs and environmental factors. Logic regression (Kooperberg et al., 2001; Ruczinski et al., 2003) is one such approach, where interactions between SNPs and environmental variables are assessed in a regression framework, and interactions become part of the model search space. In this manuscript we extend the logic regression methodology, originally developed for cohort and case-control studies, for studies of trios with affected probands. Trio logic regression accounts for the …


Association Tests That Accommodate Genotyping Errors, Ingo Ruczinski, Qing Li, Benilton Carvalho, M. Daniele Fallin, Rafael A. Irizarry, Thomas A. Louis Jan 2009

Johns Hopkins University, Dept. of Biostatistics Working Papers

High-throughput SNP arrays provide estimates of genotypes for up to one million loci, often used in genome-wide association studies. While these estimates are typically very accurate, genotyping errors do occur, which can influence in particular the most extreme test statistics and p-values. Estimates for the genotype uncertainties are also available, although typically ignored. In this manuscript, we develop a framework to incorporate these genotype uncertainties in case-control studies for any genetic model. We verify that using the assumption of a “local alternative” in the score test is very reasonable for effect sizes typically seen in SNP association studies, and show …


Multiple Diseases In Carrier Probability Estimation: Accounting For Surviving All Cancers Other Than Breast And Ovary In BRCAPRO, Hormuzd A. Katki, Amanda Blackford, Sining Chen, Giovanni Parmigiani Feb 2007

Johns Hopkins University, Dept. of Biostatistics Working Papers

Mendelian models can predict who carries an inherited deleterious mutation of known disease genes based on family history. For example, the BRCAPRO model is commonly used to identify families who carry mutations of BRCA1 and BRCA2, based on familial breast and ovarian cancers. These models incorporate the age of diagnosis of diseases in relatives and current age or age of death. We develop a rigorous foundation for handling multiple diseases with censoring. We prove that any disease unrelated to mutations can be excluded from the model, unless it is sufficiently common and dependent on a mutation-related disease time. Furthermore, if …


Use Of Hidden Markov Models For QTL Mapping, Karl W. Broman Dec 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

An important aspect of the QTL mapping problem is the treatment of missing genotype data. If complete genotype data were available, QTL mapping would reduce to the problem of model selection in linear regression. However, in the consideration of loci in the intervals between the available genetic markers, genotype data is inherently missing. Even at the typed genetic markers, genotype data is seldom complete, as a result of failures in the genotyping assays or for the sake of economy (for example, in the case of selective genotyping, where only individuals with extreme phenotypes are genotyped). We discuss the use of …


Poor Performance Of Bootstrap Confidence Intervals For The Location Of A Quantitative Trait Locus, Ani Manichaikul, Josee Dupuis, Saunak Sen, Karl W. Broman Mar 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

The aim of many genetic studies is to locate the genomic regions (called quantitative trait loci, QTLs) that contribute to variation in a quantitative trait (such as body weight). Confidence intervals for the locations of QTLs are particularly important for the design of further experiments to identify the gene or genes responsible for the effect. Likelihood support intervals are the most widely used method to obtain confidence intervals for QTL location, but the non-parametric bootstrap has also been recommended. Through extensive computer simulation, we show that bootstrap confidence intervals are poorly behaved and so should not be used in this …


The Role Of An Explicit Causal Framework In Affected Sib Pair Designs With Covariates, Constantine E. Frangakis, Fan Li, Betty Q. Doan Dec 2005

Johns Hopkins University, Dept. of Biostatistics Working Papers

The affected sib/relative pair (ASP/ARP) design is often used with covariates to find genes that can cause a disease in pathways other than through those covariates. However, such "covariates" can themselves have genetic determinants, and the validity of existing methods has so far only been argued under implicit assumptions. We propose an explicit causal formulation of the problem using potential outcomes and principal stratification. The general role of this formulation is to identify and separate the meaning of the different assumptions that can provide valid causal inference in linkage analysis. This separation helps to (a) develop better methods under explicit …


Searching For Differentially Expressed Gene Combinations, Marcel Dettling, Edward Gabrielson, Giovanni Parmigiani Mar 2005

Johns Hopkins University, Dept. of Biostatistics Working Papers

Background: Comparison of mRNA expression levels across biological samples is a widely used approach in genomics. Available data-analytic tools for deriving comprehensive lists of differentially expressed genes rely on data summaries formed using each gene in isolation from others. These approaches ignore biological relationships among genes and may miss important biological insight provided by genomics data.

Methods: We propose a fast, easily interpretable and scalable approach for identifying pairs of genes that are differentially expressed across phenotypes or experimental conditions. These are defined as pairs for which there is detectable phenotype discrimination using the joint distribution, but not from either …


Effect Of Misreported Family History On Mendelian Mutation Prediction Models, Hormuzd A. Katki Sep 2004

Johns Hopkins University, Dept. of Biostatistics Working Papers

People with familial history of disease often consult with genetic counselors about their chance of carrying mutations that increase disease risk. To aid them, genetic counselors use Mendelian models that predict whether the person carries deleterious mutations based on their reported family history. Such models rely on accurate reporting of each member's diagnosis and age of diagnosis, but this information may be inaccurate. Commonly encountered errors in family history can significantly distort predictions, and thus can alter the clinical management of people undergoing counseling, screening, or genetic testing. We derive general results about the distortion in the carrier probability estimate …


Accuracy Of MSI Testing In Predicting Germline Mutations Of MSH2 And MLH1: A Case Study In Bayesian Meta-Analysis Of Diagnostic Tests Without A Gold Standard, Sining Chen, Patrice Watson, Giovanni Parmigiani Jun 2004

Johns Hopkins University, Dept. of Biostatistics Working Papers

Microsatellite instability (MSI) testing is a common screening procedure used to identify families that may harbor mutations of a mismatch repair gene and therefore may be at high risk for hereditary colorectal cancer. A reliable estimate of sensitivity and specificity of MSI for detecting germline mutations of mismatch repair genes is critical in genetic counseling and colorectal cancer prevention. Several studies published results of both MSI and mutation analysis on the same subjects. In this article we perform a meta-analysis of these studies and obtain estimates that can be directly used in counseling and screening. In particular we estimate the …


Optimal Sample Size For Multiple Testing: The Case Of Gene Expression Microarrays, Peter Muller, Giovanni Parmigiani, Christian Robert, Judith Rousseau Feb 2004

Johns Hopkins University, Dept. of Biostatistics Working Papers

We consider the choice of an optimal sample size for multiple comparison problems. The motivating application is the choice of the number of microarray experiments to be carried out when learning about differential gene expression. However, the approach is valid in any application that involves multiple comparisons in a large number of hypothesis tests. We discuss two decision problems in the context of this setup: the sample size selection and the decision about the multiple comparisons. We adopt a decision-theoretic approach, using loss functions that combine the competing goals of discovering as many differentially expressed genes as possible, while keeping …


Unification Of Variance Components And Haseman-Elston Regression For Quantitative Trait Linkage Analysis, Wei-Min Chen, Karl W. Broman, Kung-Yee Liang Oct 2003

Johns Hopkins University, Dept. of Biostatistics Working Papers

Two of the major approaches for linkage analysis with quantitative traits in humans are variance components and Haseman-Elston regression. Previously, these have been viewed as quite separate methods. We describe a general model, fit by use of generalized estimating equations (GEE), for which the variance components and Haseman-Elston methods (including many of the extensions to the original Haseman-Elston method) are special cases, corresponding to different choices for a working covariance matrix. We also show that the regression-based test of Sham et al. (2002) is equivalent to a robust score statistic derived from our GEE approach. These results have several important implications. …


A Nested Unsupervised Approach To Identifying Novel Molecular Subtypes, Elizabeth Garrett, Giovanni Parmigiani Oct 2003

Johns Hopkins University, Dept. of Biostatistics Working Papers

In classification problems arising in genomics research it is common to study populations for which a broad class assignment is known (say, normal versus diseased) and one seeks to find undiscovered subclasses within one or both of the known classes. Formally, this problem can be thought of as an unsupervised analysis nested within a supervised one. Here we take the view that the nested unsupervised analysis can successfully utilize information from the entire data set for constructing and/or selecting useful predictors. Specifically, we propose a mixture model approach to the nested unsupervised problem, where the supervised information is used to …