Genetics and Genomics | Open Access Articles

Bayesian Prediction Intervals For Assessing P-Value Variability In Prospective Replication Studies, Olga A. Vsevolozhskaya, Gabriel Ruiz, Dmitri Zaykin

Biostatistics Faculty Publications

Increased availability of data and accessibility of computational tools in recent years have created an unprecedented upsurge of scientific studies driven by statistical analysis. Limitations inherent to statistics impose constraints on the reliability of conclusions drawn from data, so misuse of statistical methods is a growing concern. Hypothesis and significance testing, and the accompanying P-values, are being scrutinized as the most widely applied and abused practices. One line of critique is that P-values are inherently unfit to fulfill their ostensible role as measures of credibility for scientific hypotheses. It has also been suggested that while P-values …

Enrichment Of Putatively Damaging Rare Variants In The DYX2 Locus And The Reading-Related Genes CCDC136 And FLNC, Andrew K. Adams, Shelley D. Smith, Dongnhu T. Truong, Erik G. Willcutt, Richard K. Olson, John C. DeFries, Bruce F. Pennington, Jeffrey R. Gruen

Psychology: Faculty Scholarship

Eleven loci with prior evidence for association with reading and language phenotypes were sequenced in 96 unrelated subjects with significant impairment in reading performance drawn from the Colorado Learning Disability Research Center collection. Of the 148 individual missense variants identified, 19 were in the chromosome 7 genes CCDC136 and FLNC. In addition, a region corresponding to the well-known DYX2 locus for reading disability (RD) contained 74 missense variants. Both allele sets were filtered for a minor allele frequency ≤0.01 and high PolyPhen-2 scores. To determine if observations of these alleles are occurring more frequently in our cases than expected by chance in …

Multiple Testing Correction With Repeated Correlated Outcomes: Applications To Epigenetics, Katie Leap

Masters Theses

Epigenetic changes (specifically DNA methylation) have been associated with adverse health outcomes; however, unlike genetic markers that are fixed over the lifetime of an individual, methylation can change. Given that there are a large number of methylation sites, measuring them repeatedly introduces multiple testing problems beyond those that exist in a static genetic context. Using simulations of epigenetic data, we considered different methods of controlling the false discovery rate. We considered several underlying associations between an exposure and methylation over time.

We found that testing each site with a linear mixed effects model and then controlling the false discovery rate …

Systems Biology Approach To Late-Onset Alzheimer's Disease Genome-Wide Association Study Identifies Novel Candidate Genes Validated Using Brain Expression Data And Caenorhabditis Elegans Experiments, Shubhabrata Mukherjee, Joshua C. Russell, Daniel T. Carr, Jeremy D. Burgess, Mariet Allen, Daniel J. Serie, Kevin L. Boehme, John S. K. Kauwe, Adam C. Naj, David W. Fardo, Dennis W. Dickson, Thomas J. Montine, Nilufer Ertekin-Taner, Matt R. Kaeberlein, Paul K. Crane

Biostatistics Faculty Publications

Introduction—We sought to determine whether a systems biology approach may identify novel late-onset Alzheimer's disease (LOAD) loci.

Methods—We performed gene-wide association analyses and integrated results with human protein-protein interaction data using network analyses. We performed functional validation on novel genes using a transgenic Caenorhabditis elegans Aβ proteotoxicity model and evaluated novel genes using brain expression data from people with LOAD and other neurodegenerative conditions.

Results—We identified 13 novel candidate LOAD genes outside chromosome 19. Of those, RNA interference knockdowns of the C. elegans orthologs of UBC, NDUFS3, EGR1, and ATP5H were associated with Aβ …

Increased Birth Weight Is Associated With Altered Gene Expression In Neonatal Foreskin, Leryn J. Reynolds, Rebecca I. Pollack, Richard J. Charnigo, Cetewayo S. Rashid, Arnold J. Stromberg, Shu Shen, John O'Brien, Kevin J. Pearson

Pharmacology and Nutritional Sciences Faculty Publications

Elevated birth weight is linked to glucose intolerance and obesity-related health complications later in life. No studies have examined whether infant birth weight is associated with gene expression markers of obesity and inflammation in a tissue that comes directly from the infant following birth. We evaluated the association between birth weight and expression of genes involved in fetal programming of obesity. Foreskin samples were collected following circumcision, and gene expression was analyzed by comparing the infants in the top 15% of birth weight (n = 7) with the remainder of the cohort (n = 40). Multivariate linear regression models were fit to relate expression levels on differentially …

Impact Of Home Visit Capacity On Genetic Association Studies Of Late-Onset Alzheimer's Disease, David W. Fardo, Laura E. Gibbons, Shubhabrata Mukherjee, M. Maria Glymour, Wayne McCormick, Susan M. McCurry, James D. Bowen, Eric B. Larson, Paul K. Crane

Biostatistics Faculty Publications

INTRODUCTION—Findings for genetic correlates of late-onset Alzheimer's disease (LOAD) in studies that rely solely on clinic visits may differ from those with capacity to follow participants unable to attend clinic visits.

METHODS—We evaluated previously identified LOAD-risk single nucleotide variants in the prospective Adult Changes in Thought study, comparing hazard ratios (HRs) estimated using the full data set of both in-home and clinic visits (n = 1697) to HRs estimated using only data that were obtained from clinic visits (n = 1308). Models were adjusted for age, sex, principal components to account for ancestry, and additional health indicators.

RESULTS …

Testing The Independence Hypothesis Of Accepted Mutations For Pairs Of Adjacent Amino Acids In Protein Sequences, Jyotsna Ramanan, Peter Revesz

School of Computing: Faculty Publications

Evolutionary studies usually assume that the genetic mutations are independent of each other. However, that does not imply that the observed mutations are independent of each other because it is possible that when a nucleotide is mutated, then it may be biologically beneficial if an adjacent nucleotide mutates too. With a number of decoded genes currently available in various genome libraries and online databases, it is now possible to have a large-scale computer-based study to test whether the independence assumption holds for pairs of adjacent amino acids. Hence the independence question also arises for pairs of adjacent amino acids within …

Statistical Methods For High Dimensional Data Arising From Large Epidemiological Studies, Hui Xu

Doctoral Dissertations

In this thesis, we propose statistical models for addressing commonly encountered data types and study designs in large epidemiologic investigations aimed at understanding the molecular basis of complex disorders. The motivating applications come from diverse disease areas in Women's Health, including the study of type II diabetes in the Women's Health Initiative (WHI), invasive breast cancer in the Nurses' Health Study and the study of the metabolomic underpinnings of cardiovascular disease in the WHI. We have also put significant effort into making the implementation of the proposed methods accessible through freely available, user-friendly software packages in R. The first chapter …

Statistical Methods For Two Problems In Cancer Research: Analysis Of RNA-Seq Data From Archival Samples And Characterization Of Onset Of Multiple Primary Cancers, Jialu Li

Dissertations & Theses (Open Access)

My dissertation is focused on quantitative methodology development and application for two important topics in translational and clinical cancer research.

The first topic was motivated by the challenge of applying transcriptome sequencing (RNA-seq) to formalin-fixed, paraffin-embedded (FFPE) tumor samples for reliable diagnostic development. We designed a biospecimen study to directly compare gene expression results from different protocols to prepare libraries for RNA-seq from human breast cancer tissues, with randomization to fresh-frozen (FF) or FFPE conditions. To comprehensively evaluate the FFPE RNA-seq data quality for expression profiling, we developed multiple computational methods for assessment, such as the uniformity and continuity …

Identification Of Prognostic Genes And Gene Sets For Early-Stage Non-Small Cell Lung Cancer Using Bi-Level Selection Methods, Suyan Tian, Chi Wang, Howard H. Chang, Jianguo Sun

Biostatistics Faculty Publications

In contrast to feature selection and gene set analysis, bi-level selection is a process of selecting not only important gene sets but also important genes within those gene sets. Depending on the order of selection, a bi-level selection method can be classified into one of three categories: forward selection, which first selects relevant gene sets and then relevant individual genes; backward selection, which takes the reverse order; and simultaneous selection, which performs the two tasks at once, usually with the aid of a penalized regression model. To test the existence of subtype-specific prognostic genes for non-small cell lung cancer …

Estimating The Probability Of Clonal Relatedness Of Pairs Of Tumors In Cancer Patients, Audrey Mauguen, Venkatraman E. Seshan, Irina Ostrovnaya, Colin B. Begg

Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series

Next generation sequencing panels are being used increasingly in cancer research to study tumor evolution. A specific statistical challenge is to compare the mutational profiles in different tumors from a patient to determine the strength of evidence that the tumors are clonally related, i.e. derived from a single, founder clonal cell. The presence of identical mutations in each tumor provides evidence of clonal relatedness, although the strength of evidence from a match is related to how commonly the mutation is seen in the tumor type under investigation. This evidence must be weighed against the evidence in favor of independent tumors …

Detecting Discordance Enrichment Among A Series Of Two-Sample Genome-Wide Expression Data Sets, Yinglei Lai, Fanni Zhang, Tapan Nayak, Reza Modarres, Norman H. Lee, Timothy A. McCaffrey

Epidemiology Faculty Publications

Background

With the current microarray and RNA-seq technologies, two-sample genome-wide expression data have been widely collected in biological and medical studies. The related differential expression analysis and gene set enrichment analysis have been frequently conducted. Integrative analysis can be conducted when multiple data sets are available. In practice, discordant molecular behaviors among a series of data sets can be of biological and clinical interest.

Methods

In this study, a statistical method is proposed for detecting discordance gene set enrichment. Our method is based on a two-level multivariate normal mixture model. It is statistically efficient with linearly increased parameter space when …

Statistical Analyses To Detect And Refine Genetic Associations With Neurodegenerative Diseases, Yuriko Katsumata

Theses and Dissertations--Epidemiology and Biostatistics

Dementia is a clinical state caused by neurodegeneration and characterized by a loss of function in cognitive domains and behavior. Alzheimer’s disease (AD) is the most common form of dementia. Although the amyloid β (Aβ) protein and hyperphosphorylated tau aggregates in the brain are considered to be the key pathological hallmarks of AD, the exact cause of AD is yet to be identified. In addition, clinical diagnoses of AD can be error prone. Many previous studies have compared the clinical diagnosis of AD against the gold standard of autopsy confirmation and shown substantial AD misdiagnosis. Hippocampal sclerosis of aging (HS-Aging) …

Family-Based Association Studies Of Autism In Boys Via Facial-Feature Clusters, Luke Andrew Settles

Masters Theses

Autism spectrum disorder (ASD) refers to a set of developmental disorders with varied attributes. Due to its substantial heterogeneity in terms of behavioral and clinical phenotypes, it is challenging to discern the genetic biomarkers behind ASD, even though the disease is known to be genetic in nature. This serves as a motivation to detect relationships between single nucleotide polymorphisms (SNPs) and a causal autism disease susceptibility locus (DSL) within more homogeneous subgroups. Recently, clinically meaningful subclassifications of ASD have been discovered utilizing facial features of prepubescent boys. Therefore, through the employment of data from 44 prepubertal Caucasian boys with ASD …
