Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Open Access. Powered by Scholars. Published by Universities.®
- Institution
- Publication Year
- Publication
- Publication Type
Articles 1 - 22 of 22
Full-Text Articles in Physical Sciences and Mathematics
Bayesian Statistical Modeling Of Spatially Resolved Transcriptomics Data, Xi Jiang
Bayesian Statistical Modeling Of Spatially Resolved Transcriptomics Data, Xi Jiang
Statistical Science Theses and Dissertations
Spatially resolved transcriptomics (SRT) quantifies expression levels at different spatial locations, providing a new and powerful tool to investigate novel biological insights. As experimental technologies enhance both in capacity and efficiency, there arises a growing demand for the development of analytical methodologies.
One question in SRT data analysis is to identify genes whose expressions exhibit spatially correlated patterns, called spatially variable (SV) genes. Most current methods to identify SV genes are built upon the geostatistical model with Gaussian process, which could limit the models' ability to identify complex spatial patterns. In order to overcome this challenge and capture more types …
Utilizing Markov Chains To Estimate Allele Progression Through Generations, Ronit Gandhi
Utilizing Markov Chains To Estimate Allele Progression Through Generations, Ronit Gandhi
Honors Theses
All populations display patterns in allele frequencies over time. Some alleles cease to exist, while some grow to become the norm. These frequencies can shift or stay constant based on the conditions the population lives in. If in Hardy-Weinberg equilibrium, the allele frequencies stay constant. Most populations, however, have bias from environmental factors, sexual preferences, other organisms, etc. We propose a stochastic Markov chain model to study allele progression across generations. In such a model, the allele frequencies in the next generation depend only on the frequencies in the current one.
We use this model to track a recessive allele …
A Study Of The Efficacy Of Machine Learning For Diagnosing Obstructive Coronary Artery Disease In Non-Diabetic Patients, Demond Larae Handley
A Study Of The Efficacy Of Machine Learning For Diagnosing Obstructive Coronary Artery Disease In Non-Diabetic Patients, Demond Larae Handley
Theses and Dissertations
According to the Centers for Disease Control and Prevention, about 18.2 million adults age 20 and older have Coronary Artery Disease in the United States. Early diagnosis is therefore of crucial importance to help prevent debilitating consequences, and principally death for many patients. In this study we use data containing gene expression values from peripheral blood samples in 198 non-diabetic patients, with the goal of developing an age and sex gene expression model for diagnosis of Coronary Artery Disease. We employ machine learning methods to obtain a classification based on genetic information, age and sex. Our implementation uses feed forward …
Genome-Wide Systems Genetics Of Alcohol Consumption And Dependence, Kristin Mignogna
Genome-Wide Systems Genetics Of Alcohol Consumption And Dependence, Kristin Mignogna
Theses and Dissertations
Widely effective treatment for alcohol use disorder is not yet available, because the exact biological mechanisms that underlie this disorder are not completely understood. One way to gain a better understanding of these mechanisms is to examine the genetic frameworks that contribute to the risk for developing this disorder. This dissertation examines genetic association data in combination with gene expression networks in the brain to identify functional groups of genes associated with alcohol consumption and dependence.
The first study took advantage of the behavioral complexity of human samples, and experimental capabilities provided by mouse models, by co-analyzing gene expression networks …
Spectral Methods For The Detection And Characterization Of Topologically Associated Domains, Kellen Garrison Cresswell
Spectral Methods For The Detection And Characterization Of Topologically Associated Domains, Kellen Garrison Cresswell
Theses and Dissertations
The three-dimensional (3D) structure of the genome plays a crucial role in gene expression regulation. Chromatin conformation capture technologies (Hi-C) have revealed that the genome is organized in a hierarchy of topologically associated domains (TADs), sub-TADs, and chromatin loops which is relatively stable across cell-lines and even across species. These TADs dynamically reorganize during development of disease, and exhibit cell- and conditionspecific differences. Identifying such hierarchical structures and how they change between conditions is a critical step in understanding genome regulation and disease development. Despite their importance, there are relatively few tools for identification of TADs and even fewer for …
Impact Of Home Visit Capacity On Genetic Association Studies Of Late-Onset Alzheimer's Disease, David W. Fardo, Laura E. Gibbons, Shubhabrata Mukherjee, M. Maria Glymour, Wayne Mccormick, Susan M. Mccurry, James D. Bowen, Eric B. Larson, Paul K. Crane
Impact Of Home Visit Capacity On Genetic Association Studies Of Late-Onset Alzheimer's Disease, David W. Fardo, Laura E. Gibbons, Shubhabrata Mukherjee, M. Maria Glymour, Wayne Mccormick, Susan M. Mccurry, James D. Bowen, Eric B. Larson, Paul K. Crane
Biostatistics Faculty Publications
INTRODUCTION—Findings for genetic correlates of late-onset Alzheimer's disease (LOAD) in studies that rely solely on clinic visits may differ from those with capacity to follow participants unable to attend clinic visits.
METHODS—We evaluated previously identified LOAD-risk single nucleotide variants in the prospective Adult Changes in Thought study, comparing hazard ratios (HRs) estimated using the full data set of both in-home and clinic visits (n = 1697) to HRs estimated using only data that were obtained from clinic visits (n = 1308). Models were adjusted for age, sex, principal components to account for ancestry, and additional health indicators.
RESULTS …
The Battle Against Malaria: A Teachable Moment, Randy K. Schwartz
The Battle Against Malaria: A Teachable Moment, Randy K. Schwartz
Journal of Humanistic Mathematics
Malaria has been humanity’s worst public health problem throughout recorded history. Mathematical methods are needed to understand which factors are relevant to the disease and to develop counter-measures against it. This article and the accompanying exercises provide examples of those methods for use in lower- or upper-level courses dealing with probability, statistics, or population modeling. These can be used to illustrate such concepts as correlation, causation, conditional probability, and independence. The article explains how the apparent link between sickle cell trait and resistance to malaria was first verified in Uganda using the chi-squared probability distribution. It goes on to explain …
Spectral Gene Set Enrichment (Sgse), H Robert Frost, Zhigang Li, Jason H. Moore
Spectral Gene Set Enrichment (Sgse), H Robert Frost, Zhigang Li, Jason H. Moore
Dartmouth Scholarship
Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables with performance strongly dependent on the clustering algorithm and number of clusters. We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes …
Functional Analysis Of Variance For Association Studies, Olga A. Vsevolozhskaya, Dmitri V. Zaykin, Mark C. Greenwood, Changshuai Wei, Qing Lu
Functional Analysis Of Variance For Association Studies, Olga A. Vsevolozhskaya, Dmitri V. Zaykin, Mark C. Greenwood, Changshuai Wei, Qing Lu
Olga A. Vsevolozhskaya
While progress has been made in identifying common genetic variants associated with human diseases, for most of common complex diseases, the identified genetic variants only account for a small proportion of heritability. Challenges remain in finding additional unknown genetic variants predisposing to complex diseases. With the advance in next-generation sequencing technologies, sequencing studies have become commonplace in genetic research. The ongoing exome-sequencing and whole-genome-sequencing studies generate a massive amount of sequencing variants and allow researchers to comprehensively investigate their role in human diseases. The discovery of new disease-associated variants can be enhanced by utilizing powerful and computationally efficient statistical methods. …
Dna Methylation Arrays As Surrogate Measures Of Cell Mixture Distribution, Eugene Houseman, William P. Accomando, Devin C. Koestler, Brock C. Christensen, Carmen J. Marsit
Dna Methylation Arrays As Surrogate Measures Of Cell Mixture Distribution, Eugene Houseman, William P. Accomando, Devin C. Koestler, Brock C. Christensen, Carmen J. Marsit
Dartmouth Scholarship
There has been a long-standing need in biomedical research for a method that quantifies the normally mixed composition of leukocytes beyond what is possible by simple histological or flow cytometric assessments. The latter is restricted by the labile nature of protein epitopes, requirements for cell processing, and timely cell analysis. In a diverse array of diseases and following numerous immune-toxic exposures, leukocyte composition will critically inform the underlying immuno-biology to most chronic medical conditions. Emerging research demonstrates that DNA methylation is responsible for cellular differentiation, and when measured in whole peripheral blood, serves to distinguish cancer cases from controls.
Estimation Of A Non-Parametric Variable Importance Measure Of A Continuous Exposure, Chambaz Antoine, Pierre Neuvial, Mark J. Van Der Laan
Estimation Of A Non-Parametric Variable Importance Measure Of A Continuous Exposure, Chambaz Antoine, Pierre Neuvial, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
We define a new measure of variable importance of an exposure on a continuous outcome, accounting for potential confounders. The exposure features a reference level x0 with positive mass and a continuum of other levels. For the purpose of estimating it, we fully develop the semi-parametric estimation methodology called targeted minimum loss estimation methodology (TMLE) [van der Laan & Rubin, 2006; van der Laan & Rose, 2011]. We cover the whole spectrum of its theoretical study (convergence of the iterative procedure which is at the core of the TMLE methodology; consistency and asymptotic normality of the estimator), practical implementation, simulation …
Minimum Description Length Measures Of Evidence For Enrichment, Zhenyu Yang, David R. Bickel
Minimum Description Length Measures Of Evidence For Enrichment, Zhenyu Yang, David R. Bickel
COBRA Preprint Series
In order to functionally interpret differentially expressed genes or other discovered features, researchers seek to detect enrichment in the form of overrepresentation of discovered features associated with a biological process. Most enrichment methods treat the p-value as the measure of evidence using a statistical test such as the binomial test, Fisher's exact test or the hypergeometric test. However, the p-value is not interpretable as a measure of evidence apart from adjustments in light of the sample size. As a measure of evidence supporting one hypothesis over the other, the Bayes factor (BF) overcomes this drawback of the p-value but lacks …
Powerful Snp Set Analysis For Case-Control Genome Wide Association Studies, Michael C. Wu, Peter Kraft, Michael P. Epstein, Deanne M. Taylor, Stephen J. Chanock, David J. Hunter, Xihong Lin
Powerful Snp Set Analysis For Case-Control Genome Wide Association Studies, Michael C. Wu, Peter Kraft, Michael P. Epstein, Deanne M. Taylor, Stephen J. Chanock, David J. Hunter, Xihong Lin
Harvard University Biostatistics Working Paper Series
No abstract provided.
Estimation And Testing For The Effect Of A Genetic Pathway On A Disease Outcome Using Logistic Kernel Machine Regression Via Logistic Mixed Models, Dawei Liu, Debashis Ghosh, Xihong Lin
Estimation And Testing For The Effect Of A Genetic Pathway On A Disease Outcome Using Logistic Kernel Machine Regression Via Logistic Mixed Models, Dawei Liu, Debashis Ghosh, Xihong Lin
Harvard University Biostatistics Working Paper Series
No abstract provided.
A Powerful And Flexible Multilocus Association Test For Quantitative Traits, Lydia Coulter Kwee, Dawei Liu, Xihong Lin, Debashis Ghosh, Michael P. Epstein
A Powerful And Flexible Multilocus Association Test For Quantitative Traits, Lydia Coulter Kwee, Dawei Liu, Xihong Lin, Debashis Ghosh, Michael P. Epstein
Harvard University Biostatistics Working Paper Series
No abstract provided.
Assessment Of A Cgh-Based Genetic Instability, David A. Engler, Yiping Shen, J F. Gusella, Rebecca A. Betensky
Assessment Of A Cgh-Based Genetic Instability, David A. Engler, Yiping Shen, J F. Gusella, Rebecca A. Betensky
Harvard University Biostatistics Working Paper Series
No abstract provided.
Survival Analysis With Large Dimensional Covariates: An Application In Microarray Studies, David A. Engler, Yi Li
Survival Analysis With Large Dimensional Covariates: An Application In Microarray Studies, David A. Engler, Yi Li
Harvard University Biostatistics Working Paper Series
Use of microarray technology often leads to high-dimensional and low- sample size data settings. Over the past several years, a variety of novel approaches have been proposed for variable selection in this context. However, only a small number of these have been adapted for time-to-event data where censoring is present. Among standard variable selection methods shown both to have good predictive accuracy and to be computationally efficient is the elastic net penalization approach. In this paper, adaptation of the elastic net approach is presented for variable selection both under the Cox proportional hazards model and under an accelerated failure time …
Semiparametric Regression Of Multi-Dimensional Genetic Pathway Data: Least Squares Kernel Machines And Linear Mixed Models, Dawei Liu, Xihong Lin, Debashis Ghosh
Semiparametric Regression Of Multi-Dimensional Genetic Pathway Data: Least Squares Kernel Machines And Linear Mixed Models, Dawei Liu, Xihong Lin, Debashis Ghosh
Harvard University Biostatistics Working Paper Series
No abstract provided.
Bounded Search For De Novo Identification Of Degenerate Cis-Regulatory Elements, Jonathan M. Carlson, Arijit Chakravarty, Radhika S. Khetani, Robert H. Gross
Bounded Search For De Novo Identification Of Degenerate Cis-Regulatory Elements, Jonathan M. Carlson, Arijit Chakravarty, Radhika S. Khetani, Robert H. Gross
Dartmouth Scholarship
The identification of statistically overrepresented sequences in the upstream regions of coregulated genes should theoretically permit the identification of potential cis-regulatory elements. However, in practice many cis-regulatory elements are highly degenerate, precluding the use of an exhaustive word-counting strategy for their identification. While numerous methods exist for inferring base distributions using a position weight matrix, recent studies suggest that the independence assumptions inherent in the model, as well as the inability to reach a global optimum, limit this approach.
Multiple Tests Of Association With Biological Annotation Metadata, Sandrine Dudoit, Sunduz Keles, Mark J. Van Der Laan
Multiple Tests Of Association With Biological Annotation Metadata, Sandrine Dudoit, Sunduz Keles, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
We propose a general and formal statistical framework for the multiple tests of associations between known fixed features of a genome and unknown parameters of the distribution of variable features of this genome in a population of interest. The known fixed gene-annotation profiles, corresponding to the fixed features of the genome, may concern Gene Ontology (GO) annotation, pathway membership, regulation by particular transcription factors, nucleotide sequences, or protein sequences. The unknown gene-parameter profiles, corresponding to the variable features of the genome, may be, for example, regression coefficients relating genome-wide transcript levels or DNA copy numbers to possibly censored biological and …
Gpnn: Power Studies And Applications Of A Neural Network Method For Detecting Gene-Gene Interactions In Studies Of Human Disease, Alison A. Motsinger, Stephen L. Lee, George Mellick, Marylyn D. Ritchie
Gpnn: Power Studies And Applications Of A Neural Network Method For Detecting Gene-Gene Interactions In Studies Of Human Disease, Alison A. Motsinger, Stephen L. Lee, George Mellick, Marylyn D. Ritchie
Dartmouth Scholarship
The identification and characterization of genes that influence the risk of common, complex multifactorial disease primarily through interactions with other genes and environmental factors remains a statistical and computational challenge in genetic epidemiology. We have previously introduced a genetic programming optimized neural network (GPNN) as a method for optimizing the architecture of a neural network to improve the identification of gene combinations associated with disease risk. The goal of this study was to evaluate the power of GPNN for identifying high-order gene-gene interactions. We were also interested in applying GPNN to a real data analysis in Parkinson's disease.
New Statistical Paradigms Leading To Web-Based Tools For Clinical/Translational Science, Knut M. Wittkowski
New Statistical Paradigms Leading To Web-Based Tools For Clinical/Translational Science, Knut M. Wittkowski
COBRA Preprint Series
As the field of functional genetics and genomics is beginning to mature, we become confronted with new challenges. The constant drop in price for sequencing and gene expression profiling as well as the increasing number of genetic and genomic variables that can be measured makes it feasible to address more complex questions. The success with rare diseases caused by single loci or genes has provided us with a proof-of-concept that new therapies can be developed based on functional genomics and genetics.
Common diseases, however, typically involve genetic epistasis, genomic pathways, and proteomic pattern. Moreover, to better understand the underlying biologi-cal …