Open Access. Powered by Scholars. Published by Universities.®

Biostatistics Commons

2,082 Full-Text Articles, 4,171 Authors, 452,814 Downloads, 77 Institutions

All Articles in Biostatistics

Gene Co-Expression Networks Analysis Reveal Novel Molecular Endotypes In Alpha-1 Antitrypsin Deficiency, Jen-Hwa Chu, Wenlan Zang 2019 Yale University

Yale Day of Data

Rationale: Alpha-1 antitrypsin deficiency (AATD) is a genetic condition that predisposes to early-onset pulmonary emphysema and airways obstruction. The exact mechanism through which AATD leads to lung disease is incompletely understood.

Objectives: To investigate the effect of AAT genotype and augmentation therapy on the bronchoalveolar lavage (BAL) and peripheral blood mononuclear cell (PBMC) transcriptomes, and to examine the link between gene expression profiles and clinical features of AATD.

Methods: We performed RNA-Seq on RNA extracted from BAL and PBMC samples obtained from 89 AATD patients enrolled in the Genomic Research in Alpha-1 Antitrypsin Deficiency and Sarcoidosis (GRADS) study. Differential gene ...


A Novel Pathway-Based Distance Score Enhances Assessment Of Disease Heterogeneity In Gene Expression, Yunqing Liu, Xiting Yan 2019 Yale University School of Public Health

Yale Day of Data

Distance-based unsupervised clustering of gene expression data is commonly used to identify heterogeneity in biologic samples. However, gene expression data are often noisy and genes are often highly correlated, so traditional distances such as Euclidean distance may not effectively discriminate biological differences between samples. In this study, we developed a novel computational method to assess biological differences based on pathways, assuming that ontologically defined biological pathways behave similarly in biologically similar samples. Application of this distance score results in more accurate, robust, and biologically meaningful clustering results in both ...


Instances Of Influenza In The United States Visualized, Parth Patel 2018 CUNY New York City College of Technology

Publications and Research

The Tycho Project collects large data sets related to healthcare, in particular disease instance counts and their geographic locations. We look at the instance counts and locations of influenza across the United States from 1919 to 1951. We hope to gain seasonal and geographic insight into the spread of the disease.


A Generative Statistical Approach For Data Classification In A Biologically Inspired Design Tool, Marvin Manuel Arroyo Rujano 2018 University of Arkansas, Fayetteville

Theses and Dissertations

The objective of the research described in this thesis is to find a way to classify text-based descriptions of biological adaptation to support biologically inspired design. Biologically inspired design is a fairly new field with ongoing research. There are different tools to assist designers and biologists in bio-inspired design; some of the most common are BioTRIZ and AskNature. In recent years, more tools have been proposed to aid research in the field and make it easier, for example, the Biologically Inspired Adaptive System Design (BIASD) tool. This tool was designed with the goal of helping designers in early design stages generate more ...


Spatio-Temporal Reconstruction Of Remote Sensing Observations, Kamrul Khan 2018 University of Arkansas, Fayetteville

Theses and Dissertations

The USDA Forest Service aims to use satellite imagery to monitor and predict changes in forest conditions over time within the country. We specifically focus on a 230,400-hectare region in north-central Wisconsin between 2003 and 2012. The auxiliary data collected from the satellite imagery of this region are relatively dense in space and time and can be used to efficiently predict how the forest condition changed over that decade. However, these records have a significant proportion of missing values due to weather conditions and system failures. To fill in these missing values, we build spatiotemporal models based on fixed ...


Concentrations Of Criteria Pollutants In The Contiguous U.S., 1979 – 2015: Role Of Model Parsimony In Integrated Empirical Geographic Regression, Sun-Young Kim, Matthew Bechle, Steve Hankey, Elizabeth (Lianne) A. Sheppard, Adam A. Szpiro, Julian D. Marshall 2018 University of Washington - Seattle Campus

UW Biostatistics Working Paper Series

BACKGROUND: National- or regional-scale prediction models that estimate individual-level air pollution concentrations commonly include hundreds of geographic variables. However, so many variables may not be necessary, and a parsimonious approach with a small number of variables may achieve sufficient predictive ability. This parsimonious approach can also be applied to most criteria pollutants. It would be especially useful when generating publicly available datasets of model predictions that support research in environmental health and other fields. OBJECTIVES: We aim to (1) build annual-average integrated empirical geographic (IEG) regression models for the contiguous U.S. for six criteria pollutants, for all years with regulatory monitoring ...


Stochastic Lanczos Likelihood Estimation Of Genomic Variance Components, Richard Border 2018 University of Colorado, Boulder

Applied Mathematics Graduate Theses & Dissertations

Genomic variance components analysis seeks to estimate the extent to which interindividual variation in a given trait can be attributed to genetic similarity. Likelihood estimation of such models involves computationally expensive operations on large, dense, and unstructured matrices of high rank. As a result, standard estimation procedures relying on direct matrix methods become prohibitively expensive as sample sizes increase. We propose a novel estimation procedure that uses the Lanczos process and stochastic Lanczos quadrature to approximate the likelihood for an initial choice of parameter values. Then, by identifying the variance components parameter space with a family of shifted linear systems ...
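The core numerical idea, approximating the log-determinant term of a Gaussian likelihood with stochastic Lanczos quadrature instead of a direct factorization, can be sketched as follows. This is a minimal illustration under our own assumptions, not the thesis's procedure: it assumes a dense symmetric positive-definite matrix `A`, Rademacher probe vectors, a fixed number of Lanczos steps without reorthogonalization, and hypothetical function names `lanczos_tridiag` and `slq_logdet`. It omits the shifted-linear-system step the abstract goes on to describe.

```python
import numpy as np


def lanczos_tridiag(A, v, k):
    """Run up to k Lanczos steps on symmetric A from start vector v.

    Returns the tridiagonal matrix T (smaller than k x k on early breakdown).
    No reorthogonalization is done, so k should stay modest.
    """
    n = v.shape[0]
    alphas, betas = [], []
    q = v / np.linalg.norm(v)
    q_prev = np.zeros(n)
    beta = 0.0
    for j in range(k):
        w = A @ q - beta * q_prev
        alpha = q @ w
        w = w - alpha * q
        alphas.append(alpha)
        beta = np.linalg.norm(w)
        if j == k - 1 or beta < 1e-10:
            break  # reached k steps or an invariant Krylov subspace
        betas.append(beta)
        q_prev, q = q, w / beta
    return np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)


def slq_logdet(A, num_probes=30, k=20, seed=0):
    """Estimate log det(A) for symmetric positive-definite A.

    Hutchinson: log det(A) = tr(log A), estimated by the mean of z^T log(A) z.
    Lanczos quadrature: z^T f(A) z ~ ||z||^2 * sum_i tau_i^2 f(theta_i),
    with theta_i the eigenvalues of T and tau_i the first eigenvector entries.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    total = 0.0
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)  # Rademacher probe vector
        T = lanczos_tridiag(A, z, min(k, n))
        theta, U = np.linalg.eigh(T)
        tau = U[0, :]
        total += (z @ z) * np.sum(tau**2 * np.log(theta))
    return total / num_probes


# Compare against a direct computation on a random, well-conditioned SPD matrix.
rng = np.random.default_rng(42)
B = rng.normal(size=(200, 200))
A = B @ B.T / 200 + np.eye(200)
sign, exact = np.linalg.slogdet(A)
print("SLQ estimate:", slq_logdet(A), "direct:", exact)
```

Each probe costs only `k` matrix-vector products, which is what makes this family of methods attractive when direct factorization of large dense matrices becomes too expensive.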


Bias Assessment And Reduction In Kernel Smoothing, Wenkai Ma 2018 The University of Western Ontario

Electronic Thesis and Dissertation Repository

When performing local polynomial regression (LPR) with kernel smoothing, the choice of the smoothing parameter, or bandwidth, is critical. The performance of the method is often evaluated using the Mean Square Error (MSE). Bias and variance are two components of MSE. Kernel methods are known to exhibit varying degrees of bias. Boundary effects and data sparsity issues are two potential problems to watch for. There is a need for a tool to visually assess the potential bias when applying kernel smooths to a given scatterplot of data. In this dissertation, we propose pointwise confidence intervals for bias and demonstrate a ...
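For readers unfamiliar with the setup, a local linear (degree-1) estimate with a Gaussian kernel can be sketched as follows. This illustrates the standard estimator only, under our own assumptions (Gaussian kernel, a single fixed bandwidth `h`, hypothetical function name `local_linear`); it is not the dissertation's pointwise bias-interval method. Since MSE = bias² + variance, a larger `h` lowers variance but can raise bias where the true curve bends or near the boundary of the data.

```python
import numpy as np


def local_linear(x, y, x0, h):
    """Local linear regression estimate of E[y | x = x0].

    Fits a weighted least-squares line centred at x0, with Gaussian
    kernel weights K((x - x0) / h), and returns its intercept.
    """
    u = (x - x0) / h
    w = np.exp(-0.5 * u**2)                 # Gaussian kernel weights
    X = np.column_stack([np.ones_like(x), x - x0])
    XtW = X.T * w                           # X^T W without forming diag(w)
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta[0]                          # fitted value at x0


# Noise-free curve, so any gap between fit and truth is pure smoothing bias.
x = np.linspace(0.0, 1.0, 101)
f = np.sin(2 * np.pi * x)
for h in (0.02, 0.1):
    fits = np.array([local_linear(x, f, x0, h) for x0 in x])
    print(f"h={h}: max |bias| = {np.max(np.abs(fits - f)):.3f}")
```

Because a local linear fit reproduces straight lines exactly, its bias comes from curvature in the underlying function, which is why a visual bias check on a given scatterplot is informative.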


Analysis Of Ranked Gene Tree Probability Distributions Under The Coalescent Process For Detecting Anomaly Zones, Anastasiia Kim 2018 University of New Mexico

Shared Knowledge Conference

In phylogenetic studies, gene trees are used to reconstruct species trees. Under the multispecies coalescent model, gene tree topologies may differ from that of the species tree. An incorrect gene tree topology (one that does not match the species tree) that is more probable than the correct one is termed an anomalous gene tree (AGT). Species trees that can generate such AGTs are said to be in the anomaly zone (AZ). In this region, the method of choosing the most common gene tree as the estimate of the species tree will be inconsistent and will converge to an incorrect species tree when ...


Genome-Wide Analysis Of Alternative RNA Splicing In Children With Acute Myeloid Leukemia (AML), Xichen Li 2018 University of New Mexico - Main Campus

Shared Knowledge Conference

Pediatric acute myeloid leukemia (AML) is a high-risk, hard-to-treat childhood cancer that originates in the bone marrow from immature white blood cells. Recently, growing evidence indicates that aberrant splicing of genes is a common characteristic of AML. Gene expression profiles have proved extremely useful for identifying genes that are associated with clinical characteristics and survival outcomes of cancer patients. However, conventional gene expression profiles do not account for the differences observed in expressed isoforms when alternative RNA splicing is analyzed. Alternative RNA splicing can generate dozens of distinct transcripts from individual genes and the expressions of ...


Analysis Of Covariance (ANCOVA) In Randomized Trials: More Precision, Less Conditional Bias, And Valid Confidence Intervals, Without Model Assumptions, Bingkai Wang, Elizabeth Ogburn, Michael Rosenblum 2018 Department of Biostatistics, Johns Hopkins University

Johns Hopkins University, Dept. of Biostatistics Working Papers

“Covariate adjustment” in the randomized trial context refers to an estimator of the average treatment effect that adjusts for chance imbalances between study arms in baseline variables (called “covariates”). The baseline variables could include, e.g., age, sex, disease severity, and biomarkers. According to two surveys of clinical trial reports, there is confusion about the statistical properties of covariate adjustment. We focus on the ANCOVA estimator, which involves fitting a linear model for the outcome given the treatment arm and baseline variables, and on trials with equal probability of assignment to treatment and control. We prove the following new (to the ...
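The ANCOVA estimator described here, the coefficient on the treatment indicator from an ordinary least-squares fit of the outcome on an intercept, treatment, and baseline variables, can be sketched next to the unadjusted difference in means. The function names and the simulated trial are illustrative assumptions; the sketch computes point estimates only, not the paper's variance estimator or confidence intervals.

```python
import numpy as np


def unadjusted_estimate(y, a):
    """Difference in mean outcome between treatment (a=1) and control (a=0)."""
    return y[a == 1].mean() - y[a == 0].mean()


def ancova_estimate(y, a, X):
    """ANCOVA: OLS of y on [1, a, X]; return the coefficient on a."""
    design = np.column_stack([np.ones_like(y), a, X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


# Hypothetical trial: 1:1 randomization, one prognostic baseline variable.
rng = np.random.default_rng(1)
n = 200
a = rng.permutation(np.repeat([0, 1], n // 2))
age = rng.normal(50, 10, size=n)
y = 1.0 + 2.0 * a + 0.3 * age + rng.normal(0, 1, size=n)

print("unadjusted:", unadjusted_estimate(y, a))
print("ANCOVA:    ", ancova_estimate(y, a, age))
```

Both target the same average treatment effect under randomization; the adjusted fit removes the part of the arm difference that is explained by chance imbalance in the baseline variable.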


Statistical Tools For Assessment Of Spatial Properties Of Mutations Observed Under The Microarray Platform, Bin Luo 2018 The University of Western Ontario

Electronic Thesis and Dissertation Repository

Mutations are alterations of the DNA nucleotide sequence of the genome. Analyses of spatial properties of mutations are critical for understanding certain mutational mechanisms relevant to genetic disease, diversity, and evolution. The studies in this thesis focus on two types of mutations: point mutations, i.e., single nucleotide polymorphism (SNP) genotype differences, and mutations in segments, i.e., copy number variations (CNVs). Microarray platforms, such as the Mouse Diversity Genotyping Array (MDGA), detect these mutations genome-wide at lower cost than whole-genome sequencing, and are thus being considered as screening tools for large populations. Yet it ...


Cross-Sectional HIV Incidence Estimation Accounting For Heterogeneity Across Communities, Yuejia Xu, Oliver B. Laeyendecker, Rui Wang 2018 University of Cambridge

Harvard University Biostatistics Working Paper Series

No abstract provided.


A Math Research Project Inspired By Twin Motherhood, Tiffany N. Kolba 2018 Valparaiso University

Tiffany N Kolba

The phenomenon of twins, triplets, quadruplets, and other higher order multiples has fascinated humans for centuries and has even captured the attention of mathematicians who have sought to model the probabilities of multiple births. However, there has not been extensive research into the phenomenon of polyovulation, which is one of the biological mechanisms that produces multiple births. In this paper, I describe how my own experience becoming a mother to twins led me on a quest to better understand the scientific processes going on inside my own body and motivated me to conduct research on polyovulation frequencies. An overview of ...


Robust Inference For The Stepped Wedge Design, James P. Hughes, Patrick J. Heagerty, Fan Xia, Yuqi Ren 2018 University of Washington - Seattle Campus

UW Biostatistics Working Paper Series

Based on a permutation argument, we derive a closed form expression for an estimate of the treatment effect, along with its standard error, in a stepped wedge design. We show that these estimates are robust to misspecification of both the mean and covariance structure of the underlying data-generating mechanism, thereby providing a robust approach to inference for the treatment effect in stepped wedge designs. We use simulations to evaluate the type I error and power of the proposed estimate and to compare the performance of the proposed estimate to the optimal estimate when the correct model specification is known. The ...


Generalized Spatiotemporal Modeling And Causal Inference For Assessing Treatment Effects For Multiple Groups For Ordinal Outcome, Soutik Ghosal 2018 University of Louisville

Electronic Theses and Dissertations

This dissertation consists of three projects that fall into two broad research areas: generalized spatiotemporal modeling and causal inference based on observational data. In the first project, I introduce a Bayesian hierarchical mixed-effects hurdle model with a nested random effect structure to model the count of primary care providers and understand their spatial and temporal variation. This study further enables us to identify health professional shortage areas and possible contributing factors. In the second project, I unify popular parametric and nonparametric propensity score-based methods to assess the treatment effect of multiple groups for ordinal ...


Marginal False Discovery Rate Approaches To Inference On Penalized Regression Models, Ryan Miller 2018 University of Iowa

Theses and Dissertations

Data containing large numbers of variables are becoming increasingly common, and sparsity-inducing penalized regression methods, such as the lasso, have become a popular analysis tool for these datasets due to their ability to naturally perform variable selection. However, quantifying the importance of the variables selected by these models is a difficult task. These difficulties are compounded by the tendency of the most predictive models, for example those chosen using procedures like cross-validation, to include substantial numbers of noise variables with no real relationship with the outcome. To address the task of performing inference on penalized regression models ...


Prediction Of Preterm Birth With And Without Preeclampsia Using Mid-Pregnancy Immune And Growth-Related Molecular Factors And Maternal Characteristics, Laura L. Jelliffe-Pawlowski, Larry Rand, Bruce Bedell, Rebecca J. Baer, Scott P. Oltman, Mary E. Norton, Gary M. Shaw, David K. Stevenson, Jeffrey C. Murray, Kelli K. Ryckman 2018 University of Iowa

Stead Family Department of Pediatrics Publications

OBJECTIVE: To evaluate whether mid-pregnancy immune and growth-related molecular factors predict preterm birth (PTB) with and without (±) preeclampsia.

STUDY DESIGN: Included were 400 women with singleton deliveries in California in 2009-2010 (200 PTB and 200 term) divided into training and testing samples at a 2:1 ratio. Sixty-three markers were tested in 15-20 serum samples using multiplex technology. Linear discriminant analysis was used to create a discriminant function. Model performance was assessed using area under the receiver operating characteristic curve (AUC).

RESULTS: Twenty-five serum biomarkers along with maternal age ... 80% of women with PTB ± preeclampsia with best performance in women with ...
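The two analysis steps named in the study design, a linear discriminant function followed by AUC on held-out data, can be sketched for two classes as below. This uses the textbook Fisher discriminant with a pooled covariance and the rank-based (Mann-Whitney) form of the AUC; the function names and the simulated markers are hypothetical, and the sketch does not reproduce the study's marker selection.

```python
import numpy as np


def fisher_lda(X, y):
    """Two-class Fisher LDA: w = S_pooled^{-1} (mu_1 - mu_0).

    X is (n, p); y holds 0/1 labels. Discriminant scores are X @ w.
    """
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    S = ((len(X0) - 1) * np.cov(X0, rowvar=False)
         + (len(X1) - 1) * np.cov(X1, rowvar=False)) / (len(X) - 2)
    return np.linalg.solve(np.atleast_2d(S), mu1 - mu0)


def auc(scores, labels):
    """AUC as P(random positive scores higher than random negative),
    counting ties as 1/2 (the Mann-Whitney U statistic scaled to [0, 1])."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size


# Hypothetical data: cases shifted upward in each of three markers.
rng = np.random.default_rng(0)
n, p = 300, 3
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p)) + 0.8 * y[:, None]
idx = rng.permutation(n)
train, test = idx[:200], idx[200:]          # 2:1 training/testing split
w = fisher_lda(X[train], y[train])
print("test AUC:", auc(X[test] @ w, y[test]))
```

Fitting the discriminant on the training portion and computing AUC only on the testing portion keeps the performance estimate from being inflated by the fit itself.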


Bayesian Analytical Approaches For Metabolomics: A Novel Method For Molecular Structure-Informed Metabolite Interaction Modeling, A Novel Diagnostic Model For Differentiating Myocardial Infarction Type, And Approaches For Compound Identification Given Mass Spectrometry Data, Patrick J. Trainor 2018 University of Louisville

Electronic Theses and Dissertations

Metabolomics, the study of small molecules in biological systems, has enjoyed great success in enabling researchers to examine disease-associated metabolic dysregulation and has been utilized for the discovery of biomarkers of disease and phenotypic states. In spite of recent technological advances in the analytical platforms utilized in metabolomics and the proliferation of tools for the analysis of metabolomics data, significant challenges in metabolomics data analysis remain. In this dissertation, we present three of these challenges and Bayesian methodological solutions for each. In the first part we develop a new methodology to serve as a basis for making higher-order inferences in metabolomics ...


A Math Research Project Inspired By Twin Motherhood, Tiffany N. Kolba 2018 Valparaiso University

Journal of Humanistic Mathematics

The phenomenon of twins, triplets, quadruplets, and other higher order multiples has fascinated humans for centuries and has even captured the attention of mathematicians who have sought to model the probabilities of multiple births. However, there has not been extensive research into the phenomenon of polyovulation, which is one of the biological mechanisms that produces multiple births. In this paper, I describe how my own experience becoming a mother to twins led me on a quest to better understand the scientific processes going on inside my own body and motivated me to conduct research on polyovulation frequencies. An overview of ...


Digital Commons powered by bepress