Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Statistics and Probability

Virginia Commonwealth University

Biostatistics

Articles 1 - 6 of 6

Full-Text Articles in Physical Sciences and Mathematics

Spectral Methods For The Detection And Characterization Of Topologically Associated Domains, Kellen Garrison Cresswell Jan 2019

Theses and Dissertations

The three-dimensional (3D) structure of the genome plays a crucial role in gene expression regulation. Chromatin conformation capture technologies (Hi-C) have revealed that the genome is organized into a hierarchy of topologically associated domains (TADs), sub-TADs, and chromatin loops that is relatively stable across cell lines and even across species. These TADs dynamically reorganize during the development of disease, and exhibit cell- and condition-specific differences. Identifying such hierarchical structures and how they change between conditions is a critical step in understanding genome regulation and disease development. Despite their importance, there are relatively few tools for identification of TADs and even fewer for …


A Weighted Gene Co-Expression Network Analysis For Streptococcus Sanguinis Microarray Experiments, Erik C. Dvergsten Jan 2016

Theses and Dissertations

Streptococcus sanguinis is a Gram-positive, non-motile bacterium native to human mouths. It is the primary cause of endocarditis and is also responsible for tooth decay. Two-component systems (TCSs) are commonly found in bacteria. In response to environmental signals, TCSs may regulate the expression of virulence factor genes.

Gene co-expression networks are exploratory tools used to analyze system-level gene functionality. A gene co-expression network consists of gene expression profiles represented as nodes and gene connections, which occur if two genes are significantly co-expressed. An adjacency function transforms the similarity matrix containing co-expression similarities into the adjacency matrix containing connection strengths. Gene …
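
As a rough illustration of the adjacency function described above, the sketch below turns absolute Pearson correlations between expression profiles into connection strengths by raising them to a power. The power form, the exponent beta=6, the function names, and the toy profiles are all assumptions made for this example; the abstract does not say which adjacency function or parameters the thesis uses.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes neither is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def power_adjacency(profiles, beta=6):
    """Map co-expression similarity |cor| to connection strength |cor| ** beta.

    beta is a hypothetical soft-threshold exponent chosen for illustration.
    The diagonal is left at zero so a gene is not connected to itself.
    """
    genes = len(profiles)
    adjacency = [[0.0] * genes for _ in range(genes)]
    for i in range(genes):
        for j in range(genes):
            if i != j:
                similarity = abs(pearson(profiles[i], profiles[j]))
                adjacency[i][j] = similarity ** beta
    return adjacency


# Toy expression profiles (rows are genes, columns are samples); made-up values.
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 1.0, 3.0, 2.0],
]
adjacency = power_adjacency(profiles, beta=6)
```

In this toy input the first two genes are almost perfectly correlated, so their connection strength stays near 1, while the weak correlation between the first and third genes is pushed close to 0 by the exponent.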


Detecting And Correcting Batch Effects In High-Throughput Genomic Experiments, Sarah Reese Apr 2013

Theses and Dissertations

Batch effects are due to probe-specific systematic variation between groups of samples (batches) resulting from experimental features that are not of biological interest. Principal components analysis (PCA) is commonly used as a visual tool to determine whether batch effects exist after applying a global normalization method. However, PCA yields linear combinations of the variables that contribute maximum variance and thus will not necessarily detect batch effects if they are not the largest source of variability in the data. We present an extension of principal components analysis to quantify the existence of batch effects, called guided PCA (gPCA). We describe a …


Characterization Of A Weighted Quantile Score Approach For Highly Correlated Data In Risk Analysis Scenarios, Caroline Carrico Mar 2013

Theses and Dissertations

In risk evaluation, the effect of mixtures of environmental chemicals on a common adverse outcome is of interest. However, due to the high dimensionality and inherent correlations among chemicals that occur together, the traditional methods (e.g., ordinary or logistic regression) are unsuitable. We extend and characterize a weighted quantile score (WQS) approach to estimating an index for a set of highly correlated components. In the case of environmental chemicals, we use the WQS to identify “bad actors” and estimate body burden. The accuracy of the WQS was evaluated through extensive simulation studies in terms of validity (ability of the WQS …


Hypothesis Testing And Power Calculations For Taxonomic-Based Human Microbiome Data, P. S. Larossa, J. Paul Brooks, Elena Deych, Edward L. Boone, David J. Edwards, Qin Wang, Erica Sodergren, George Weinstock, William D. Shannon Jan 2012

Statistical Sciences and Operations Research Publications

This paper presents new biostatistical methods for the analysis of microbiome data based on a fully parametric approach using all the data. The Dirichlet-multinomial distribution allows the analyst to calculate power and sample sizes for experimental design, perform tests of hypotheses (e.g., compare microbiomes across groups), and estimate parameters describing microbiome properties. The use of a fully parametric model for these data has an advantage over alternative non-parametric approaches, such as bootstrapping and permutation testing, in that the model retains more of the information contained in the data. This paper details the statistical approaches for several tests of …
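
For readers unfamiliar with the Dirichlet-multinomial distribution named above, the following is a minimal sketch of its log probability mass function for one sample of taxon counts. It uses the standard concentration-vector (alpha) parameterization, which is an assumption here and may differ from the parameterization used in the paper; the function name and the example values are hypothetical.

```python
import math


def dirichlet_multinomial_logpmf(counts, alpha):
    """Log-probability of a vector of taxon counts under a Dirichlet-multinomial.

    counts: non-negative integer counts, one per taxon.
    alpha:  positive concentration parameters, one per taxon (assumed parameterization).
    """
    n = sum(counts)
    alpha_total = sum(alpha)
    # Terms that depend only on the totals.
    log_p = math.lgamma(n + 1) + math.lgamma(alpha_total) - math.lgamma(n + alpha_total)
    # One term per taxon.
    for x, a in zip(counts, alpha):
        log_p += math.lgamma(x + a) - math.lgamma(a) - math.lgamma(x + 1)
    return log_p


# With two taxa and alpha = (1, 1), every split of n reads is equally likely,
# so each of the n + 1 possible count vectors has probability 1 / (n + 1).
example_prob = math.exp(dirichlet_multinomial_logpmf([1, 2], [1.0, 1.0]))
```

A likelihood of this form is the building block for the tests and power calculations the abstract mentions; those procedures themselves are not reproduced here.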


Normal Mixture Models For Gene Cluster Identification In Two Dimensional Microarray Data, Eric Scott Harvey Jan 2003

Theses and Dissertations

This dissertation focuses on methodology specific to microarray data analyses that organize the data in preliminary steps and proposes a cluster analysis method which improves the interpretability of the cluster results. Cluster analysis of microarray data allows samples with similar gene expression values to be discovered and may serve as a useful diagnostic tool. Since microarray data is inherently noisy, data preprocessing steps including smoothing and filtering are discussed. Comparing the results of different clustering methods is complicated by the arbitrariness of the cluster labels. Methods for re-labeling clusters to assess the agreement between the results of different clustering techniques …
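
To illustrate why arbitrary cluster labels complicate comparisons, the sketch below re-labels one clustering so that it agrees as much as possible with another, by trying every permutation of the labels. This brute-force search is a stand-in chosen for clarity and is not claimed to be the dissertation's method; the function name and the toy label vectors are hypothetical, and the approach is only practical for a small number of clusters.

```python
from itertools import permutations


def best_relabeling(reference, other):
    """Find the label permutation of `other` that best matches `reference`.

    Returns (mapping, agreement), where mapping sends each label used in
    `other` to a label in `reference`, and agreement is the fraction of
    items whose labels coincide after re-labeling.
    """
    labels = sorted(set(reference) | set(other))
    best_map, best_matches = None, -1
    for perm in permutations(labels):
        mapping = dict(zip(labels, perm))
        matches = sum(1 for r, o in zip(reference, other) if mapping[o] == r)
        if matches > best_matches:
            best_map, best_matches = mapping, matches
    return best_map, best_matches / len(reference)


# Two clusterings of six items that mostly agree but use different label names.
method_a = [0, 0, 1, 1, 2, 2]
method_b = [2, 2, 0, 0, 1, 0]
mapping, agreement = best_relabeling(method_a, method_b)
```

Here the raw labels agree on no item at all, yet after re-labeling the two clusterings agree on five of the six items.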