Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Statistics and Probability

Theses and Dissertations

Biostatistics

Full-Text Articles in Physical Sciences and Mathematics

Spectral Methods For The Detection And Characterization Of Topologically Associated Domains, Kellen Garrison Cresswell Jan 2019

Theses and Dissertations

The three-dimensional (3D) structure of the genome plays a crucial role in gene expression regulation. Chromatin conformation capture technologies (Hi-C) have revealed that the genome is organized in a hierarchy of topologically associated domains (TADs), sub-TADs, and chromatin loops, which is relatively stable across cell lines and even across species. These TADs dynamically reorganize during disease development and exhibit cell- and condition-specific differences. Identifying such hierarchical structures and how they change between conditions is a critical step in understanding genome regulation and disease development. Despite their importance, there are relatively few tools for identification of TADs and even fewer for …


A Weighted Gene Co-Expression Network Analysis For Streptococcus Sanguinis Microarray Experiments, Erik C. Dvergsten Jan 2016

Theses and Dissertations

Streptococcus sanguinis is a gram-positive, non-motile bacterium native to human mouths. It is the primary cause of endocarditis and is also responsible for tooth decay. Two-component systems (TCSs) are commonly found in bacteria. In response to environmental signals, TCSs may regulate the expression of virulence factor genes.

Gene co-expression networks are exploratory tools used to analyze system-level gene functionality. A gene co-expression network consists of nodes, which represent gene expression profiles, and connections between them, which occur when two genes are significantly co-expressed. An adjacency function transforms the similarity matrix containing co-expression similarities into the adjacency matrix containing connection strengths. Gene …
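The step from similarity matrix to adjacency matrix described above can be illustrated with a minimal sketch. It assumes the absolute Pearson correlation as the co-expression similarity and a power ("soft-thresholding") adjacency function; the abstract does not say which adjacency function the thesis uses, and the function name `power_adjacency` and the default exponent `beta=6` are hypothetical choices made for illustration only.

```python
import numpy as np


def power_adjacency(expression, beta=6):
    """Turn a genes-by-samples expression array into an adjacency matrix.

    Assumptions (not taken from the thesis): similarity is the absolute
    Pearson correlation between gene profiles, and the adjacency function
    raises each similarity to the power `beta`.
    """
    expression = np.asarray(expression, dtype=float)
    # Rows are genes, so np.corrcoef returns a genes-by-genes matrix.
    similarity = np.abs(np.corrcoef(expression))
    # Raising to a power > 1 shrinks weak similarities toward zero while
    # keeping strong ones close to one.
    return similarity ** beta


# Toy data: gene 1 is a scaled copy of gene 0; gene 2 is uncorrelated with gene 0.
toy_expression = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.0, -1.0, -1.0, 1.0],
]
toy_adjacency = power_adjacency(toy_expression, beta=6)
```

In this toy input the first two genes receive a connection strength near one, and the first and third receive a strength near zero. A hard threshold (connection strength of exactly zero or one) would be an alternative adjacency function.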


Detecting And Correcting Batch Effects In High-Throughput Genomic Experiments, Sarah Reese Apr 2013

Theses and Dissertations

Batch effects are due to probe-specific systematic variation between groups of samples (batches) resulting from experimental features that are not of biological interest. Principal components analysis (PCA) is commonly used as a visual tool to determine whether batch effects exist after applying a global normalization method. However, PCA yields linear combinations of the variables that contribute maximum variance and thus will not necessarily detect batch effects if they are not the largest source of variability in the data. We present an extension of principal components analysis to quantify the existence of batch effects, called guided PCA (gPCA). We describe a …


Characterization Of A Weighted Quantile Score Approach For Highly Correlated Data In Risk Analysis Scenarios, Caroline Carrico Mar 2013

Theses and Dissertations

In risk evaluation, the effect of mixtures of environmental chemicals on a common adverse outcome is of interest. However, due to the high dimensionality and inherent correlations among chemicals that occur together, the traditional methods (e.g., ordinary or logistic regression) are unsuitable. We extend and characterize a weighted quantile score (WQS) approach to estimating an index for a set of highly correlated components. In the case of environmental chemicals, we use the WQS to identify “bad actors” and estimate body burden. The accuracy of the WQS was evaluated through extensive simulation studies in terms of validity (ability of the WQS …


Models And Software Development For Interval-Censored Data, Chun Pan Jan 2013

Theses and Dissertations

Interval-censored time-to-event data occur naturally in studies of diseases where the symptoms are not directly observable and periodic clinical examinations are required for detection. Due to the lack of well-established procedures, interval-censored data have conventionally been treated as right-censored data; however, this introduces bias from the outset. This dissertation focuses on methodological research and software development for interval-censored data. Specifically, it consists of three projects. The first project is to create an R package for regression analysis and survival curve estimation of interval-censored data based on several published papers by our research team. In the second project, a Bayesian …


Advanced Methodology Developments In Mixture Cure Models, Chao Cai Jan 2013

Theses and Dissertations

Modern medical treatments have substantially improved cure rates for many chronic diseases and have generated increasing interest in appropriate statistical models to handle survival data with non-negligible cure fractions. Mixture cure models are designed for such data sets; they assume that the studied population is a mixture of cured and uncured subjects. In this dissertation, I will develop two programs named smcure and NPHMC in R. The first program aims to facilitate estimating two popular mixture cure models: the proportional hazards (PH) mixture cure model and the accelerated failure time (AFT) mixture cure model. The second program focuses on designing …
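The mixture idea described above can be sketched in a few lines. This is only an illustration of the standard mixture cure formulation, in which the population survival function is the cure fraction plus the uncured fraction times the survival function of uncured subjects. It is not the smcure or NPHMC implementation, and the function names and the exponential survival curve for uncured subjects are assumptions made for the example.

```python
import math


def population_survival(t, cure_fraction, uncured_survival):
    """Population survival under a mixture cure model.

    S_pop(t) = cure_fraction + (1 - cure_fraction) * S_uncured(t).
    `uncured_survival` is any callable returning S_uncured(t); the model
    for it (PH, AFT, ...) is left open here.
    """
    if not 0.0 <= cure_fraction <= 1.0:
        raise ValueError("cure_fraction must lie in [0, 1]")
    return cure_fraction + (1.0 - cure_fraction) * uncured_survival(t)


def exponential_survival(rate):
    """Hypothetical survival curve for uncured subjects: S(t) = exp(-rate * t)."""
    return lambda t: math.exp(-rate * t)


# With a 30% cure fraction, the population curve starts at 1 and
# levels off at 0.3 instead of dropping to 0.
uncured = exponential_survival(rate=0.5)
survival_at_start = population_survival(0.0, 0.3, uncured)
survival_late = population_survival(100.0, 0.3, uncured)
```

The plateau at the cure fraction is what distinguishes this model from an ordinary survival model, whose survival curve tends to zero.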


Normal Mixture Models For Gene Cluster Identification In Two Dimensional Microarray Data, Eric Scott Harvey Jan 2003

Theses and Dissertations

This dissertation focuses on methodology specific to microarray data analyses that organize the data in preliminary steps and proposes a cluster analysis method which improves the interpretability of the cluster results. Cluster analysis of microarray data allows samples with similar gene expression values to be discovered and may serve as a useful diagnostic tool. Since microarray data is inherently noisy, data preprocessing steps including smoothing and filtering are discussed. Comparing the results of different clustering methods is complicated by the arbitrariness of the cluster labels. Methods for re-labeling clusters to assess the agreement between the results of different clustering techniques …