Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 28 of 28

Full-Text Articles in Physical Sciences and Mathematics

Bayesian Statistical Modeling Of Spatially Resolved Transcriptomics Data, Xi Jiang Oct 2023

Statistical Science Theses and Dissertations

Spatially resolved transcriptomics (SRT) quantifies expression levels at different spatial locations, providing a new and powerful tool for gaining novel biological insights. As experimental technologies improve in both capacity and efficiency, demand is growing for new analytical methodologies.

One question in SRT data analysis is to identify genes whose expressions exhibit spatially correlated patterns, called spatially variable (SV) genes. Most current methods to identify SV genes are built upon the geostatistical model with Gaussian process, which could limit the models' ability to identify complex spatial patterns. In order to overcome this challenge and capture more types …


Forecasting Covid-19 With Temporal Hierarchies And Ensemble Methods, Li Shandross Aug 2023

Masters Theses

Infectious disease forecasting efforts underwent rapid growth during the COVID-19 pandemic, providing guidance on pandemic response and potential future trends. Yet despite their importance, short-term forecasting models often struggled to produce accurate real-time predictions of this complex and rapidly changing system. This gap in accuracy persisted into the pandemic and warrants the exploration and testing of new methods to glean fresh insights.

In this work, we examined the application of the temporal hierarchical forecasting (THieF) methodology to probabilistic forecasts of COVID-19 incident hospital admissions in the United States. THieF is an innovative forecasting technique that aggregates time-series data into …


Applications For Functional Data Analysis, Kacy D. Kane Jan 2023

Graduate Research Theses & Dissertations

Functional Data Analysis is often used in the study of data that exist over a continuum, such as time. Two datasets are considered here. The first concerns the efficacy of a lobectomy in reducing or eliminating epileptic seizures in patients. After an initial analysis of the dataset from a multinomial model perspective, we found that there were outliers in our dataset. From there, we considered a Multinomial Mixture Model to aid in the detection of outliers. The second is a social robotics dataset where the …


Regression Modeling Of Complex Survival Data Based On Pseudo-Observations, Rong Rong Dec 2022

Statistical Science Theses and Dissertations

The restricted mean survival time (RMST) is a clinically meaningful summary measure in studies with survival outcomes. Statistical methods have been developed for regression analysis of RMST to investigate impacts of covariates on RMST, which is a useful alternative to the Cox regression analysis. However, existing methods for regression modeling of RMST are not applicable to left-truncated right-censored data that arise frequently in prevalent cohort studies, for which the sampling bias due to left truncation and informative censoring induced by the prevalent sampling scheme must be properly addressed. Meanwhile, statistical methods have been developed for regression modeling of the cumulative …
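For readers unfamiliar with the summary measure named above, the usual textbook definition of the RMST can be written as follows. The notation (event time $T$, truncation time $\tau$, survival function $S$) is assumed here for illustration and is not taken from the dissertation itself.

```latex
% Standard definition of the restricted mean survival time up to tau.
% T = event time, S(t) = P(T > t); notation is assumed, not the author's.
\mathrm{RMST}(\tau) \;=\; \mathbb{E}\bigl[\min(T, \tau)\bigr] \;=\; \int_{0}^{\tau} S(t)\,\mathrm{d}t
```

In words, the RMST is the area under the survival curve from time 0 to $\tau$, which is why it can be read as the average event-free time over that window.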


Estimation Of Causal Effects In Complex Clustered Data, Joshua R. Nugent Oct 2022

Doctoral Dissertations

Analysis of clustered data from randomized trials or observational data often poses theoretical and practical statistical challenges, including but not limited to small numbers of independent units, many adjustment variables, continuous exposures, and/or differential clustering across trial arms. Further, commonly-used parametric methods rely on assumptions that may be violated in practice. Motivated by three scientific questions in public health, methods are developed and/or demonstrated for non-parametric estimation of causal effects. In Chapter 1, methods are elaborated for a cluster randomized trial (CRT) with missing individual-level data at baseline and follow-up, a complex sampling strategy, and limited number of clusters. Chapter …


A Bayesian Hierarchical Mixture Model With Continuous-Time Markov Chains To Capture Bumblebee Foraging Behavior, Max Thrush Hukill Jan 2021

Honors Projects

The standard statistical methodology for analyzing complex case-control studies in ethology is often limited by approaches that force researchers to model distinct aspects of biological processes in a piecemeal, disjointed fashion. By developing a hierarchical Bayesian model, this work demonstrates that statistical inference in this context can be done using a single coherent framework. To do this, we construct a continuous-time Markov chain (CTMC) to model bumblebee foraging behavior. To connect the experimental design with the CTMC, we employ a mixture model controlled by a logistic regression on the two-factor design matrix. We then show how to infer these model …


Bayesian Semi-Supervised Keyphrase Extraction And Jackknife Empirical Likelihood For Assessing Heterogeneity In Meta-Analysis, Guanshen Wang Dec 2020

Statistical Science Theses and Dissertations

This dissertation investigates: (1) A Bayesian Semi-supervised Approach to Keyphrase Extraction with Only Positive and Unlabeled Data, (2) Jackknife Empirical Likelihood Confidence Intervals for Assessing Heterogeneity in Meta-analysis of Rare Binary Events.

In the big data era, people are blessed with a huge amount of information. However, the availability of information may also pose great challenges. One big challenge is how to extract useful yet succinct information in an automated fashion. As one of the first few efforts, keyphrase extraction methods summarize an article by identifying a list of keyphrases. Many existing keyphrase extraction methods focus on the unsupervised setting, …


Causal Inference And Prediction On Observational Data With Survival Outcomes, Xiaofei Chen Jul 2020

Statistical Science Theses and Dissertations

Infants with hypoplastic left heart syndrome require an initial Norwood operation, followed some months later by a stage 2 palliation (S2P). The timing of S2P is critical for the operation’s success and the infant’s survival, but the optimal timing, if one exists, is unknown. We attempt to estimate the optimal timing of S2P by analyzing data from the Single Ventricle Reconstruction Trial (SVRT), which randomized patients between two different types of Norwood procedure. In the SVRT, the timing of the S2P was chosen by the medical team; thus with respect to this exposure, the trial constitutes an observational study, and …


Predictors Of Colorectal Cancer Screening Among Maryland Adults, Aged 50–75 Years, Pamela Manwi Asangong Jan 2020

Walden Dissertations and Doctoral Studies

Screening plays an essential role in reducing colorectal cancer (CRC) incidence and mortality rates, yet CRC screening use remains low in Maryland and lower in some age and racial/ethnic groups with limited resources to participate in CRC screening programs. The purpose of this quantitative, cross-sectional study is to investigate whether age group, sex, race/ethnicity, education level, income level, health insurance coverage, and access to a health care professional can predict whether an individual aged 50–75 years in Maryland takes action to fully meet the United States Preventive Services Task Force CRC screening test recommendation within the recommended time interval. The …


Inference Of Heterogeneity In Meta-Analysis Of Rare Binary Events And Rss-Structured Cluster Randomized Studies, Chiyu Zhang Dec 2019

Statistical Science Theses and Dissertations

This dissertation contains two topics: (1) A Comparative Study of Statistical Methods for Quantifying and Testing Between-study Heterogeneity in Meta-analysis with Focus on Rare Binary Events; (2) Estimation of Variances in Cluster Randomized Designs Using Ranked Set Sampling.

Meta-analysis, the statistical procedure for combining results from multiple studies, has been widely used in medical research to evaluate intervention efficacy and safety. In many practical situations, the variation of treatment effects among the collected studies, often measured by the heterogeneity parameter, may exist and can greatly affect the inference about effect sizes. Comparative studies have been done for only one or …


Sample Size Calculation Of Clinical Trials With Correlated Outcomes, Dateng Li Aug 2019

Statistical Science Theses and Dissertations

In this thesis, we investigate sample size calculation for three kinds of clinical trials: (1) randomized controlled trials (RCTs) with longitudinal count outcomes; (2) cluster randomized trials (CRTs) with count outcomes; (3) CRTs with multiple binary co-primary endpoints.


Factors Associated With Eosinophilic Esophagitis In Nevada, Julia Lorraine Anderson Aug 2019

UNLV Theses, Dissertations, Professional Papers, and Capstones

Eosinophilic esophagitis (EoE) is a rare immune-mediated illness with symptoms that range from difficulty swallowing to food impaction of the esophagus. Most published studies have involved patients residing in cool regions with significant annual rainfall. To our knowledge, no published studies have examined the healthcare utilization trends of EoE in Nevada. Utilizing two unique databases, the factors associated with EoE healthcare utilization patterns in Nevada were examined. All analyses were performed in R version 3.5.1. This study included a demographic and regional analysis identifying risk factors associated with having an EoE healthcare visit in Nevada. Several …


Methods For Making Policy-Relevant Forecasts Of Infectious Disease Incidence, Stephen A. Lauer Jul 2019

Doctoral Dissertations

Infectious diseases place an enormous burden on the people of the developing world and their governments. When, where, and how to allocate resources in order to slow the spread of a virus or deal with the aftermath of an outbreak is often the responsibility of local public health officials. In this thesis, we develop statistical methods for forecasting future incidence of infectious diseases and estimating the effects of interventions designed to reduce future incidence, bearing in mind the needs and concerns of those public health officials. While most infectious disease forecasting models focus on short-term horizons (i.e. weeks or …


Robust And Adaptive Design Approaches For Stepped Wedge Cluster Randomized Trials, Jijia Wang Jan 2019

Statistical Science Theses and Dissertations

The stepped wedge (SW) cluster randomized design has been increasingly employed by pragmatic trials in health services research. In this study, based on the GEE approach, I present a closed-form sample size formula that is applicable to both closed-cohort and cross-sectional SW trials with outcomes from the exponential family. In addition, I propose a Bayesian adaptive design for cross-sectional SW cluster randomized trials. It is more adaptable than traditional designs because it allows early termination of the trial when interim data indicate that the intervention is sufficiently efficacious or inefficacious. A decision to terminate or continue the trial will …


Spectral Methods For The Detection And Characterization Of Topologically Associated Domains, Kellen Garrison Cresswell Jan 2019

Theses and Dissertations

The three-dimensional (3D) structure of the genome plays a crucial role in gene expression regulation. Chromatin conformation capture technologies (Hi-C) have revealed that the genome is organized in a hierarchy of topologically associated domains (TADs), sub-TADs, and chromatin loops which is relatively stable across cell lines and even across species. These TADs dynamically reorganize during development of disease, and exhibit cell- and condition-specific differences. Identifying such hierarchical structures and how they change between conditions is a critical step in understanding genome regulation and disease development. Despite their importance, there are relatively few tools for identification of TADs and even fewer for …


Angiostrongylus Cantonensis: Epidemiologic Review, Location-Specific Habitat Modelling, And Surveillance In Hillsborough County, Florida, U.S.A., Brad Christian Perich Mar 2018

USF Tampa Graduate Theses and Dissertations

Angiostrongylus cantonensis is a parasitic nematode endemic to tropical and subtropical regions and is the leading cause of human eosinophilic meningitis. The parasite is commonly known as rat lungworm because the primary host in its lifecycle is the rat. A clinical overview of rat lungworm infection is presented, followed by a literature review of rat lungworm epidemiology, risk factors, and surveillance projects. Data collected from previous snail surveys in Florida was considered alongside elevation, population per square kilometer, median household income by zip code territory, and normalized difference vegetation index specific to the geographic coordinates from which the snail samples …


Novel Statistical Approaches For Missing Values In Truncated High-Dimensional Metabolomics Data With A Detection Threshold., Jasmit Sureshkumar Shah May 2017

Electronic Theses and Dissertations

Despite considerable advances in high throughput technology over the last decade, new challenges have emerged related to the analysis, interpretation, and integration of high-dimensional data. The arrival of omics datasets has contributed to the rapid improvement of systems biology, which seeks the understanding of complex biological systems. Metabolomics is an emerging omics field, where mass spectrometry technologies generate high dimensional datasets. As this area advances, better analysis methods are needed to provide correct and adequate results. While in other omics sectors such as genomics or proteomics there has been and continues to be critical understanding …


An Exploratory Statistical Method For Finding Interactions In A Large Dataset With An Application Toward Periodontal Diseases, Joshua Lambert Jan 2017

Theses and Dissertations--Epidemiology and Biostatistics

It is estimated that periodontal diseases affect up to 90% of the adult population. Given the complexity of the host environment, many factors contribute to expression of the disease. Age, gender, socioeconomic status, smoking status, and race/ethnicity are all known risk factors, as well as a handful of known comorbidities. Certain vitamins and minerals have been shown to be protective against the disease, while some toxins and chemicals have been associated with an increased prevalence. The role of toxins, chemicals, vitamins, and minerals in relation to disease is believed to be complex and potentially modified by known risk factors. A …


A Weighted Gene Co-Expression Network Analysis For Streptococcus Sanguinis Microarray Experiments, Erik C. Dvergsten Jan 2016

Theses and Dissertations

Streptococcus sanguinis is a gram-positive, non-motile bacterium native to human mouths. It is the primary cause of endocarditis and is also responsible for tooth decay. Two-component systems (TCSs) are commonly found in bacteria. In response to environmental signals, TCSs may regulate the expression of virulence factor genes.

Gene co-expression networks are exploratory tools used to analyze system-level gene functionality. A gene co-expression network consists of gene expression profiles represented as nodes and gene connections, which occur if two genes are significantly co-expressed. An adjacency function transforms the similarity matrix containing co-expression similarities into the adjacency matrix containing connection strengths. Gene …
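The adjacency-function step described above can be sketched in a few lines. This is a minimal illustration, assuming absolute Pearson correlation as the similarity and a power ("soft-threshold") adjacency, which is one common choice in weighted co-expression analysis; the abstract does not state which adjacency function or exponent the thesis uses, and the function name, `beta=6`, and the simulated data are all hypothetical.

```python
import numpy as np


def soft_threshold_adjacency(expr, beta=6):
    """Turn an expression matrix into a weighted adjacency matrix.

    expr: array of shape (n_samples, n_genes).
    beta: soft-threshold exponent (illustrative value, an assumption).
    """
    # Similarity matrix: absolute Pearson correlation between gene profiles.
    similarity = np.abs(np.corrcoef(expr, rowvar=False))
    # Power adjacency: raising to beta shrinks weak similarities toward 0.
    adjacency = similarity ** beta
    # By convention here, a gene is not connected to itself.
    np.fill_diagonal(adjacency, 0.0)
    return adjacency


# Simulated data (hypothetical): 20 samples, 5 genes, gene 1 tracks gene 0.
rng = np.random.default_rng(0)
expr = rng.normal(size=(20, 5))
expr[:, 1] = expr[:, 0] + 0.1 * rng.normal(size=20)
adj = soft_threshold_adjacency(expr)
```

Under these assumptions the strongly co-expressed pair (genes 0 and 1) keeps a connection strength near 1, while weakly correlated pairs are pushed close to 0 without a hard cutoff.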


Developing A Weibull Model Extension To Estimate Cancer Latency Times, Diana L. Nadler Jan 2015

Legacy Theses & Dissertations (2009 - 2024)

More than one-third of all Americans will be diagnosed with cancer sometime in their lives. Though their illness may be invisible now, it presents a great, and largely unexamined, opportunity to find and treat their cancers early. Early detection represents one of the most promising approaches to reduce the growing cancer burden by identifying cancer while it is localized and curable, preventing not only mortality, but also reducing morbidity and costs.


Estimating Prevalence From Complex Surveys, Sophie O'Brien Nov 2014

Masters Theses

Massachusetts passed legislation in the fall of 2012 to allow the construction of three casinos and a slot parlor in the state. The prevalence of problem gambling in the state and in areas where casinos will be constructed is of particular interest. The goal is to evaluate the change in prevalence after construction of the casinos, using a multi-mode address-based sample survey. The objective of this thesis is to evaluate and describe ways of using statistical inference to estimate prevalence rates in finite populations. Four methods were considered in an attempt to evaluate the prevalence of problem gambling in …


Detecting And Correcting Batch Effects In High-Throughput Genomic Experiments, Sarah Reese Apr 2013

Theses and Dissertations

Batch effects are due to probe-specific systematic variation between groups of samples (batches) resulting from experimental features that are not of biological interest. Principal components analysis (PCA) is commonly used as a visual tool to determine whether batch effects exist after applying a global normalization method. However, PCA yields linear combinations of the variables that contribute maximum variance and thus will not necessarily detect batch effects if they are not the largest source of variability in the data. We present an extension of principal components analysis to quantify the existence of batch effects, called guided PCA (gPCA). We describe a …


Characterization Of A Weighted Quantile Score Approach For Highly Correlated Data In Risk Analysis Scenarios, Caroline Carrico Mar 2013

Theses and Dissertations

In risk evaluation, the effect of mixtures of environmental chemicals on a common adverse outcome is of interest. However, due to the high dimensionality and inherent correlations among chemicals that occur together, the traditional methods (e.g. ordinary or logistic regression) are unsuitable. We extend and characterize a weighted quantile score (WQS) approach to estimating an index for a set of highly correlated components. In the case with environmental chemicals, we use the WQS to identify “bad actors” and estimate body burden. The accuracy of the WQS was evaluated through extensive simulation studies in terms of validity (ability of the WQS …


Is Obesity Socially Contagious?, Ciani Jean Sparks Mar 2013

Statistics

The main objective of this paper is to analyze three different articles that discuss whether obesity could be socially contagious. According to the World Health Organization in 2013, obesity is the fifth leading risk for deaths around the world. This disease has dramatically increased in the last decade, which has led scientists to believe there are other factors contributing to the epidemic besides genetics. The first article I analyzed, written by Nicholas Christakis and James Fowler, provided a logistic regression model to estimate the odds of a person becoming obese. The model included the explanatory variables: age, sex, education, smoking …


Investigation Of A Pregnancy Lifestyle Intervention Using Mediation Analysis And A Power Analysis Simulation, Kelsey Grantham Jan 2013

Statistics

No abstract provided.


Advanced Methodology Developments In Mixture Cure Models, Chao Cai Jan 2013

Theses and Dissertations

Modern medical treatments have substantially improved cure rates for many chronic diseases and have generated increasing interest in appropriate statistical models to handle survival data with non-negligible cure fractions. Mixture cure models are designed to model such data sets; they assume that the studied population is a mixture of cured and uncured subjects. In this dissertation, I will develop two programs named smcure and NPHMC in R. The first program aims to facilitate estimating two popular mixture cure models: the proportional hazards (PH) mixture cure model and the accelerated failure time (AFT) mixture cure model. The second program focuses on designing …


Models And Software Development For Interval-Censored Data, Chun Pan Jan 2013

Theses and Dissertations

Interval-censored time-to-event data occur naturally in studies of diseases where the symptoms are not directly observable, and periodic clinical examinations are required for detection. Due to the lack of well-established procedures, interval-censored data have conventionally been treated as right-censored data; however, this introduces bias in the first place. This dissertation focuses on methodological research and software development for interval-censored data. Specifically, it consists of three projects. The first project is to create an R package for regression analysis and survival curve estimation of interval-censored data based on several published papers by our research team. In the second project, a Bayesian …


Normal Mixture Models For Gene Cluster Identification In Two Dimensional Microarray Data, Eric Scott Harvey Jan 2003

Theses and Dissertations

This dissertation focuses on methodology specific to microarray data analyses that organize the data in preliminary steps and proposes a cluster analysis method which improves the interpretability of the cluster results. Cluster analysis of microarray data allows samples with similar gene expression values to be discovered and may serve as a useful diagnostic tool. Since microarray data is inherently noisy, data preprocessing steps including smoothing and filtering are discussed. Comparing the results of different clustering methods is complicated by the arbitrariness of the cluster labels. Methods for re-labeling clusters to assess the agreement between the results of different clustering techniques …