Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 23 of 23

Full-Text Articles in Physical Sciences and Mathematics

Bayesian Statistical Modeling Of Spatially Resolved Transcriptomics Data, Xi Jiang Oct 2023

Statistical Science Theses and Dissertations

Spatially resolved transcriptomics (SRT) quantifies gene expression levels at different spatial locations, providing a powerful new tool for uncovering novel biological insights. As experimental technologies improve in both capacity and efficiency, there is a growing demand for new analytical methods.

One question in SRT data analysis is how to identify genes whose expression exhibits spatially correlated patterns, known as spatially variable (SV) genes. Most current methods for identifying SV genes are built on a geostatistical model with a Gaussian process, which can limit their ability to detect complex spatial patterns. To overcome this challenge and capture more types …
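For orientation, a classical non-model-based check for a spatially variable gene is Moran's I computed over spot coordinates. The following is a minimal sketch under stated assumptions: the Gaussian-kernel weights and the `bandwidth` parameter are our own illustrative choices, and this is not the Bayesian model developed in the thesis.

```python
import numpy as np


def morans_i(values, coords, bandwidth=1.0):
    """Moran's I for one gene's expression over spatial spots.

    Assumption: spatial weights come from a Gaussian kernel on Euclidean
    distance (hypothetical choice), with zero self-weight.
    """
    x = np.asarray(values, dtype=float)
    xy = np.asarray(coords, dtype=float)
    # Pairwise squared distances between spots.
    d2 = np.sum((xy[:, None, :] - xy[None, :, :]) ** 2, axis=-1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    np.fill_diagonal(w, 0.0)
    z = x - x.mean()
    n = len(x)
    # I = (n / sum of weights) * (z' W z) / (z' z)
    return float(n / w.sum() * (z @ w @ z) / (z @ z))
```

Values well above the null expectation of -1/(n-1) suggest spatial clustering of expression. In practice, significance would be assessed by permutation or an asymptotic variance formula.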


Optimizing Tumor Xenograft Experiments Using Bayesian Linear And Nonlinear Mixed Modelling And Reinforcement Learning, Mary Lena Bleile May 2023

Statistical Science Theses and Dissertations

Tumor xenograft experiments are a popular tool in cancer biology research. In a typical experiment of this kind, a set of animals is implanted with an aliquot of the human tumor of interest, various treatments are applied, and the subsequent response is observed. Efficient analysis of the data from these experiments is therefore of utmost importance. This dissertation proposes three methods for optimizing cancer treatment and data analysis in the tumor xenograft context. The first is applicable to tumor xenograft experiments in general, and the other two seek to optimize the combination of radiotherapy with immunotherapy in the tumor xenograft …


Regression Modeling Of Complex Survival Data Based On Pseudo-Observations, Rong Rong Dec 2022

Statistical Science Theses and Dissertations

The restricted mean survival time (RMST) is a clinically meaningful summary measure in studies with survival outcomes. Statistical methods have been developed for regression analysis of RMST to investigate the effects of covariates on RMST, providing a useful alternative to Cox regression analysis. However, existing methods for regression modeling of RMST are not applicable to left-truncated right-censored data that arise frequently in prevalent cohort studies, for which the sampling bias due to left truncation and informative censoring induced by the prevalent sampling scheme must be properly addressed. Meanwhile, statistical methods have been developed for regression modeling of the cumulative …
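The following is a minimal sketch of the two ingredients named above. It assumes only independent right censoring; the left truncation the thesis targets is not handled here. RMST up to a horizon tau is the area under the Kaplan-Meier curve on [0, tau]. A jackknife pseudo-observation for subject i is n*theta_hat - (n-1)*theta_hat_(-i). The function names are hypothetical.

```python
import numpy as np


def km_curve(time, event):
    """Kaplan-Meier estimate: distinct event times and S(t) just after each."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(time >= t)
        deaths = np.sum((time == t) & event)
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv, dtype=float)


def rmst(time, event, tau):
    """Restricted mean survival time: area under the KM step curve on [0, tau]."""
    t, s = km_curve(time, event)
    keep = t < tau
    # S(t) = 1 on [0, t1), s1 on [t1, t2), ..., last value up to tau.
    grid = np.concatenate(([0.0], t[keep], [tau]))
    heights = np.concatenate(([1.0], s[keep]))
    return float(np.sum(heights * np.diff(grid)))


def rmst_pseudo_observations(time, event, tau):
    """Jackknife pseudo-values n*theta - (n-1)*theta_(-i) for RMST(tau)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    n = len(time)
    full = rmst(time, event, tau)
    idx = np.arange(n)
    return np.array([
        n * full - (n - 1) * rmst(time[idx != i], event[idx != i], tau)
        for i in range(n)
    ])
```

The pseudo-values can then serve as responses in a generalized linear model fitted by generalized estimating equations. Without censoring, each pseudo-value reduces to min(T_i, tau).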


Applications Of Machine Learning Algorithms In Materials Science And Bioinformatics, Mohammed Quazi Jun 2022

Mathematics & Statistics ETDs

The piezoelectric response has been a measure of interest in density functional theory (DFT) studies of micro-electromechanical systems (MEMS) since the inception of MEMS technology. Piezoelectric-based MEMS devices are widely used in automobiles, mobile phones, healthcare devices, and silicon chips for computers, to name a few. The piezoelectric properties of doped aluminum nitride (AlN) have been investigated in materials science for piezoelectric thin films because of AlN's wide range of device applications. In this research, high-throughput ab initio simulations of 23 AlN alloys are generated using rigorous DFT calculations.

This research is the first to report strong enhancements of piezoelectric properties …


Finding The Best Predictors For Foot Traffic In US Seafood Restaurants, Isabel Paige Beaulieu Jan 2022

Honors Theses and Capstones

COVID-19 caused state- and nationwide lockdowns, which altered human foot traffic, especially in restaurants. The seafood sector suffered particularly badly: illegal fishing increased, its products are perishable and in some places seasonal, and imports and exports slowed. Foot traffic data helps business owners decide how much to order, how many employees to schedule, and so on. One problem is that such data is expensive, hard to obtain, and not available until months after it is recorded. Our goal is not only to find covariates that …


The Classification Of Basket Neural Cells In The Mammalian Neocortex, Sreya Pudi Oct 2021

Senior Theses

Basket neuronal cells of the mammalian neocortex have classically been categorized into two or more groups. Originally, the large and small types were thought to be the naturally occurring groups, arising for reasons related to neurobiological function and anatomical position. Later, a study based on anatomical and physiological features of these neurons introduced a third type, the nest basket cell, which is intermediate in size between the large and small types. In this study, multivariate analysis was used to test the hypothesis that the large and small types are morphologically distinct groups. The results of …


Ensemble Protein Inference Evaluation, Kyle Lee Lucke Jan 2021

Graduate Student Theses, Dissertations, & Professional Papers

Protein inference is becoming an increasingly important tool that aids in the characterization of complex proteomes and the analysis of complex protein samples. In bottom-up shotgun proteomics experiments, the evaluation metrics (such as AUC and calibration error) are based on an often imperfect target-decoy database. These metrics implicitly assume that all of the proteins in the target set are present in the sample being analyzed. In general this is not the case; samples typically contain a mix of present and absent proteins. To objectively evaluate inference methods, protein standard datasets are used. These datasets are special in …


Bayesian Semi-Supervised Keyphrase Extraction And Jackknife Empirical Likelihood For Assessing Heterogeneity In Meta-Analysis, Guanshen Wang Dec 2020

Statistical Science Theses and Dissertations

This dissertation investigates: (1) A Bayesian Semi-supervised Approach to Keyphrase Extraction with Only Positive and Unlabeled Data, (2) Jackknife Empirical Likelihood Confidence Intervals for Assessing Heterogeneity in Meta-analysis of Rare Binary Events.

In the big data era, people are blessed with a huge amount of information. However, this abundance of information also poses great challenges. One major challenge is how to extract useful yet succinct information automatically. Keyphrase extraction methods, among the earliest efforts in this direction, summarize an article by identifying a list of keyphrases. Many existing keyphrase extraction methods focus on the unsupervised setting, …


Causal Inference And Prediction On Observational Data With Survival Outcomes, Xiaofei Chen Jul 2020

Statistical Science Theses and Dissertations

Infants with hypoplastic left heart syndrome require an initial Norwood operation, followed some months later by a stage 2 palliation (S2P). The timing of S2P is critical for the operation’s success and the infant’s survival, but the optimal timing, if one exists, is unknown. We attempt to estimate the optimal timing of S2P by analyzing data from the Single Ventricle Reconstruction Trial (SVRT), which randomized patients between two different types of Norwood procedure. In the SVRT, the timing of the S2P was chosen by the medical team; thus with respect to this exposure, the trial constitutes an observational study, and …


Sensitivity Analysis For Incomplete Data And Causal Inference, Heng Chen May 2020

Statistical Science Theses and Dissertations

In this dissertation, we explore sensitivity analyses for three types of incomplete-data problems: missing outcomes; missing outcomes and missing predictors; and potential outcomes in the Rubin causal model (RCM). The first sensitivity analysis addresses the missing completely at random (MCAR) assumption in frequentist inference; the second addresses the missing at random (MAR) assumption in likelihood inference; the third addresses a novel assumption, the "sixth assumption", proposed for the robustness of the instrumental variable estimand in causal inference.


Inference Of Heterogeneity In Meta-Analysis Of Rare Binary Events And RSS-Structured Cluster Randomized Studies, Chiyu Zhang Dec 2019

Statistical Science Theses and Dissertations

This dissertation contains two topics: (1) A Comparative Study of Statistical Methods for Quantifying and Testing Between-study Heterogeneity in Meta-analysis with Focus on Rare Binary Events; (2) Estimation of Variances in Cluster Randomized Designs Using Ranked Set Sampling.

Meta-analysis, the statistical procedure for combining results from multiple studies, has been widely used in medical research to evaluate intervention efficacy and safety. In many practical situations, the variation of treatment effects among the collected studies, often measured by the heterogeneity parameter, may exist and can greatly affect the inference about effect sizes. Comparative studies have been done for only one or …
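The heterogeneity parameter is commonly summarized by three quantities: Cochran's Q, the I² index, and the DerSimonian-Laird moment estimate of tau². The following is a minimal sketch that computes all three from study-level log odds ratios. The 0.5 continuity correction for zero cells is an assumption on our part; it is exactly the kind of ad hoc choice that makes rare binary events difficult. The function names are hypothetical.

```python
import numpy as np


def log_odds_ratio(a, b, c, d, cc=0.5):
    """Log odds ratio and its variance for a 2x2 table.

    a, b: events / non-events in the treatment arm; c, d: in the control arm.
    Assumption: add `cc` to every cell of a table containing a zero cell.
    """
    a, b, c, d = (np.asarray(x, dtype=float) for x in (a, b, c, d))
    zero = (a == 0) | (b == 0) | (c == 0) | (d == 0)
    a, b, c, d = (np.where(zero, x + cc, x) for x in (a, b, c, d))
    y = np.log(a * d / (b * c))
    v = 1 / a + 1 / b + 1 / c + 1 / d
    return y, v


def heterogeneity(y, v):
    """Cochran's Q, I^2, and the DerSimonian-Laird tau^2 estimate."""
    y = np.asarray(y, dtype=float)
    v = np.asarray(v, dtype=float)
    w = 1.0 / v                        # inverse-variance (fixed-effect) weights
    k = len(y)
    mu_fe = np.sum(w * y) / np.sum(w)  # fixed-effect pooled estimate
    q = float(np.sum(w * (y - mu_fe) ** 2))
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    denom = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (k - 1)) / denom)
    return q, i2, float(tau2)
```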


Sample Size Calculation Of Clinical Trials With Correlated Outcomes, Dateng Li Aug 2019

Statistical Science Theses and Dissertations

In this thesis, we investigate sample size calculation for three kinds of clinical trials: (1) randomized controlled trials (RCTs) with longitudinal count outcomes; (2) cluster randomized trials (CRTs) with count outcomes; and (3) CRTs with multiple binary co-primary endpoints.
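A textbook starting point for the cluster randomized case is sketched below. It assumes a Poisson count outcome with equal unit follow-up per subject and the usual variance-inflation design effect 1 + (m - 1)*rho for clusters of size m. The thesis's own methods for longitudinal outcomes and co-primary endpoints are not reproduced here, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_arm_poisson(rate1, rate2, alpha=0.05, power=0.8):
    """Individuals per arm to detect rate1 vs rate2 (two-sided z-test on the rate difference).

    Assumption: unit follow-up, so Var(rate1_hat - rate2_hat) = (rate1 + rate2) / n.
    """
    z = NormalDist().inv_cdf
    za, zb = z(1 - alpha / 2), z(power)
    return (za + zb) ** 2 * (rate1 + rate2) / (rate1 - rate2) ** 2


def clusters_per_arm(rate1, rate2, m, icc, alpha=0.05, power=0.8):
    """Inflate by the design effect 1 + (m - 1) * icc and convert to clusters of size m."""
    n = n_per_arm_poisson(rate1, rate2, alpha, power)
    deff = 1 + (m - 1) * icc
    return math.ceil(n * deff / m)
```

For example, detecting rates of 1 versus 2 events per subject needs about 23.5 individuals per arm before clustering. With clusters of 10 and an intracluster correlation of 0.05, this rounds up to 4 clusters per arm.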


Spatio-Temporal Analysis Of Tree Ring Chronology And Precipitation, Ruizhe Yin Aug 2019

Graduate Theses and Dissertations

Tree ring chronology data are known to reflect regional climate because rainfall and temperature strongly influence tree growth. Tree ring data can therefore be used to reconstruct historical climate, both to understand how climate changed in the past and to make predictions about its future behavior. For simplicity, this research considers only the influence of precipitation on tree ring growth within the New England area. Tree ring widths are recorded at a total of 94 measurement sites over 881 years, and corresponding precipitation data are available at some locations for 121 years. We developed a …


Bayesian Hierarchical Meta-Analysis Of Asymptomatic Ebola Seroprevalence, Peter Brody-Moore Jan 2019

CMC Senior Theses

The continued study of asymptomatic Ebolavirus infection is necessary to develop a more complete understanding of Ebola transmission dynamics. This paper conducts a meta-analysis of eight studies that measure seroprevalence (the proportion of subjects who test positive for anti-Ebolavirus antibodies in their blood) in subjects who had household exposure or known case contact with Ebola but showed no symptoms. In our two random effects Bayesian hierarchical models, we find estimated seroprevalences of 8.76% and 9.72%, significantly higher than the 3.3% found by a previous meta-analysis of these eight studies. We also produce a variation of this meta-analysis where we exclude …


Bayesian Analytical Approaches For Metabolomics: A Novel Method For Molecular Structure-Informed Metabolite Interaction Modeling, A Novel Diagnostic Model For Differentiating Myocardial Infarction Type, And Approaches For Compound Identification Given Mass Spectrometry Data, Patrick J. Trainor Aug 2018

Electronic Theses and Dissertations

Metabolomics, the study of small molecules in biological systems, has enjoyed great success in enabling researchers to examine disease-associated metabolic dysregulation and has been used to discover biomarkers of disease and phenotypic states. Despite recent technological advances in the analytical platforms used in metabolomics and the proliferation of tools for analyzing metabolomics data, significant challenges in metabolomics data analysis remain. In this dissertation, we present three of these challenges and a Bayesian methodological solution for each. In the first part, we develop a new methodology to serve as a basis for making higher-order inferences in metabolomics, …


Association Tests For Genetic Effect And Its Interaction With Environmental Factors, Zhengyang Zhou Jul 2018

Statistical Science Theses and Dissertations

My research is in the area of statistical genetics, and it contains three projects: (1) Differentiating the Cochran-Armitage (CA) trend test and Pearson’s chi-square test: location and dispersion; (2) Decomposing Pearson’s chi-square test: a linear regression and its departure from linearity; (3) Testing nonlinear gene-environment (GxE) interaction through varying coefficient and linear mixed models.

(1) In genetic case-control association studies, a standard practice is to perform the CA trend test with 1 degree-of-freedom (df) under the assumption of an additive model. However, when the true genetic model is recessive or near recessive, it is outperformed by Pearson’s chi-square test with …
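Both tests operate on a 2 × 3 table of cases and controls by genotype. The following is a minimal standard-library sketch. It assumes additive scores (0, 1, 2) and that no genotype column is empty. The trend statistic is N·[N·Σx_j r_j - R·Σx_j n_j]² / (R·S·[N·Σx_j² n_j - (Σx_j n_j)²]) with 1 df, while Pearson's statistic has 2 df. The function names are hypothetical.

```python
import math


def ca_trend_test(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend statistic (N-denominator form) and 1-df p-value."""
    r, s, x = cases, controls, scores
    R, S = sum(r), sum(s)
    N = R + S
    n = [ri + si for ri, si in zip(r, s)]               # genotype column totals
    sxr = sum(xi * ri for xi, ri in zip(x, r))
    sxn = sum(xi * ni for xi, ni in zip(x, n))
    sx2n = sum(xi * xi * ni for xi, ni in zip(x, n))
    t = N * sxr - R * sxn
    stat = N * t * t / (R * S * (N * sx2n - sxn ** 2))
    # Upper tail of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def pearson_chi2(cases, controls):
    """Pearson chi-square for a 2 x 3 table and its 2-df p-value."""
    R, S = sum(cases), sum(controls)
    N = R + S
    stat = 0.0
    for ri, si in zip(cases, controls):
        nj = ri + si
        for observed, row_total in ((ri, R), (si, S)):
            expected = row_total * nj / N
            stat += (observed - expected) ** 2 / expected
    # Upper tail of chi-square with 2 df is exp(-x / 2); valid only for 3 columns.
    return stat, math.exp(-stat / 2)
```

With this N-denominator form, Pearson's statistic minus the trend statistic is the 1-df departure-from-linearity component. In a symmetric table such as cases (10, 20, 30) versus controls (30, 20, 10), that component is zero. In a recessive-looking table such as cases (20, 20, 20) versus controls (40, 40, 0), Pearson's statistic is noticeably larger.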


Developing Statistical Methods For Data From Platforms Measuring Gene Expression, Gaoxiang Jia Apr 2018

Statistical Science Theses and Dissertations

This research contains two topics: (1) PBNPA: a permutation-based non-parametric analysis of CRISPR screen data; (2) RCRnorm: an integrated system of random-coefficient hierarchical regression models for normalizing NanoString nCounter data from FFPE samples.

Clustered regularly interspaced short palindromic repeats (CRISPR) screens are usually implemented in cultured cells to identify genes with critical functions. Although several methods have been developed or adapted to analyze CRISPR screening data, no single specific algorithm has gained popularity. Thus, rigorous procedures are needed to overcome the shortcomings of existing algorithms. We developed a Permutation-Based Non-Parametric Analysis (PBNPA) algorithm, which computes p-values at the gene level …


On The Quantification Of Complexity And Diversity From Phenotypes To Ecosystems, Zachary Harrison Marion Dec 2016

Doctoral Dissertations

A cornerstone of ecology and evolution is comparing and explaining the complexity of natural systems, be they genomes, phenotypes, communities, or entire ecosystems. These comparisons and explanations in turn raise questions about how complexity should be quantified in theory and estimated in practice. Here I embrace diversity partitioning using Hill numbers, or effective numbers, to advance the empirical side of the field with respect to quantifying biological complexity.

First, at the level of phenotypes, I show that traditional multivariate analyses ignore individual complexity and provide relatively abstract representations of variation among individuals. I then suggest using well-known diversity indices from community ecology …
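Hill numbers (effective numbers of types) have a closed form. For relative abundances p_i, D_q = (Σ p_i^q)^(1/(1-q)), with the q → 1 limit exp(-Σ p_i ln p_i). Order q = 0 gives richness, and q = 2 gives the inverse Simpson concentration. The following is a minimal sketch; the function name is hypothetical.

```python
import math


def hill_number(abundances, q):
    """Hill number (effective number of types) of order q from raw abundances."""
    total = float(sum(abundances))
    p = [a / total for a in abundances if a > 0]   # relative abundances, zeros dropped
    if abs(q - 1.0) < 1e-12:
        # q = 1 limit: exponential of Shannon entropy
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))
```

For a perfectly even assemblage of k types, every order returns k. For uneven ones, D_q decreases as q increases, because higher orders weight common types more heavily.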


Lead Poisoning In United States Children, Zeren Zhou May 2016

Arts & Sciences Electronic Theses and Dissertations

We investigate factors related to blood lead levels of children aged 1 to 5 in the United States for the years 2007-2014, using data from the National Health and Nutrition Examination Survey (NHANES). The goal is to explore predictors of lead in children's blood and to develop a multivariate model using as many predictors as possible. The analysis is conducted using SAS survey regression procedures that account for the weighting, stratification, and clustering of the data.


Estimating Prevalence From Complex Surveys, Sophie O'Brien Nov 2014

Masters Theses

Massachusetts passed legislation in the fall of 2012 to allow the construction of three casinos and a slot parlor in the state. The prevalence of problem gambling in the state, and in areas where casinos will be constructed, is of particular interest. The goal is to evaluate the change in prevalence after construction of the casinos, using a multi-mode address-based sample survey. The objective of this thesis is to evaluate and describe ways of using statistical inference to estimate prevalence rates in finite populations. Four methods were considered in an attempt to evaluate the prevalence of problem gambling in …
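The simplest design-based estimate of a finite-population prevalence weights each respondent by their survey weight, giving a Hájek-type ratio estimator. The following sketch pairs it with Kish's effective sample size to produce a rough standard error. That approximation reflects unequal weighting only. It ignores stratification and clustering, which a full design-based variance estimator would account for. The function name is hypothetical.

```python
import math


def weighted_prevalence(y, w):
    """Weighted prevalence, Kish effective sample size, and an approximate SE.

    y: 0/1 indicators (e.g., problem gambling); w: survey weights.
    """
    if len(y) != len(w):
        raise ValueError("y and w must have the same length")
    sw = sum(w)
    p = sum(wi * yi for wi, yi in zip(w, y)) / sw      # sum(w*y) / sum(w)
    n_eff = sw ** 2 / sum(wi ** 2 for wi in w)         # Kish: (sum w)^2 / sum w^2
    se = math.sqrt(p * (1 - p) / n_eff)                # binomial SE at n_eff
    return p, n_eff, se
```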


An Analysis Of Risk Reduction Choices In DCIS Breast Cancer Patients, Lauren Soltesz Dec 2012

Statistics

The main focus of this paper was to evaluate possible demographic and clinical characteristics associated with a woman’s choice of breast conserving surgery (BCS), unilateral mastectomy (ULM), or bilateral risk reduction mastectomy (BRRM). The cohort consisted of patients presenting to the City of Hope National Medical Center with ductal carcinoma in situ breast cancer who elected to have cancer directed surgery (N=305). Analyses to examine associations of patient characteristics with type of surgery were conducted using a multinomial logistic regression. Results showed that older women were more likely to choose breast conserving surgery over bilateral risk reduction mastectomy than younger …


Adaptive Randomization Designs, Jenna Colavincenzo Jun 2012

Statistics

Adaptive design methodologies use prior information to develop a clinical trial design. The goal of an adaptive design is to maintain the integrity and validity of the study while giving the researcher flexibility in identifying the optimal treatment. A basic pharmaceutical trial provides an example: the overall trial has three phases for comparing treatments, and experimenters use information from each phase to make changes to the next phase before it begins.

Adaptive design methods have been in practice since the 1970s, but have become increasingly complex ever since. One …
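One well-known response-adaptive scheme is the randomized play-the-winner urn. Each success adds balls for the arm just used, and each failure adds balls for the other arm, so allocation drifts toward the better treatment while remaining randomized. The following sketch assumes binary responses observed before the next patient arrives and simulates outcomes from assumed success probabilities. The parameter names are illustrative.

```python
import random


def randomized_play_the_winner(p_success, n_patients, alpha=1, beta=1, seed=0):
    """Simulate allocations under a randomized play-the-winner urn.

    p_success: assumed true success probability per arm, e.g. {"A": 0.7, "B": 0.4}.
    alpha: initial balls per arm; beta: balls added after each response.
    """
    rng = random.Random(seed)
    urn = {"A": alpha, "B": alpha}
    counts = {"A": 0, "B": 0}
    for _ in range(n_patients):
        # Draw an arm with probability proportional to its balls in the urn.
        arm = "A" if rng.random() < urn["A"] / (urn["A"] + urn["B"]) else "B"
        counts[arm] += 1
        other = "B" if arm == "A" else "A"
        # Success rewards the arm used; failure rewards the other arm.
        if rng.random() < p_success[arm]:
            urn[arm] += beta
        else:
            urn[other] += beta
    return counts
```

In a real trial the outcomes come from patients rather than a random number generator. Delayed responses and control of the type I error rate are among the reasons practical designs become more complex.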


Geographic Disparities Associated With Stroke And Myocardial Infarction In East Tennessee, Ashley Pedigo Golden Dec 2011

Doctoral Dissertations

Stroke and myocardial infarction (MI) are serious conditions whose burdens vary by socio-demographic and geographic factors. Although several studies have investigated and identified disparities in the burdens of these conditions at the county and state levels, little is known about their geographic epidemiology at the neighborhood level. Both conditions require emergency treatment, so timely geographic access to appropriate care is critical. Investigating disparities in geographic accessibility to stroke and MI care, and the role of Emergency Medical Services (EMS) in reducing treatment delays, is vital for improving health outcomes. Therefore, the objectives of this work were to: (i) classify …