Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Theses/Dissertations

Statistics and Probability

Theses and Dissertations

Articles 1 - 30 of 500

Full-Text Articles in Physical Sciences and Mathematics

The Private Pilot Check Ride: Applying The Spacing Effect Theory To Predict Time To Proficiency For The Practical Test, Michael Scott Harwin Dec 2023

Theses and Dissertations

This study examined the relationship between a set of targeted factors and the total flight time students needed to become ready to take the private pilot check ride. The study was grounded in Ebbinghaus’s (1885/1913/2013) forgetting curve theory and spacing effect, and Ausubel’s (1963) theory of meaningful learning. The research factors included (a) training time to proficiency, which represented the number of training days needed to become check-ride ready; (b) flight training program (Part 61 vs. Part 141); (c) organization offering the training program (2- or 4-year college/university vs. FBO); (d) scheduling policy (mandated vs. student-driven); and demographic variables, which …


Approaches To Detecting And Modeling Over- And Underdispersion In Alternative Count Data Distributions And An Application Of Logistic Regression And Random Forest Modeling To Improve Screening Tools For Tic Disorders In Children, Rebecca C. Wardrop Jul 2023

Theses and Dissertations

This dissertation focuses on the theory and application of discrete data methods, particularly approaches to over- and underdispersion relative to the Poisson distribution, and an application of random forest and logistic regression modeling. The first chapter derives a score test for over- and underdispersion in the heaped generalized Poisson distribution. Equi-, over-, and underdispersed heaped generalized Poisson and heaped negative binomial data are simulated to evaluate the performance of the score test by comparing the power it achieves to that of Wald and likelihood ratio tests. We find that the score test we derive performs comparably to both the Wald and …
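As a much simpler point of reference than the score test derived here, the sketch below computes the classical index-of-dispersion statistic for plain (non-heaped) Poisson counts. It is a swapped-in textbook check, not the dissertation's test, and the helper name `dispersion_index` is hypothetical.

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Classical index-of-dispersion check for Poisson counts (hypothetical helper).

    Returns (ratio, d): ratio = sample variance / sample mean, and
    d = (n - 1) * variance / mean. Under an equidispersed Poisson model, d is
    approximately chi-square with n - 1 degrees of freedom; ratio > 1 hints at
    overdispersion and ratio < 1 at underdispersion.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two counts")
    m = mean(counts)
    if m <= 0:
        raise ValueError("the sample mean must be positive")
    v = variance(counts)  # sample variance (n - 1 denominator)
    return v / m, (n - 1) * v / m

# Usage: counts alternating between 0 and 4 are clearly overdispersed.
ratio, d = dispersion_index([0, 4, 0, 4])
```

Values of d far above or below n - 1 suggest over- or underdispersion; a heaped or generalized Poisson model needs the dedicated tests discussed in the abstract.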


A Bayesian Spatial Scan Statistic For Normal Data, Laasya Velamakanni Jul 2023

Theses and Dissertations

Scan statistics are useful methods for detecting spatial clustering. While they were initially developed to detect regions with an excess of binomial or Poisson events, spatial scan statistics have been extended to detect hotspots in other types of data including continuous data. They have many applications in different fields such as epidemiology (e.g. detecting disease outbreaks), sociology (e.g. detecting crime hotspots), and environmental health (e.g. detecting high-pollution areas). Spatial scan statistics identify a ‘most likely cluster’ and then use a likelihood ratio test to determine if this cluster is statistically significant. Spatial scan statistics have been extended to the Bayesian …
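To make the "most likely cluster" idea concrete, here is a minimal sketch of the frequentist Poisson scan statistic (in the Kulldorff style) over contiguous windows of regions ordered along a line. It is an illustrative stand-in, not the Bayesian normal-data statistic this thesis develops; the one-dimensional geography, the function names, and the rescaling of expected counts are assumptions, and significance testing (usually by Monte Carlo replication) is omitted.

```python
import math

def _term(a, b):
    """a * log(a / b), with the convention that the term is 0 when a == 0."""
    return 0.0 if a == 0 else a * math.log(a / b)

def most_likely_cluster(cases, expected):
    """Scan contiguous windows [start, end) of regions and return
    (llr, (start, end)) for the window with the largest Poisson
    log-likelihood ratio among windows with elevated risk."""
    total = sum(cases)
    scale = total / sum(expected)  # rescale so expected counts sum to observed cases
    exp_counts = [e * scale for e in expected]
    best_llr, best_window = 0.0, None
    n = len(cases)
    for start in range(n):
        c = e = 0.0
        for end in range(start + 1, n + 1):
            c += cases[end - 1]
            e += exp_counts[end - 1]
            # skip empty-expectation windows, the whole area, and non-elevated windows
            if e <= 0 or e >= total or c <= e:
                continue
            llr = _term(c, e) + _term(total - c, total - e)
            if llr > best_llr:
                best_llr, best_window = llr, (start, end)
    return best_llr, best_window

# Usage: region 2 has far more cases than its equal share of expected cases.
llr, window = most_likely_cluster([1, 1, 10, 1], [1, 1, 1, 1])
```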


Statistical Methods For Single Cell Sequencing Data Analysis, Fei Qin Jul 2023

Theses and Dissertations

The recent emergence of single cell sequencing (SCS) technology has provided single-cell DNA or RNA sequencing (scDNA/RNA-seq) information for investigating cellular evolutionary relationships. Although many analysis methods have been developed to infer intra-tumor genetic heterogeneity, cluster cellular subclones, detect genetic mutations, and investigate spatially variable (SV) genes, exploring SCS data remains statistically challenging due to its noisy nature.

To identify subclones with scDNA-seq data, many existing studies first use an independent statistical model to detect copy number profiles and then apply classical clustering methods for subclone identification in downstream analyses. However, spurious results might be generated …


Explorations In Baseball Analytics: Simulations, Predictions, And Evaluations For Games And Players, Katelyn Mongerson May 2023

Theses and Dissertations

From statistics reported in newspapers in the 1840s to the present day, baseball has always been one of the most data-driven sports. We make use of the abundant publicly available baseball data to build models in R and Python that answer various baseball-related questions about predicting and optimizing run production, evaluating player effectiveness, and forecasting the postseason. To predict and optimize run production, we present three models. The first builds a common tool in baseball analysis called a Run Expectancy Matrix, which assigns a value (in runs) to various in-game decisions. The second uses the …
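The run expectancy idea can be sketched as follows: for each base-out state, average the runs scored from that point to the end of the inning. The sketch assumes a hypothetical play-by-play format in which each inning is a chronological list of `(state, runs_on_play)` pairs; real data sources encode base-out states differently, and the function name is illustrative.

```python
from collections import defaultdict

def run_expectancy(innings):
    """innings: list of innings; each inning is a chronological list of
    (state, runs_on_play) tuples, where state is a base-out label.
    Returns {state: average runs scored from that state to the end of the inning}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for inning in innings:
        remaining = 0
        # Walk backwards so `remaining` holds runs scored on this play and all later plays.
        for state, runs in reversed(inning):
            remaining += runs
            totals[state] += remaining
            counts[state] += 1
    return {state: totals[state] / counts[state] for state in totals}

# Usage with toy data: an out then a solo home run; then two solo home runs.
innings = [
    [("0 outs, ---", 0), ("1 out, ---", 1), ("1 out, ---", 0)],
    [("0 outs, ---", 1), ("0 outs, ---", 1)],
]
re_matrix = run_expectancy(innings)
```

A full matrix has 24 cells (3 out counts times 8 base configurations), estimated from many seasons of plays.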


Change Point Detection For A Process Having Several Regimes, Oliver Gerd Meister May 2023

Theses and Dissertations

In this dissertation, possible methods for multiple change point detection in Markov chain processes are studied. Related works on offline and online change point detection are discussed, and their applicability to sequential multiple change point detection for several regimes is evaluated. We develop a method for multiple change point detection for a process having three regimes. Its efficiency is then evaluated on simulated Markov chain data under different scenarios, such as processes that differ significantly from each other or probability distributions that are similar to each other. This approach is then applied to COVID-19 hospital data. Therefore, the data …
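For orientation, the sketch below shows a basic offline approach to a single change point in a discrete-state Markov chain: fit separate maximum likelihood transition probabilities before and after each candidate split and keep the split with the highest total log-likelihood. It is a generic illustration, not the three-regime method developed in the dissertation, and the function names are hypothetical.

```python
import math
from collections import Counter

def markov_loglik(transitions):
    """Maximized log-likelihood of a first-order Markov chain fitted to a list of
    (from_state, to_state) pairs, using MLE probabilities n_ij / n_i."""
    pair_counts = Counter(transitions)
    from_counts = Counter(a for a, _ in transitions)
    return sum(n * math.log(n / from_counts[a]) for (a, _), n in pair_counts.items())

def single_change_point(seq, min_size=1):
    """Return (tau, gain): split the transition sequence into the first tau
    transitions and the rest so the two-segment log-likelihood is maximal;
    gain is the improvement over a single homogeneous chain."""
    trans = list(zip(seq, seq[1:]))
    full = markov_loglik(trans)
    best_tau, best_ll = None, -math.inf
    for tau in range(min_size, len(trans) - min_size + 1):
        ll = markov_loglik(trans[:tau]) + markov_loglik(trans[tau:])
        if ll > best_ll:
            best_tau, best_ll = tau, ll
    return best_tau, best_ll - full

# Usage: an alternating regime followed by a regime that stays in state 0.
tau, gain = single_change_point([0, 1] * 5 + [0] * 10)
```

Applying the search recursively to each segment (binary segmentation) is one common way to look for several change points; a threshold or penalty on `gain` is then needed to decide when to stop.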


A Machine Learning Approach To Evaluate The Effect Of Sodium-Glucose Cotransporter-2 Inhibitors On Chronic Kidney Disease In Diabetes Patients, Solomon Eshun May 2023

Theses and Dissertations

Chronic kidney disease (CKD) is a significant complication that contributes to diabetes-related mortality in the United States, and there is growing evidence that sodium-glucose cotransporter 2 inhibitors (SGLT2i) can slow its progression. However, observational studies may suffer from confounding by indication, where patient characteristics and disease severity influence the decision to prescribe SGLT2i. This study utilized electronic health records of individuals with diabetes (from TriNetX) to investigate the effectiveness of SGLT2i on CKD progression. The database provided detailed information on patients’ CKD status, demographics, diagnosis, procedures, and medications, along with corresponding dates of diagnosis and prescription. The study comprised …


A Machine Learning Approach To Obese-Inflammatory Phenotyping, Tania Mayleth Vargas May 2023

Theses and Dissertations

Obesity is the accumulation of an abnormal, or excessive, amount of fat in the body, which can have negative effects on overall health. This excess accumulation of macronutrients in adipose tissue can cause the release of inflammatory mediators, leading to a proinflammatory state. Inflammation is a known risk factor for various health conditions, including cardiovascular diseases, metabolic syndrome, and diabetes. This study sought to examine the use of data mining methods, particularly clustering algorithms, to identify inflammatory biomarker phenotypes and their association with obesity in a local adolescent population. The algorithms evaluated in this study included k-means, Ward's hierarchical …
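As an illustration of the first clustering algorithm named above, here is a plain-Python sketch of Lloyd's k-means iteration. The deterministic initial centroids and the function name are assumptions made for reproducibility; practical analyses usually use several random starts or k-means++ initialization, and biomarker data would normally be standardized first.

```python
import math

def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's algorithm: alternately assign each point to its nearest centroid
    and recompute each centroid as the mean of its assigned points, stopping
    when the assignments no longer change."""
    centroids = [tuple(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments are stable
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[k] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, labels

# Usage: two well-separated groups of 2-D points.
centroids, labels = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 11)])
```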


Advancements In Parametric Modal Regression, Qingyang Liu Apr 2023

Theses and Dissertations

This dissertation considers statistical inference methods for parametric modal regression models. In Chapter 1, we use an example to motivate the mode, rather than the median or the mean, as the measure of central tendency. Following the motivational example, we include an overview of existing modal regression models. Later in the same chapter, we explain the advantages of parametric modal regression models over existing nonparametric modal regression models. In Chapter 2, we address issues in statistical inference introduced by data contaminated with measurement error. With measurement error in covariates, statistical inference methods designed for modal regression models with error-free covariates …


Sparse Partitioned Empirical Bayes ECM Algorithms For High-Dimensional Linear Mixed Effects And Heteroscedastic Regression, Anja Zgodic Apr 2023

Theses and Dissertations

Variable selection methods in both the frequentist and Bayesian frameworks are powerful techniques for prediction and inference in high-dimensional linear regression models. These methods often assume independent observations and normally distributed errors with a common variance. In practice, both assumptions are often violated. To address this, we develop efficient and powerful Bayesian approaches for linear mixed modeling and heteroscedastic linear regression. These methods offer increased flexibility through empirical Bayes estimators for hyperparameters, with computationally efficient estimation through the Expectation Conditional Maximization (ECM) algorithm. The novelty of these approaches lies in the partitioning and parameter expansion, …


Model-Based Imputation Of Below Detection Limit Missing Data And Group Selection In Bayesian Group Index Regression, Matthew Carli Jan 2023

Theses and Dissertations

Investigations into the association between chemical exposure and health outcomes increasingly focus on the role of chemical mixtures rather than individual chemicals. The analysis of chemical mixture data has required the development of novel statistical methods, one of which is Bayesian group index regression. A statistical challenge common to all chemical mixture analyses is the ubiquitous presence of below detection limit (BDL) data. We propose an extension of Bayesian group index regression that treats both regression effects and missing BDL observations as parameters in a model estimated through a Markov chain Monte Carlo algorithm that we refer to as …
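A core step when BDL observations are treated as parameters is drawing them from the model's distribution truncated above at the detection limit. The sketch assumes, hypothetically, that log-concentrations are normal with the sampler's current `mu` and `sigma`; it uses inverse-CDF sampling and is not the dissertation's full group index regression algorithm. The function name is illustrative.

```python
import random
from statistics import NormalDist

def impute_below_lod(n_missing, lod, mu, sigma, rng=random):
    """Draw n_missing values from N(mu, sigma^2) truncated to (-inf, lod] using
    the inverse-CDF method. In a Gibbs sampler this data-augmentation step would
    be repeated every iteration with the current (mu, sigma)."""
    dist = NormalDist(mu, sigma)
    p_lod = dist.cdf(lod)  # probability mass below the detection limit
    if p_lod <= 0.0:
        raise ValueError("detection limit is too far below the mean")
    draws = []
    for _ in range(n_missing):
        u = rng.uniform(0.0, p_lod)
        u = min(max(u, 1e-12), p_lod * (1 - 1e-12))  # keep u strictly inside (0, p_lod)
        draws.append(dist.inv_cdf(u))
    return draws

# Usage: impute 5 log-concentrations below a limit of 1.0 given mu = 2, sigma = 1.
imputed = impute_below_lod(5, 1.0, 2.0, 1.0, rng=random.Random(0))
```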


Variability In Causal Effects On A Binary Outcome And Noncompliance In A Multisite Randomized Trial, Xinxin Sun Jan 2023

Theses and Dissertations

Noncompliance with treatment assignment is widespread in randomized trials and presents challenges for causal inference. In the presence of noncompliance, the most commonly estimated effect, the intent-to-treat (ITT) effect of treatment assignment, is a biased estimate of the effect of actually receiving treatment. Of interest in this setting is the complier average causal effect (CACE), the ITT effect among compliers. A further complication arises when the outcome variable is partially observed.

My research focuses on estimating the distribution of a site-specific CACE in a multisite randomized controlled trial (MRCT) by maximum likelihood (ML). Assuming that compliance is missing at random (MAR), we express the likelihood as an integral with respect …
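For contrast with the maximum likelihood approach described above, the simplest CACE estimator for a single-site trial with fully observed outcomes is the instrumental-variable (Wald) ratio sketched below. It assumes randomized assignment, no defiers (monotonicity), and the exclusion restriction, and it ignores site-to-site variability and missing compliance, which are the focus of the dissertation. The function name is hypothetical.

```python
from statistics import mean

def wald_cace(z, d, y):
    """Wald (IV) estimate of the CACE: the ITT effect on the outcome divided by
    the ITT effect on treatment receipt.
    z: assignment (1/0), d: treatment received (1/0), y: outcome (e.g. binary)."""
    def itt(values):
        treated = [v for v, zi in zip(values, z) if zi == 1]
        control = [v for v, zi in zip(values, z) if zi == 0]
        return mean(treated) - mean(control)

    itt_y, itt_d = itt(y), itt(d)
    if itt_d == 0:
        raise ValueError("ITT effect on receipt is zero: no compliers to average over")
    return itt_y / itt_d

# Usage: half of the assigned units take the treatment; control units never do.
cace = wald_cace(z=[1, 1, 1, 1, 0, 0, 0, 0],
                 d=[1, 1, 0, 0, 0, 0, 0, 0],
                 y=[1, 0, 1, 0, 0, 1, 0, 0])
```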


Dynamics Of Redox-Driven Molecular Processes In Local And Systemic Plant Immunity, Philip Berg Dec 2022

Theses and Dissertations

This work has two main parts: chapters 1–3 focus on dynamical systems modeling in plant immunity, whereas chapters 4–6 describe contributions to computational modeling and analysis of proteomics and genomics data. Chapter 1 investigates dynamical and biochemical patterns of reversibly oxidized cysteines (RevOxCys) during effector-triggered immunity (ETI) in Arabidopsis, examines the regulatory patterns associated with the roles of Arabidopsis thimet oligopeptidases 1 and 2 (TOP1 and TOP2) in the RevOxCys events during ETI, and analyzes the redox phenotype of the top1top2 mutant. The second chapter investigates the peptidome dynamics during ETI …


Towards Structured Planning And Learning At The State Fisheries Agency Scale, Caleb A. Aldridge Dec 2022

Theses and Dissertations

The field of inland recreational fisheries has grown philosophically and scientifically to consider economic and sociopolitical (non-biological) aspects in addition to biological ones. However, integrating the biological and non-biological aspects of inland fisheries has been challenging. Thus, an opportunity exists to develop approaches and tools that operationalize planning and decision-making processes which include both biological and non-biological aspects of a fishery. This dissertation expands the idea that a core set of goals and objectives is shared among and within inland fisheries agencies; that many routine operations of inland fisheries managers can be regimented or standardized; and the novel concept that current information and operations can …


Weather Parameters Influencing The Incidence Of Citrus Canker Caused By Aw Strain In The Rio Grande Valley, Amit Sharma Dec 2022

Theses and Dissertations

Citrus canker, caused by the bacterium Xanthomonas citri subsp. citri (Xcc), seriously affects the citrus industry by making fruit unmarketable due to unsightly lesions. Canker caused by the Aw strain of Xcc was reported in citrus trees in residential areas of the Rio Grande Valley (RGV). Canker severity differs among cultivars/varieties and is influenced by prevailing environmental conditions. Multiple regression modeling of disease incidence against environmental variables such as temperature, humidity, wind speed, wind gust, and rainfall was performed to understand the environmental conditions that favor the spread of citrus …


Orthogonal Arrays And Legendre Pairs, Kristopher N. Kilpatrick Sep 2022

Theses and Dissertations

Well-designed experiments greatly improve test and evaluation. Efficient experiments reduce the cost and time of running tests while improving the quality of the information obtained. Orthogonal Arrays (OAs) and Hadamard matrices are used as designed experiments to glean as much information as possible about a process with limited resources. However, constructing OAs and Hadamard matrices in general is a very difficult problem. Finding Legendre pairs (LPs) results in the construction of Hadamard matrices. This research studies the classification problem of OAs and the existence problem of LPs. In doing so, it makes two contributions to the discipline. First, it improves …
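To make the object concrete: as usually defined in the literature, a Legendre pair of odd length ℓ is two ±1 sequences A and B whose periodic autocorrelation functions satisfy PAF_A(s) + PAF_B(s) = -2 for every nonzero shift s, and such pairs are used to build Hadamard matrices of order 2ℓ + 2. The sketch below is a hypothetical checker; because PAF(s) = PAF(ℓ - s), testing s = 1, ..., (ℓ - 1)/2 is enough. For a prime q ≡ 3 (mod 4), taking A = B = the Legendre-symbol sequence of q gives a pair, which serves as a test case.

```python
def paf(seq, s):
    """Periodic autocorrelation of a +/-1 sequence at shift s."""
    n = len(seq)
    return sum(seq[i] * seq[(i + s) % n] for i in range(n))

def is_legendre_pair(a, b):
    """Check the Legendre pair condition for odd length l:
    PAF_A(s) + PAF_B(s) == -2 for s = 1, ..., (l - 1) // 2."""
    n = len(a)
    if n != len(b) or n % 2 == 0:
        return False
    if any(x not in (1, -1) for x in list(a) + list(b)):
        return False
    return all(paf(a, s) + paf(b, s) == -2 for s in range(1, (n - 1) // 2 + 1))

def legendre_sequence(q):
    """+/-1 sequence of Legendre symbols modulo an odd prime q, first entry set to 1."""
    residues = {(i * i) % q for i in range(1, q)}
    return [1] + [1 if i in residues else -1 for i in range(1, q)]

# Usage: q = 7 is a prime congruent to 3 mod 4.
seq7 = legendre_sequence(7)
ok = is_legendre_pair(seq7, seq7)
```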


Statistical Inference On Desirability Function Optimal Points To Evaluate Multi-Objective Response Surfaces, Peter A. Calhoun Sep 2022

Theses and Dissertations

A shortfall of the Derringer and Suich (1980) desirability function is the lack of inferential methods to quantify uncertainty. Most articles addressing uncertainty involve robust methods, which provide a point estimate that is less affected by variation. The few articles that address confidence intervals or bands do not do so specifically for the Derringer and Suich method. This research provides two valuable contributions to the field of response surface methodology. The first is an evaluation of the effect of correlation and plane angles on Derringer and Suich optimal solutions. The second proposes and compares eight inferential methods, both univariate and multivariate, for creating confidence intervals on …
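For reference, the Derringer and Suich construction maps each response to an individual desirability in [0, 1] and combines them through a geometric mean. The sketch below shows only the larger-is-better case with weight exponent `s`; the function names are illustrative, and it computes point values only, without the confidence intervals this research proposes.

```python
import math

def desirability_larger_is_better(y, low, high, s=1.0):
    """One-sided Derringer-Suich desirability: 0 at or below `low`, 1 at or above
    `high`, and ((y - low) / (high - low)) ** s in between."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return ((y - low) / (high - low)) ** s

def overall_desirability(ds):
    """Geometric mean of individual desirabilities; any zero makes D zero."""
    return math.prod(ds) ** (1.0 / len(ds))

# Usage: two responses, each halfway between its acceptable and target values.
d_total = overall_desirability([desirability_larger_is_better(5, 0, 10),
                                desirability_larger_is_better(15, 10, 20)])
```

The geometric mean is what makes a single unacceptable response (d = 0) veto the whole setting, a property that any inferential method on the optimum has to respect.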


Analytic Case Study Using Unsupervised Event Detection In Multivariate Time Series Data, Jeremy M. Wightman Sep 2022

Theses and Dissertations

Analysis of cyber-physical systems (CPS) has emerged as a critical domain for providing US Air Force and Space Force leadership with decision advantage in air, space, and cyberspace. Legacy methods have been outpaced by evolving battlespaces and global peer-level challengers. Automation provides one way to reduce the time that analysis currently takes. This thesis presents an event detection automation system (EDAS) that uses deep learning models, distance metrics, and static thresholding to detect events. The EDAS is evaluated with a case study of CPS domain experts in two parts. Part 1 uses the current methods for CPS analysis with a qualitative …
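The distance-and-static-threshold stage can be illustrated in a few lines. In this sketch the "expected" behavior is simply the mean of an initial baseline window, a hypothetical stand-in for the deep learning models EDAS uses to model normal behavior; the window length, the threshold, and the function name are assumed inputs.

```python
import math

def detect_events(rows, baseline_len, threshold):
    """Flag time steps whose Euclidean distance from the mean of the first
    `baseline_len` rows of a multivariate series exceeds a fixed threshold."""
    base = rows[:baseline_len]
    center = [sum(col) / len(base) for col in zip(*base)]  # per-channel baseline mean
    return [t for t, row in enumerate(rows) if math.dist(row, center) > threshold]

# Usage: a two-channel series with one obvious excursion at t = 3.
events = detect_events([[0, 0], [0, 0.1], [0.1, 0], [5, 5], [0, 0]],
                       baseline_len=3, threshold=1.0)
```

With a learned model in place of the baseline mean, the same rule would be applied to the distance between each observation and the model's prediction.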


Defining Viable Solar Resource Locations In The Southeast United States Using The Satellite-Based GLASS Product, Jolie Kavanagh Aug 2022

Theses and Dissertations

This research uses satellite data and moment statistics to determine whether solar farms can be placed in the Southeast US. Data from 2001–2019 are analyzed relative to the Southwest US, where solar farms are located. As the need for clean energy grows, locations beyond arid environments must be considered. The Southeast US is the main location of interest because of its warm, moist environment throughout the year. This research uses the Global Land Surface Satellite (GLASS) photosynthetically active radiation (PAR) product to determine viable locations for solar panels. A probability density function (PDF) along …
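The four moment statistics commonly used to summarize such a distribution (mean, variance, skewness, kurtosis) can be computed as below. The sketch uses population (biased) moment estimators and non-excess kurtosis; both choices are assumptions, since the study's exact definitions are not given here, and the function name is illustrative.

```python
def moment_stats(xs):
    """Mean, population variance, skewness m3 / m2^1.5, and kurtosis m4 / m2^2
    (a normal distribution has kurtosis 3 under this definition)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    if m2 == 0:
        raise ValueError("zero variance: skewness and kurtosis are undefined")
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return {"mean": m, "variance": m2,
            "skewness": m3 / m2 ** 1.5, "kurtosis": m4 / m2 ** 2}

# Usage on a small symmetric sample (e.g. daily PAR values at one grid cell).
stats = moment_stats([1, 2, 3, 4, 5])
```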


ABM Simulation Model Of A Pandemic For Optimizing Vaccination Strategy, Gibeom Park Aug 2022

Theses and Dissertations

This study presents a process-oriented hybrid model of individuals' immune responses and interactions involving vaccination, used to describe the trend of a contagious disease and estimate future societal costs. The model treats "recovery" as a non-absorbing state and incorporates various infection-stage states, including two symptomatic states. To model contagiousness consistently with the current pandemic, and to capture the dependence of disease spread on people's mobility, we developed an Agent-Based Simulator tailored to the particular model used in this study that can test various what-if scenarios. We improved the simulator considerably by applying data structures …


Neural Networks And Stochastic Differential Equations, Stephanie L. Flores Aug 2022

Theses and Dissertations

Influenced by the seminal work “Physics Informed Neural Networks” by Raissi et al. (2017), there has been growing interest in recent years in solving nonlinear partial differential equations (PDEs), and estimating their parameters, with deep neural networks. This has also broadened the pathways for, and shed light on, deep learning of stochastic differential equations (SDEs) and stochastic PDEs (SPDEs). In this work, we intend to investigate current approaches to solving and estimating the parameters of SDEs/SPDEs with deep neural networks and the possibility of extending them to obtain more accurate/stable solutions with residual systems and/or generative adversarial neural networks. …
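A common way to obtain sample paths for training or checking such models is to simulate the SDE numerically. The Euler–Maruyama scheme below is a standard classical method for this; it is included as a baseline illustration, not as one of the neural-network approaches the thesis investigates, and the function signature is hypothetical.

```python
import math
import random

def euler_maruyama(drift, diffusion, x0, t_end, n_steps, rng=None):
    """Simulate one path of dX = drift(X, t) dt + diffusion(X, t) dW on [0, t_end]
    with the Euler-Maruyama scheme; returns n_steps + 1 values including x0."""
    rng = rng or random.Random()
    dt = t_end / n_steps
    x, t = x0, 0.0
    path = [x]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        x = x + drift(x, t) * dt + diffusion(x, t) * dw
        t += dt
        path.append(x)
    return path

# Usage: an Ornstein-Uhlenbeck process dX = -X dt + 0.3 dW started at X0 = 1.
path = euler_maruyama(lambda x, t: -x, lambda x, t: 0.3, 1.0, 1.0, 100,
                      rng=random.Random(0))
```

Setting the diffusion to zero reduces the scheme to the explicit Euler method for the corresponding ODE, which is a convenient sanity check.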


Effects Of Macronutrients Intake And Physical Activity On Childhood Obesity Of Hispanic Children, Prosanta Barai Aug 2022

Theses and Dissertations

Obesity has become more widespread during the past few decades, and its prevalence is still increasing. It is present in every population and region of the world, including rural parts of low- and middle-income countries. In the USA, obesity across all age groups follows the global trend. Although numerous studies have tried to answer pressing questions such as how obesity can be controlled, little to no research has focused on younger children, especially 4- to 6-year-old Hispanic children. Our study aimed to determine the causal path among literature …


Modified EM Algorithm In SMCURE Package Based On Proportional Hazards Mixture Cure Model With Offset Terms, Jiaying Yi Jul 2022

Theses and Dissertations

The mixture cure model is a useful survival analysis method for populations that include both cured and uncured proportions. The R package SMCURE applies the EM algorithm to estimate the coefficients of covariates in the mixture cure model. Although an offset term can be specified in the SMCURE statement, it is not handled appropriately in the algorithm. This thesis aims to adjust the EM algorithm for the proportional hazards mixture cure model in the SMCURE package. In addition, the offset term can be specified separately in the incidence part or the latency part. The numerical experiments include a simulation study and real …
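To show where an incidence offset enters, here is a sketch of the E-step weight in the standard EM algorithm for a mixture cure model: the conditional probability that a subject is uncured, with a logistic incidence model whose linear predictor includes an optional offset. The function and argument names are hypothetical and this is not SMCURE's source code; `surv_uncured` is assumed to be the current estimate of the uncured survival function at the subject's observed time.

```python
import math

def uncured_weight(event, surv_uncured, x_beta, offset=0.0):
    """E-step weight: P(uncured | data) under the current parameter estimates.
    Incidence is logistic: pi = 1 / (1 + exp(-(x_beta + offset)))."""
    pi = 1.0 / (1.0 + math.exp(-(x_beta + offset)))  # P(uncured | covariates)
    if event:
        return 1.0  # a subject with an observed event cannot be cured
    # censored subject: uncured-and-surviving vs. cured
    return pi * surv_uncured / (1.0 - pi + pi * surv_uncured)

# Usage: a censored subject whose offset shifts the log-odds of being uncured by log(3).
w = uncured_weight(event=False, surv_uncured=1.0, x_beta=0.0, offset=math.log(3))
```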


Statistical Methods For Analyzing Dependence Structures With Applications In Single-Cell Experiments, Zhen Yang Jul 2022

Theses and Dissertations

This dissertation focuses on studying methods in dependence structure analysis. In particular, it consists of two topics: (1) modeling dynamic correlation in zero-inflated bivariate count data; and (2) gene co-expression latent factor analysis for cell-type clustering.

In Chapter 2, a zero-inflated negative binomial model is proposed for analyzing dynamic correlation in zero-inflated bivariate count data. Interactions between biological molecules in a cell are tightly coordinated and often highly dynamic. As a result of these varying signaling activities, changes in gene co-expression patterns are often observed. Advancements in next-generation sequencing technologies bring new statistical challenges for studying these …
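The marginal building block of such a model is the zero-inflated negative binomial probability mass function, sketched below with a mean/size parameterization (an assumption; parameterizations vary across software). It covers a single count only and does not model the dynamic bivariate correlation the chapter proposes; the function name is illustrative.

```python
import math

def zinb_pmf(k, pi0, mu, size):
    """P(Y = k) for a zero-inflated negative binomial: with probability pi0 a
    structural zero, otherwise NB with mean mu > 0 and size (dispersion) > 0,
    so that Var = mu + mu^2 / size for the NB component."""
    log_nb = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
              + size * math.log(size / (size + mu))
              + k * math.log(mu / (size + mu)))
    nb = math.exp(log_nb)
    return (pi0 if k == 0 else 0.0) + (1.0 - pi0) * nb

# Usage: 30% structural zeros, NB mean 2, size 1.5.
p0 = zinb_pmf(0, 0.3, 2.0, 1.5)
```

The overall mean is (1 - pi0) * mu, which gives a quick check that the probabilities are wired correctly.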


Statistical Methods For Analyzing Multi-Omics Data: Dependence Structure And Missing Values, Wenda Zhang Jul 2022

Theses and Dissertations

Advancements in high-throughput technologies have made it possible to generate large amounts of "omics" data, including genomics, proteomics, transcriptomics, epigenomics, metabolomics, and microbiomics. Combining multiple data sources and performing joint analyses with all available information and the phenotypic outcome can reflect various aspects of complex biological systems, such as revealing regulatory processes, discovering novel associations between biological entities, and identifying relevant biomarkers for certain diseases or phenotypic outcomes. This dissertation focuses on developing statistical models for analyzing multi-omics data. It comprises three topics: (1) integrative analysis for multi-omics data with missing observations in intermediate variables; (2) …


Evaluating A Statistical-Based Assessment Tool For Stratifying Risk Among U.S. Air Force Organizations, Tiffany A. Low Jun 2022

Theses and Dissertations

The Air Force Inspection System advocates a risk-based sampling strategy (RBSS) for conducting inspections from major command levels down to the unit level. The strategy identifies areas deemed most important or risky by commanders and prioritizes them accordingly for independent assessment by the Inspector General. While Air Force regulation specifies the need to use an RBSS for inspection, the implementation process is delegated to individual commands and, subsequently, wings. The 23rd Wing, the sponsor of this research, directed us to analyze an RBSS tool highlighted as an example for adoption by those units …


Impact Of Climate Oscillations/Indices On Hydrological Variables In The Mississippi River Valley Alluvial Aquifer, Meena Raju May 2022

Theses and Dissertations

The Mississippi River Valley Alluvial Aquifer (MRVAA) is one of the most productive agricultural regions in the United States. The main objectives of this research are to identify long-term trends and change points in hydrological variables (streamflow and rainfall), to assess the relationship between these variables, and to evaluate the influence of global climate indices on them. Two nonparametric tests, the MMK and Pettitt’s tests, were used to analyze trends and change points. PCC and streamflow elasticity analysis were used to analyze the relationship between streamflow and rainfall and the sensitivity of streamflow to rainfall changes. PCC and MLR analysis …
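Pettitt's test, one of the change point tools named above, can be sketched directly from its rank-based statistic. This straightforward version costs O(n^3) operations and uses the usual large-sample p-value approximation; the MMK trend test is not shown, and the function name is illustrative.

```python
import math

def pettitt_test(xs):
    """Pettitt's nonparametric change point test.
    Returns (t, K, p): the change point falls after observation t (1-based),
    K = max_t |U_t| with U_t = sum_{i<=t} sum_{j>t} sign(x_i - x_j),
    and p is the approximate p-value 2 * exp(-6 K^2 / (n^3 + n^2))."""
    n = len(xs)

    def sgn(v):
        return (v > 0) - (v < 0)

    best_t, best_k = 0, -1
    for t in range(1, n):
        u = sum(sgn(xs[i] - xs[j]) for i in range(t) for j in range(t, n))
        if abs(u) > best_k:
            best_t, best_k = t, abs(u)
    p = min(1.0, 2.0 * math.exp(-6.0 * best_k ** 2 / (n ** 3 + n ** 2)))
    return best_t, best_k, p

# Usage: a step change in an annual streamflow-like series after the 4th value.
t, k, p = pettitt_test([1, 1, 1, 1, 5, 5, 5, 5])
```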


Evaluating Soil Health Changes Following Cover Crop And No-Till Integration Into A Soybean (Glycine Max) Cropping System In The Mississippi Alluvial Valley, Alexandra Gwin Firth May 2022

Theses and Dissertations

The transition of natural landscapes to intensive agricultural uses has resulted in severe loss of soil organic carbon (SOC), increased CO₂ emissions, river depletion, and groundwater overdraft. Despite documented negative effects of agricultural land use (e.g., soil erosion, nutrient runoff) on critical natural resources (e.g., water, soil), food production must increase to meet the demands of a growing human population. Given the environmental and agricultural productivity concerns of intensively managed soils, it is critical to implement conservation practices that mitigate the negative effects of crop production and enhance environmental integrity. In the Mississippi Alluvial Valley (MAV) region of Mississippi, USA, …


Spline Modeling And Localized Mutual Information Monitoring Of Pairwise Associations In Animal Movement, Andrew Benjamin Whetten May 2022

Theses and Dissertations

… to a new era of remote sensing and geospatial analysis. In environmental science and conservation ecology, recorded biotelemetric data are often high-dimensional, spatially and/or temporally, and functional in nature, meaning that there is an underlying continuity to the biological process of interest. GPS tracking of animal movement is commonly characterized by irregular time-recording of animal position, and the movement relationships between animals are prone to sudden change. In this dissertation, I propose a spline modeling approach for exploring interactions and time-dependent correlation between the movements of apex predators exhibiting territorial and territory-sharing behavior. A measure of localized mutual information (LMI) is …
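One simple way to localize mutual information is to compute a plug-in estimate over sliding windows of discretized movement features (for example, binned headings of two animals), as sketched below. This is an illustrative assumption, not necessarily the LMI measure defined in the dissertation, which works with spline-modeled trajectories; the function names are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in mutual information (in nats) between two equal-length
    discrete sequences: sum of p(x, y) * log(p(x, y) / (p(x) p(y)))."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())

def windowed_mi(xs, ys, width):
    """Mutual information over sliding windows, one value per window start."""
    return [mutual_information(xs[i:i + width], ys[i:i + width])
            for i in range(len(xs) - width + 1)]

# Usage: two binned movement features that agree perfectly in every window.
local = windowed_mi([0, 1, 0, 1, 0, 1], [0, 1, 0, 1, 0, 1], width=4)
```

Short windows react quickly to sudden changes in association but give noisy estimates; longer windows smooth that noise at the cost of temporal resolution.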


Functional Multidimensional Scaling, Liting Li May 2022

Theses and Dissertations

Multidimensional scaling is an important tool for analyzing proximity (similarity or dissimilarity) between objects and plays a key role in creating low-dimensional visualizations of objects. Despite progress in this area, traditional multidimensional scaling solutions are not applicable to proximities that change over time. In this dissertation, we focus on dissimilarity rather than similarity. Motivated by studies in functional data analysis, we extend current multidimensional scaling techniques and propose a functional method to obtain smooth lower-dimensional representations from time-varying dissimilarities. This method incorporates the smoothing approach of functional data analysis by using cubic …
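For context, the static building block that a functional extension generalizes is classical (Torgerson) MDS: double-centre the squared dissimilarities and keep the leading eigenvectors. The sketch below embeds into one dimension using power iteration so that it needs only the standard library. It assumes Euclidean-like dissimilarities and a start vector not orthogonal to the leading eigenvector, has no smoothing over time, and uses a hypothetical function name.

```python
import math

def classical_mds_1d(dist, n_iter=500):
    """Classical MDS into one dimension: B = -1/2 * J D^2 J (double centring),
    then coordinates = sqrt(lambda_1) * v_1 from power iteration."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_means = [sum(r) / n for r in sq]  # D^2 is symmetric, so column means match
    grand = sum(row_means) / n
    b = [[-0.5 * (sq[i][j] - row_means[i] - row_means[j] + grand) for j in range(n)]
         for i in range(n)]
    v = [float(i + 1) for i in range(n)]  # arbitrary, non-constant start vector
    lam = 0.0
    for _ in range(n_iter):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # B is zero: all objects coincide
        v = [x / norm for x in w]
        lam = norm  # converges to the leading eigenvalue of B
    return [math.sqrt(max(lam, 0.0)) * x for x in v]

# Usage: dissimilarities between three points at positions 0, 1, and 3 on a line.
coords = classical_mds_1d([[0, 1, 3], [1, 0, 2], [3, 2, 0]])
```

The recovered coordinates are unique only up to sign and shift, so checks should compare pairwise distances rather than raw positions.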