Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Articles 1 - 30 of 507
Full-Text Articles in Physical Sciences and Mathematics
Developing Machine Learning And Time-Series Analysis Methods With Applications In Diverse Fields, Muhammed Aljifri
Theses and Dissertations
This dissertation introduces methodologies that combine machine learning models with time-series analysis to tackle data analysis challenges in varied fields. The first study enhances the traditional cumulative sum control charts with machine learning models to leverage their predictive power for better detection of process shifts, applying this advanced control chart to monitor hospital readmission rates. The second project develops multi-layer models for predicting chemical concentrations from ultraviolet-visible spectroscopy data, specifically addressing the challenge of analyzing chemicals with a wide range of concentrations. The third study presents a new method for detecting multiple changepoints in autocorrelated ordinal time series, using the …
The Private Pilot Check Ride: Applying The Spacing Effect Theory To Predict Time To Proficiency For The Practical Test, Michael Scott Harwin
Theses and Dissertations
This study examined the relationship between a set of targeted factors and the total flight time students needed to become ready to take the private pilot check ride. The study was grounded in Ebbinghaus’s (1885/1913/2013) forgetting curve theory and spacing effect, and Ausubel’s (1963) theory of meaningful learning. The research factors included (a) training time to proficiency, which represented the number of training days needed to become check-ride ready; (b) flight training program (Part 61 vs. Part 141); (c) organization offering the training program (2- or 4-year college/university vs. FBO); (d) scheduling policy (mandated vs. student-driven); and demographical variables, which …
Approaches To Detecting And Modeling Over-And Underdispersion In Alternative Count Data Distributions And An Application Of Logistic Regression And Random Forest Modeling To Improve Screening Tools For Tic Disorders In Children, Rebecca C. Wardrop
Theses and Dissertations
This dissertation focuses on theory and application of discrete data methods, particularly approaches to over- and underdispersion relative to the Poisson distribution and an application of random forest and logistic regression modeling. The first chapter derives a score test for over- and underdispersion in the heaped generalized Poisson distribution. Equi-, over-, and underdispersed heaped generalized Poisson and heaped negative binomial data are simulated to evaluate the performance of the score test by comparing the power it achieves to that of Wald and likelihood ratio tests. We find that the score test we derive performs comparably to both the Wald and …
Statistical Methods For Single Cell Sequencing Data Analysis, Fei Qin
Theses and Dissertations
The recent emergence of single cell sequencing (SCS) technology has provided us with single-cell DNA or RNA sequencing (scDNA/RNA-seq) information to investigate cellular evolutionary relationships. Although many analysis methods have been developed to infer intra-tumor genetic heterogeneity, cluster cellular subclones, detect genetic mutations, and investigate spatially variable (SV) genes, exploring SCS data remains statistically challenging due to its noisy nature.
To identify subclones with scDNA-seq data, many existing studies use an independent statistical model to detect copy number profile in the first step, followed by classical clustering methods for subclone identification in downstream analyses. However, spurious results might be generated …
A Bayesian Spatial Scan Statistic For Normal Data, Laasya Velamakanni
Theses and Dissertations
Scan statistics are useful methods for detecting spatial clustering. While they were initially developed to detect regions with an excess of binomial or Poisson events, spatial scan statistics have been extended to detect hotspots in other types of data including continuous data. They have many applications in different fields such as epidemiology (e.g. detecting disease outbreaks), sociology (e.g. detecting crime hotspots), and environmental health (e.g. detecting high-pollution areas). Spatial scan statistics identify a ‘most likely cluster’ and then use a likelihood ratio test to determine if this cluster is statistically significant. Spatial scan statistics have been extended to the Bayesian …
Explorations In Baseball Analytics: Simulations, Predictions, And Evaluations For Games And Players, Katelyn Mongerson
Theses and Dissertations
From statistics reported in newspapers in the 1840s to the present day, baseball has always been one of the most data-driven sports. We make use of the endless publicly available baseball data to build models in R and Python that answer various baseball-related questions regarding predicting and optimizing run production, evaluating player effectiveness, and forecasting the postseason. To predict and optimize run production, we present three models. The first builds a common tool in baseball analysis called a Run Expectancy Matrix, which is used to give a value (in terms of runs) to various in-game decisions. The second uses the …
Change Point Detection For A Process Having Several Regimes, Oliver Gerd Meister
Theses and Dissertations
In this dissertation, possible methods for multiple change point detection on Markov chain processes are studied. Related works for offline and online change point detection are discussed and their applicability to sequential multiple change point detection for several regimes is evaluated. We develop a method for multiple change point detection for a process having three regimes. Its efficiency is then evaluated on simulated Markov chain data by looking into different scenarios such as processes that significantly differ from each other or probability distributions that are slightly similar. This approach is then applied to COVID-19 hospital data. Therefore, the data …
A Machine Learning Approach To Evaluate The Effect Of Sodium-Glucose Cotransporter-2 Inhibitors On Chronic Kidney Disease In Diabetes Patients, Solomon Eshun
Theses and Dissertations
Chronic kidney disease (CKD) is a significant complication that contributes to diabetes-related mortality in the United States, and there is growing evidence that sodium-glucose cotransporter 2 inhibitors (SGLT2i) can slow its progression. However, observational studies may suffer from confounding by indication, where patient characteristics and disease severity influence the decision to prescribe SGLT2i. This study utilized electronic health records of individuals with diabetes (from TriNetX) to investigate the effectiveness of SGLT2i on CKD progression. The database provided detailed information on patients’ CKD status, demographics, diagnosis, procedures, and medications, along with corresponding dates of diagnosis and prescription. The study comprised of …
A Machine Learning Approach To Obese-Inflammatory Phenotyping, Tania Mayleth Vargas
Theses and Dissertations
Obesity is the accumulation of an abnormal, or excessive, amount of fat in the body, which can have negative effects on overall health. This excess accumulation of macronutrients in adipose tissue can cause the release of inflammatory mediators, leading to a proinflammatory state. Inflammation is a known risk factor for various health conditions, including cardiovascular diseases, metabolic syndrome, and diabetes. This study sought to examine the use of data mining methods, particularly clustering algorithms, to identify inflammatory biomarker phenotypes and their association with obesity in a local adolescent population. The algorithms evaluated in this study included: k-means, Ward's hierarchical …
Sparse Partitioned Empirical Bayes ECM Algorithms For High-Dimensional Linear Mixed Effects And Heteroscedastic Regression, Anja Zgodic
Theses and Dissertations
Variable selection methods in both the frequentist and Bayesian frameworks are powerful techniques that provide prediction and inference in high-dimensional linear regression models. These methods often assume independence between observations and normally distributed errors with the same variance. In practice, these two assumptions are often violated. To mitigate this, we develop efficient and powerful Bayesian approaches for linear mixed modeling and heteroscedastic linear regression. These methods offer increased flexibility through the development of empirical Bayes estimators for hyperparameters, with computationally efficient estimation through the Expectation Conditional-Minimization (ECM) algorithm. The novelty of these approaches lies in the partitioning and parameter expansion, …
Advancements In Parametric Modal Regression, Qingyang Liu
Theses and Dissertations
This dissertation considers statistical inference methods for parametric modal regression models. In Chapter 1, we motivate the mode as the measure of central tendency instead of the median or the mean with an example. Following the motivational example, we include an overview of existing modal regression models. Later, in the same chapter, we explain advantages of the parametric modal regression models over existing nonparametric modal regression models. In Chapter 2, we address issues in statistical inference brought in by data contaminated with measurement error. With measurement error in covariates, statistical inference methods designed for modal regression models with error-free covariates …
Detecting Spatially Varying Coefficient Effects With Conditional Autoregressive Models: A Simulation Study Using Social Determinants Of Health Screening Data, Reid J. Demass
Theses and Dissertations
Generalized linear models that include spatially varying coefficient terms allow researchers to determine if the association between predictor and outcome variables varies across geographic space. Such models are particularly applicable to research with public health data where interventions and limited health care resources must be allocated carefully. The integrated nested Laplace approximation (INLA) methodology available in the R INLA package is a popular tool to estimate spatially varying coefficients. To assess the performance of the estimation procedure, patient emergency department (ED) visits were simulated from data sourced from a pilot study at Prisma Health. The INLA technique was used to …
Bayesian Dependence Structure Analysis For Ordinal Data, Yang He
Theses and Dissertations
This dissertation explores different methods to study the dependence structure among many ordinal variables under the Bayesian framework.
Chapter 1 introduces ordinal data analysis methods and briefly reviews the related literature. An outline of the dissertation is put forward.
In Chapter 2, Gaussian copula graphical models with different priors of graphical Lasso, adaptive graphical Lasso, and spike-and-slab Lasso on the precision matrix are assessed and compared. The proposed models are well illustrated via simulations and a real ordinal survey data analysis.
In Chapter 3, adaptive spike-and-slab Lasso prior is proposed as an extension of Chapter 2. The developed …
Examining Failures Of KC-135 Boom Assemblies Using Survival Analysis, Benjamin D. Miller
Theses and Dissertations
The purposes of this study are to confirm the applicability of survival analysis for predicting recurrent failures of a component of a military aircraft and to provide practical insights to maintenance managers and mission planners. The results of this study also can help the United States Department of Defense improve the CBM+ program. This study was able to predict recurrent failures of the component using Nelson-Aalen cumulative estimates. In addition, this study used a Cox proportional hazards regression model with shared frailty for measuring the effect of covariates on recurrent failures and unidentified heterogeneity in the model, which warranted future …
Probability Of Agreement As A Simulation Validation Methodology, Matthew C. Ledwith
Theses and Dissertations
Determining whether a simulation model is operationally valid requires the rigorous assessment of agreement between observed functional responses of the simulation model and the corresponding real world system or process of interest. This research seeks to extend and formulate the probability of agreement approach to the operational validation of simulation models. The first paper provides a methodological approach and an initial demonstration which leverages bootstrapping to overcome situations where one’s ability to collect real-world data is limited. The second paper extends the probability of agreement approach to account for second-order heteroscedastic variability structures and establishes a weighted probability of agreement …
Examining Fuel Service System Failures Of The USAF R11 Using Survival Analysis, Roed M.S. Mejia
Theses and Dissertations
Recent events show that fuel supply is a large contributor to the success or failure of a military operation in response to a contingency. Any future near-peer conflict will stress the supply chain and require fully operational vehicles to be ready for the primary mission sets they support. In the United States Air Force (USAF), the readiness of fuel distribution trucks is crucial to meeting those mission sets in global operations. Utilizing non-parametric and semi-parametric survival models, which do not assume specific probability distributions, this study analyzes maintenance data for R-11 trucks that refuel aircraft.
Debris Survivability Study For Mega-Constellation Architectures, Joseph C. Canoy
Theses and Dissertations
The overall theoretical debris survivability of mega-constellation architectures, with an emphasis on space-based ballistic missile defense (SB-BMD) constellations, is explored via three extensive Monte Carlo simulations: a preliminary analysis of low Earth orbit (LEO) mega-constellation survivability following a fragmentation event within the constellation, an analysis of LEO mega-constellation survivability with a fragmentation event occurring on a satellite performing a maneuver to insert itself within the constellation, and an analysis of LEO mega-constellation survivability after a fragmentation event resulting from the destruction of a missile. The LEO mega-constellations represent the SB-BMD constellation. The first two analysis sections will include …
Variability In Causal Effects On A Binary Outcome And Noncompliance In A Multisite Randomized Trial, Xinxin Sun
Theses and Dissertations
Noncompliance to treatment assignment is widespread in randomized trials and presents challenges in causal inference. In the presence of noncompliance, the most commonly estimated effect of treatment assignment, also known as intent-to-treat (ITT) effect, is biased. Of interest in this setting is the complier average causal effect (CACE), the ITT effect among compliers. Further complication arises when the outcome variable is partially observed.
My research focuses on estimating the distribution of a site-specific CACE in a multisite randomized controlled trial (MRCT) by maximum likelihood (ML). Assuming compliance is missing at random (MAR), we express the likelihood as an integral with respect …
Model-Based Imputation Of Below Detection Limit Missing Data And Group Selection In Bayesian Group Index Regression, Matthew Carli
Theses and Dissertations
Investigations into the association between chemical exposure and health outcomes are increasingly focused on the role of chemical mixtures, as opposed to individual chemicals. The analysis of chemical mixture data required the development of novel statistical methods, one of these being Bayesian group index regression. A statistical challenge common to all chemical mixture analyses is the ubiquitous presence of below detection limit (BDL) data. We propose an extension of Bayesian group index regression that treats both regression effects and missing BDL observations as parameters in a model estimated through a Markov Chain Monte Carlo algorithm that we refer to as …
Dynamics Of Redox-Driven Molecular Processes In Local And Systemic Plant Immunity, Philip Berg
Theses and Dissertations
The work presented here has two main parts. Chapters 1–3 focus on dynamical systems modeling in plant immunity, whereas chapters 4–6 describe contributions to computational modeling and analysis of proteomics and genomics data. Chapter 1 investigates dynamical and biochemical patterns of reversibly oxidized cysteines (RevOxCys) during effector-triggered immunity (ETI) in Arabidopsis, examines the regulatory patterns associated with the roles of Arabidopsis thimet oligopeptidases 1 and 2 (TOP1 and TOP2) in the RevOxCys events during ETI, and analyzes the redox phenotype of the top1top2 mutant. The second chapter investigates the peptidome dynamics during ETI …
Towards Structured Planning And Learning At The State Fisheries Agency Scale, Caleb A. Aldridge
Theses and Dissertations
Inland recreational fisheries has grown philosophically and scientifically to consider economic and sociopolitical aspects (non-biological) in addition to the biological. However, integrating biological and non-biological aspects of inland fisheries has been challenging. Thus, an opportunity exists to develop approaches and tools which operationalize planning and decision-making processes which include biological and non-biological aspects of a fishery. This dissertation expands the idea that a core set of goals and objectives is shared among and within inland fisheries agencies; that many routine operations of inland fisheries managers can be regimented or standardized; and the novel concept that current information and operations can …
Weather Parameters Influencing The Incidence Of Citrus Canker Caused By Aw Strain In The Rio Grande Valley, Amit Sharma
Theses and Dissertations
Citrus canker, caused by the bacterium Xanthomonas citri subsp. citri (Xcc), seriously affects the citrus industry by making the fruit unmarketable due to unsightly lesions. Canker caused by the Aw strain of Xcc was reported in citrus trees located in the residential areas of the Rio Grande Valley (RGV). Canker severity differs amongst cultivars/varieties, and it is influenced by prevailing environmental conditions. Multiple regression modeling of the disease incidence with environmental variables such as temperature, humidity, wind speed, wind gust, and rainfall was performed to understand the environmental conditions that are favorable for spread of citrus …
Statistical Inference On Desirability Function Optimal Points To Evaluate Multi-Objective Response Surfaces, Peter A. Calhoun
Theses and Dissertations
A shortfall of the Derringer and Suich (1980) desirability function is the lack of inferential methods to quantify uncertainty. Most articles addressing uncertainty involve robust methods, which provide a point estimate that is less affected by variation. A few articles address confidence intervals or bands, but not specifically for the Derringer and Suich method. This research provides two valuable contributions to the field of response surface methodology. The first contribution is evaluating the effect of correlation and plane angles on Derringer and Suich optimal solutions. The second contribution proposes and compares eight inferential methods, both univariate and multivariate, for creating confidence intervals on …
Orthogonal Arrays And Legendre Pairs, Kristopher N. Kilpatrick
Theses and Dissertations
Well-designed experiments greatly improve test and evaluation. Efficient experiments reduce the cost and time of running tests while improving the quality of the information obtained. Orthogonal Arrays (OAs) and Hadamard matrices are used as designed experiments to glean as much information as possible about a process with limited resources. However, constructing OAs and Hadamard matrices in general is a very difficult problem. Finding Legendre pairs (LPs) results in the construction of Hadamard matrices. This research studies the classification problem of OAs and the existence problem of LPs. In doing so, it makes two contributions to the discipline. First, it improves …
Analytic Case Study Using Unsupervised Event Detection In Multivariate Time Series Data, Jeremy M. Wightman
Theses and Dissertations
Analysis of cyber-physical systems (CPS) has emerged as a critical domain for providing US Air Force and Space Force leadership decision advantage in air, space, and cyberspace. Legacy methods have been outpaced by evolving battlespaces and global peer-level challengers. Automation provides one way to decrease the time that analysis currently takes. This thesis presents an event detection automation system (EDAS) which utilizes deep learning models, distance metrics, and static thresholding to detect events. The EDAS automation is evaluated with a case study of CPS domain experts in two parts. Part 1 uses the current methods for CPS analysis with a qualitative …
Defining Viable Solar Resource Locations In The Southeast United States Using The Satellite-Based GLASS Product, Jolie Kavanagh
Theses and Dissertations
This research uses satellite data and moment statistics to determine whether solar farms can be placed in the Southeast US. Data from 2001-2019 are analyzed in reference to the Southwest US, where solar farms are located. The need for clean energy is growing; therefore, locations other than arid environments must be considered. The Southeast US is the main location of interest due to the warm, moist environment throughout the year. This research uses the Global Land Surface Satellite (GLASS) photosynthetically active radiation product (PAR) to determine viable locations for solar panels. A probability density function (PDF) along …
ABM Simulation Model Of A Pandemic For Optimizing Vaccination Strategy, Gibeom Park
Theses and Dissertations
This study presents a process-oriented hybrid model for individuals' immune responses and interactions involving vaccination to describe the trend of contagious disease and estimate the future societal cost. The model considers "recovery" as a non-absorbing state and incorporates various infection stage states including two symptomatic states. To model contagiousness consistently with the current pandemic and to capture that the spread of a disease depends on the mobility of people, we developed an Agent-Based Simulator that is fitted to the particular model used in this study and can test various what-if scenarios. We improved the simulator considerably by applying data structures …
Effects Of Macronutrients Intake And Physical Activity On Childhood Obesity Of Hispanic Children, Prosanta Barai
Theses and Dissertations
Obesity has become more widespread during the past few decades, and its prevalence is still increasing. It is found in every population and region of the world, including rural parts of low- and middle-income countries. In the USA, regardless of age, the severity of obesity is no different from the global trend. Although a large body of literature has tried to answer pressing questions such as how obesity can be controlled, few if any studies have focused on younger children, especially the 4-6-year-old Hispanic population. Our study aimed to determine the causal path among literature …
Neural Networks And Stochastic Differential Equations, Stephanie L. Flores
Theses and Dissertations
Influenced by the seminal work, “Physics Informed Neural Networks” by Raissi et al., 2017, there has been a growing interest in recent years in solving, and estimating the parameters of, nonlinear partial differential equations (PDEs) with deep neural networks. This has broadened the pathways and shed light on deep learning of stochastic differential equations (SDEs) and stochastic PDEs (SPDEs). In this work, we intend to investigate the current approaches to solving, and estimating the parameters of, SDEs/SPDEs with deep neural networks and the possibility of extending them to obtain more accurate/stable solutions with residual systems and/or generative adversarial neural networks. …
Statistical Methods For Analyzing Multi-Omics Data: Dependence Structure And Missing Values, Wenda Zhang
Theses and Dissertations
Advancements in high-throughput technologies have made it possible to generate a huge amount of "omics" data, including genomics, proteomics, transcriptomics, epigenomics, metabolomics, and microbiomics. Combining multiple data sources and performing joint analyses with all available information and the phenotypic outcome can reflect various aspects of complex biological systems, such as revealing regulation processes, discovering novel associations between biological entities, and identifying relevant biomarkers for certain diseases or phenotypic outcomes. This dissertation focuses on developing statistical models for analyzing multi-omics data. It comprises three topics: (1) integrative analysis for multi-omics data with missing observations in intermediate variables; (2) …