Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

Statistical Models

Articles 1 - 30 of 1347

Full-Text Articles in Statistics and Probability

Research On Chinese Data Sovereignty Policy Based On Lda Model And Policy Instruments, Han Qiao, Junru Xu Mar 2024

Bulletin of Chinese Academy of Sciences (Chinese Version)

Data sovereignty has become an important component of national sovereignty in the dual context of the digital economy development and the overall national security concept. Major countries and regions are actively carrying out data sovereignty strategic deployment and engaging in fierce competition in data resources, data technology, and data rules. This work adopts the policy text analysis method to study China’s data sovereignty policy, and employs the LDA model and policy instruments to quantitatively analyze the process evolution and thematic characteristics of China’s data sovereignty policy. Drawing on these findings, this study comprehensively considers the global data sovereignty policy and …


Predicting Crop Yield Using Remote Sensing Data, Mary Row, Jung-Han Kimn, Hossein Moradi Feb 2024

SDSU Data Science Symposium

Accurate crop yield predictions can help farmers adjust their farming practices to optimize their harvest. Remote sensing is an inexpensive approach to collecting massive amounts of data that could be used for predicting crop yield. This study used linear regression and spatial linear models to predict soybean yield with data from Landsat 8 OLI. Each model was built using only spectral bands of the satellite, only vegetation indices, and both spectral bands and vegetation indices. All analysis was based on data collected from two fields in South Dakota from the 2019 and 2021 …


Session 6: The Size-Biased Lognormal Mixture With The Entropy Regularized Algorithm, Tatjana Miljkovic, Taehan Bae Feb 2024

SDSU Data Science Symposium

A size-biased left-truncated Lognormal (SB-ltLN) mixture is proposed as a robust alternative to the Erlang mixture for modeling left-truncated insurance losses with a heavy tail. The weak denseness property of the weighted Lognormal mixture is studied along with the tail behavior. Explicit analytical solutions are derived for moments and Tail Value at Risk based on the proposed model. An extension of the regularized expectation–maximization (REM) algorithm with Shannon's entropy weights (ewREM) is introduced for parameter estimation and variability assessment. The left-truncated internal fraud data set from the Operational Riskdata eXchange is used to illustrate applications of the proposed model. Finally, …


Making Sense Of Making Parole In New York, Alexandra Mcglinchy Feb 2024

Dissertations, Theses, and Capstone Projects

For many individuals incarcerated in New York, the initial step toward freedom begins with an interview with the Board of Parole. This process, however, is frequently a complex and challenging one, characterized by repeated denials and extended incarcerations. The disparity in outcomes – where one individual may receive over 20 denials and another is granted parole on their first attempt – highlights the ambiguity and inconsistency in the parole decision-making process. This project aims to clarify the factors that influence parole decisions by concentrating on measurable variables. These include age, race, duration of sentence served, proportion of sentence served, type …


Modeling Of Covid-19 Clinical Outcomes In Mexico: An Analysis Of Demographic, Clinical, And Chronic Disease Factors, Livia Clarete Feb 2024

Dissertations, Theses, and Capstone Projects

This study explores COVID-19 clinical outcomes in Mexico, focusing on demographic, clinical, and chronic disease variables to develop predictive models. In the binary classification task, the Ada Boost Classifier distinguishes survivors from non-survivors, with age, sex, ethnicity, and chronic medical conditions influencing outcomes. In multiclass classification, the Gradient Boosting Classifier categorizes patients into outcome groups.

Demographic variables, especially age, are crucial for predicting COVID-19 outcomes for both the binary and multiclass classification tasks. Clinical information about previous conditions, including chronic diseases, also holds relevance, especially diabetes, immunocompromise, and cardiovascular diseases. These insights inform public health measures and healthcare strategies, emphasizing …


Model Selection Through Cross-Validation For Supervised Learning Tasks With Manifold Data, Derek Brown Jan 2024

The Journal of Purdue Undergraduate Research

No abstract provided.


Sensitivity Analysis Of Prior Distributions In Regression Model Estimation, Ayoade I Adewole, Oluwatoyin K. Bodunwa Jan 2024

Al-Bahir Journal for Engineering and Pure Sciences

Bayesian inferences depend solely on the specification and accuracy of the likelihoods and prior distributions for the observed data. The research examined Bayesian estimation of regression models to reduce the impact of some of the problems posed by conventional methods of estimating regression models, such as handling complex models, small sample sizes, and the inclusion of background information in the estimation procedure. Posterior distributions are based on the prior distributions and the accuracy of the data, which is the fundamental principle Bayesian statistics relies on to produce accurate final model estimates. Sensitivity analysis is an essential part of mathematical model validation in obtaining …


Machine Learning Approaches For Cyberbullying Detection, Roland Fiagbe Jan 2024

Data Science and Data Mining

Cyberbullying refers to the act of bullying using electronic means and the internet. In recent years, it has been identified as a major problem among young people and even adults. It can negatively impact one’s emotions and lead to adverse outcomes like depression, anxiety, harassment, and suicide, among others. This has led to the need to employ machine learning techniques to automatically detect cyberbullying and prevent it on various social media platforms. In this study, we want to analyze the combination of some Natural Language Processing (NLP) algorithms (such as Bag-of-Words and TFIDF) with some popular machine learning …


Predicting Superconducting Critical Temperature Using Regression Analysis, Roland Fiagbe Jan 2024

Data Science and Data Mining

This project estimates a regression model to predict the superconducting critical temperature based on variables extracted from the superconductor’s chemical formula. The regression model, combined with stepwise variable selection, gives a reasonable predictive model with a lower prediction error (MSE). Variables derived from atomic radius, valence, atomic mass, and thermal conductivity appeared to contribute most to the predictive model.
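
As an illustration only, stepwise variable selection for a linear regression can be sketched as a greedy forward search. The abstract does not say which stepwise variant or selection criterion the author used, so the choices here (forward direction, training MSE as the criterion, synthetic data, and the helper names `mse` and `forward_stepwise`) are all assumptions, not the project's method.

```python
import numpy as np

def mse(X, y):
    """Mean squared error of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(np.mean(resid ** 2))

def forward_stepwise(X, y, max_features):
    """Greedy forward selection: at each step add the column that lowers MSE most."""
    selected = []
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        # Score every candidate column when added to the current selection.
        scores = [(mse(X[:, selected + [j]], y), j) for j in remaining]
        _, best_j = min(scores)
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# Synthetic data (hypothetical): only columns 0 and 3 drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=200)

chosen = forward_stepwise(X, y, max_features=2)
print(sorted(chosen))
```

In practice a held-out or cross-validated error, or an information criterion such as AIC, would usually replace training MSE, since training error alone never increases when a variable is added.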


A Bayesian Inversion For Emissions And Export Productivity Across The End-Cretaceous Boundary, Alexander A. Cox Jan 2024

Dartmouth College Master’s Theses

The end-Cretaceous mass extinction was marked by both the Chicxulub impact and the ongoing emplacement of the Deccan Traps flood basalt province. Both of these events perturbed the environment by the emission of climate-active volatiles, primarily CO2 and SO2. To understand the mechanism of extinction, we must disentangle the timing, duration, and intensity of volcanic and meteoritic environmental forcings. In this thesis, we used a parallel Markov chain Monte Carlo approach to invert for the aforementioned volatile emissions, export productivity, and remineralization from 67 to 65 million years ago using the LOSCAR (Long-term Ocean-atmosphere-Sediment CArbon cycle Reservoir) model. The parallel …


Multiscale Modelling Of Brain Networks And The Analysis Of Dynamic Processes In Neurodegenerative Disorders, Hina Shaheen Jan 2024

Theses and Dissertations (Comprehensive)

The complex nature of the human brain, with its intricate organic structure and multiscale spatio-temporal characteristics ranging from synapses to the entire brain, presents a major obstacle in brain modelling. Capturing this complexity poses a significant challenge for researchers. The complex interplay of coupled multiphysics and biochemical activities within this intricate system shapes the brain's capacity, functioning within a structure-function relationship that necessitates a specific mathematical framework. Advanced mathematical modelling approaches that incorporate the coupling of brain networks and the analysis of dynamic processes are essential for advancing therapeutic strategies aimed at treating neurodegenerative diseases (NDDs), which afflict millions of …


Reducing Food Scarcity: The Benefits Of Urban Farming, S.A. Claudell, Emilio Mejia Dec 2023

Journal of Nonprofit Innovation

Urban farming can enhance the lives of communities and help reduce food scarcity. This paper presents a conceptual prototype of an efficient urban farming community that can be scaled for a single apartment building or an entire community across all global geoeconomic regions, including densely populated cities and rural, developing towns and communities. When deployed in coordination with smart crop choices, local farm support, and efficient transportation, the result isn’t just sustainability, but also increased fresh produce accessibility, optimized nutritional value, elimination of the use of ‘forever chemicals’, reduced transportation costs, and global environmental benefits.

Imagine Doris, who is …


Microplate-Like Metal Pyrophosphate Engineered On Ni-Foam Towards Multifunctional Electrode Material For Energy Conversion And Storage, Rishabh Srivastava Dec 2023

Electronic Theses & Dissertations

High demand for clean energy, the dire need for sustainable development, and the goal of low carbon footprints are among the challenges leading researchers to pursue research and development of high-performance energy devices. The development of materials used in energy devices is currently focused on enhancing the performance, electronic properties, and durability of devices. Tuning the attributes of transition metals using pyrophosphate (P2O7) ligand moieties can be a promising approach to meet the requirements of energy devices such as water electrolyzers and supercapacitors, although such a material’s configuration is rarely explored for this purpose.

Herein, we grow …


Differentiation Of Human, Dog, And Cat Hair Fibers Using Dart Tofms And Machine Learning, Laura Ahumada, Erin R. Mcclure-Price, Chad Kwong, Edgard O. Espinoza, John Santerre Dec 2023

SMU Data Science Review

Hair is found in over 90% of crime scenes and has long been analyzed as trace evidence. However, recent reviews of traditional hair fiber analysis techniques, primarily morphological examination, have cast doubt on its reliability. To address these concerns, this study employed machine learning algorithms, specifically Linear Discriminant Analysis (LDA) and Random Forest, on Direct Analysis in Real Time time-of-flight mass spectra collected from human, cat, and dog hair samples. The objective was to develop a chemistry- and statistics-based classification method for unbiased taxonomic identification of hair. The results of the study showed that LDA and Random Forest were highly …


Analyzing The Efficacy Of Covid-19 Travel Bans: A Regression Analysis Approach, Mallory Kochanek Dec 2023

Honors Projects

Some might associate the term ‘public health’ with the pandemic that occurred in 2020. COVID-19 spread like most have never seen in their lifetime. It is useful to look at the effectiveness of the travel restrictions in mitigating the spread of the global pandemic. Using linear regression and network regression, we obtain parameter estimates to determine the relation of predictors, such as network effect, percentage of urban population and GDP, on the COVID-19 incidence rate for the months January to April of 2020. Linear regression does not account for the correlation structure of the data. Network regression, on …


Exploration And Statistical Modeling Of Profit, Caleb Gibson Dec 2023

Undergraduate Honors Theses

For any company involved in sales, maximization of profit is the driving force that guides all decision-making. Many factors can influence how profitable a company can be, including external factors like changes in inflation or consumer demand or internal factors like pricing and product cost. By understanding specific trends in its own internal data, a company can readily identify problem areas or potential growth opportunities to help increase profitability.

In this discussion, we use an extensive data set to examine how a company might analyze their own data to identify potential changes the company might investigate to drive better performance. Based …


The Private Pilot Check Ride: Applying The Spacing Effect Theory To Predict Time To Proficiency For The Practical Test, Michael Scott Harwin Dec 2023

Theses and Dissertations

This study examined the relationship between a set of targeted factors and the total flight time students needed to become ready to take the private pilot check ride. The study was grounded in Ebbinghaus’s (1885/1913/2013) forgetting curve theory and spacing effect, and Ausubel’s (1963) theory of meaningful learning. The research factors included (a) training time to proficiency, which represented the number of training days needed to become check-ride ready; (b) flight training program (Part 61 vs. Part 141); (c) organization offering the training program (2- or 4-year college/university vs. FBO); (d) scheduling policy (mandated vs. student-driven); and demographical variables, which …


Bayesian Strategies For Propensity Score Estimation In Causal Inference., Uthpala I. Wanigasekara Dec 2023

Electronic Theses and Dissertations

Causal inference is a method used in various fields to draw causal conclusions based on data. It involves using assumptions, study designs, and estimation strategies to minimize the impact of confounding variables. Propensity scores are used to estimate outcome effects, through matching methods, stratification, weighting methods, and the Covariate Balancing Propensity Score method. However, they can be sensitive to estimation techniques and can lead to unstable findings. Researchers have proposed integrating weighting with regression adjustment in parametric models to improve causal inference validity. The first project focuses on Bayesian joint and two-stage methods for propensity score analysis. Propensity score modeling …


Radiation Exposure Calibration Of The Al2o3:C With Radium-226 And Cesium-137 Using The Osl Method, Selma Tepeli Aydin Dec 2023

All Theses

Optically stimulated luminescence (OSL) dosimetry was utilized to calibrate Al2O3:C powder dosimeters, available commercially as the nanoDot® from Landauer Inc., and compare the dosimeter response to radium-226 (226Ra) and cesium-137 (137Cs). The signal from the OSL was quantified using a microSTARii® OSL reader also produced by Landauer Inc. Dose-response curves were developed for 226Ra and 137Cs experiments (5 dosimeters each) at thirteen absorbed doses. Individual dosimeter response was tracked by serial number. Linear regression analysis was performed to determine if there were significant differences between the intercepts of the …


Bayesian Learning Of Spatiotemporal Source Distribution For Beached Microplastic In The Gulf Of Mexico, David Pojunas Dec 2023

Graduate Theses and Dissertations

Over the last several decades, plastic waste has gradually accumulated while slowly degrading in terrestrial and oceanic environments. Recently, there has been an increased effort to identify the possible sources of plastic to understand how they affect vulnerable beaches. This issue is of particular concern in the Gulf of Mexico due to the presence of oil, natural gas, and plastic production. In this thesis, we expand upon existing Bayesian plastic attribution models and develop a rigorous statistical framework to map observed beached microplastics to their sources. Within this framework, we combine Lagrangian backtracking simulations of floating particles using nurdle beaching …


The Impacts Of The Covid-19 Pandemic On Mental Health Across Different Genders And Sexualities, Jiale Zhu, Jonas Katona Nov 2023

Undergraduate Research Journal for the Human Sciences

Current studies report an increase in psychological distress as a result of the COVID-19 pandemic. This study is interested in examining mental health disparities and how the COVID-19 pandemic has disproportionately impacted marginalized groups—and more specifically, those identified by sex, gender, and sexuality—compared with the general population. This study also considers the effects and ramifications of different policy measures taken during the course of the pandemic. We perform exploratory data modeling and analysis on several important and publicly available datasets taken during the pandemic on mental health and COVID-19 infection data across various identity groups to look for significant disparities, …


Nonparametric Derivative Estimation Using Penalized Splines: Theory And Application, Bright Antwi Boasiako Nov 2023

Doctoral Dissertations

This dissertation is in the field of nonparametric derivative estimation using penalized splines. It is conducted in two parts. In the first part, we study the L2 convergence rates of estimating derivatives of mean regression functions using penalized splines. In 1982, Stone provided the optimal rates of convergence for estimating derivatives of mean regression functions using nonparametric methods. Using these rates, Zhou et al. in their 2000 paper showed that the MSE of derivative estimators based on regression splines approaches zero at the optimal rate of convergence. Also, in 2019, Xiao showed that, under some general conditions, penalized spline estimators …


Predicting Dengue Incidence In Central Argentina Using Google Trends Data, Sahil Chindal Nov 2023

Annual Symposium on Biomathematics and Ecology Education and Research

No abstract provided.


The Double Edged Sword Of The Pandemic: Exploring Associations Between Covid-19 And Social Isolation In The Usa, Alexander Fulk Nov 2023

Annual Symposium on Biomathematics and Ecology Education and Research

No abstract provided.


Langevin Dynamic Models For Smfret Dynamic Shift, David Frost, Keisha Cook Dr, Hugo Sanabria Dr Nov 2023

Annual Symposium on Biomathematics and Ecology Education and Research

No abstract provided.


Mathematical Modeling Of The Impact Of Lobbying On Climate Policy, Andrew Jacoby, Claire Hannah, James Hutchinson, Jasmine Narehood, Aditi Ghosh, Padmanabhan Seshaiyer Nov 2023

Annual Symposium on Biomathematics and Ecology Education and Research

No abstract provided.


The Use Of Regularization To Detect Racial Inequities In Pay Equity Studies: An Empirical Study And Reflections On Regulation Methods, Christopher M. Peña Nov 2023

Electronic Theses and Dissertations

Since the late 1970s, multiple linear regression has been the preferred method for identifying discrimination in pay. An empirical study on this topic was conducted using quantitative critical methods. A literature review first examined conflicting views on using multiple linear regression in pay equity studies. The review found that multiple linear regression is used so prevalently in pay equity studies because the courts and practitioners have widely accepted it and because of its simplicity and ability to parse multiple sources of variance simultaneously. Commentaries in the literature cautioned about errors in model specification, the use of tainted variables, and the …


Decentralized Science (Desci): A New Paradigm For Diverse And Sustainable Scientific Development, Feiyue Wang, Wenwen Ding Oct 2023

Bulletin of Chinese Academy of Sciences (Chinese Version)

The rise of artificial intelligence for science (AI4S) has made it particularly important and urgent to ensure the openness, fairness, impartiality, diversity, and sustainability of scientific systems. This is significant to the discourse power and leadership of countries in global innovation and industrial revolution, and also affects the security, stability, and sustainable development of a community with a shared future for mankind. To address these challenges, AI4S needs to adopt new scientific organizational and operational methods. Decentralized science (DeSci) has emerged to vitalize AI4S and provide strong support, effectively addressing issues such as information silos, biases, unfair distribution, and monopolies …


Deep Q-Learning Framework For Quantitative Climate Change Adaptation Policy For Florida Road Network Due To Extreme Precipitation, Orhun Aydin Oct 2023

I-GUIDE Forum

Climate change-induced extreme weather and a growing population are increasing the pressure on the world's aging road networks. Adaptation requires designing interventions and alterations to the road networks that consider future dynamics of flooding and increased traffic due to the growing population. This paper introduces a reinforcement learning approach to designing interventions for Florida's road network under future traffic and climate projections. Three climate models and a tide and surge model are used to create flooding and coastal inundation projections, respectively. The optimal sequence of decisions for adapting Florida's road network to minimize flooding-related disruptions is solved by using a graph-based …


Bayesian Statistical Modeling Of Spatially Resolved Transcriptomics Data, Xi Jiang Oct 2023

Statistical Science Theses and Dissertations

Spatially resolved transcriptomics (SRT) quantifies expression levels at different spatial locations, providing a new and powerful tool to investigate novel biological insights. As experimental technologies improve in both capacity and efficiency, there is a growing demand for the development of analytical methodologies.

One question in SRT data analysis is to identify genes whose expressions exhibit spatially correlated patterns, called spatially variable (SV) genes. Most current methods to identify SV genes are built upon the geostatistical model with Gaussian process, which could limit the models' ability to identify complex spatial patterns. In order to overcome this challenge and capture more types …