Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons


Articles 1 - 30 of 467

Full-Text Articles in Statistical Models

Machine Learning Approaches For Cyberbullying Detection, Roland Fiagbe Jan 2024

Data Science and Data Mining

Cyberbullying refers to the act of bullying using electronic means and the internet. In recent years, it has been identified as a major problem among young people and even adults. It can negatively impact one’s emotions and lead to adverse outcomes like depression, anxiety, harassment, and suicide, among others. This has led to the need to employ machine learning techniques to automatically detect cyberbullying and prevent it on various social media platforms. In this study, we want to analyze the combination of some Natural Language Processing (NLP) algorithms (such as Bag-of-Words and TFIDF) with some popular machine learning …


Predicting Superconducting Critical Temperature Using Regression Analysis, Roland Fiagbe Jan 2024

Data Science and Data Mining

This project estimates a regression model to predict the superconducting critical temperature from variables extracted from the superconductor’s chemical formula. The regression model, combined with stepwise variable selection, yields a reasonable predictive model with a lower prediction error (MSE). Variables based on atomic radius, valence, atomic mass and thermal conductivity appeared to contribute most to the predictive model.


Dynamic Influence Diagram-Based Deep Reinforcement Learning Framework And Application For Decision Support For Operators In Control Rooms, Joseph Mietkiewicz, Ammar N. Abbas, Chidera Winifred Amazu, Anders L. Madsen, Gabriele Baldissone Sep 2023

Articles

In today’s complex industrial environment, operators are often faced with challenging situations that require quick and accurate decision-making. The human-machine interface (HMI) can display too much information, leading to information overload and potentially compromising the operator’s ability to respond effectively. To address this challenge, decision support models are needed to assist operators in identifying and responding to potential safety incidents. In this paper, we present an experiment to evaluate the effectiveness of a recommendation system in addressing the challenge of information overload. The case study focuses on a formaldehyde production simulator and examines the performance of an improved Human-Machine Interface …


Modeling Biphasic, Non-Sigmoidal Dose-Response Relationships: Comparison Of Brain-Cousens And Cedergreen Models For A Biochemical Dataset, Venkat D. Abbaraju, Tamaraty L. Robinson, Brian P. Weiser Aug 2023

Rowan-Virtua School of Osteopathic Medicine Faculty Scholarship

Biphasic, non-sigmoidal dose-response relationships are frequently observed in biochemistry and pharmacology, but they are not always analyzed with appropriate statistical methods. Here, we examine curve fitting methods for “hormetic” dose-response relationships where low and high doses of an effector produce opposite responses. We provide the full dataset used for modeling, and we provide the code for analyzing the dataset in SAS using two established mathematical models of hormesis, the Brain-Cousens model and the Cedergreen model. We show how to obtain and interpret curve parameters such as the ED50 that arise from modeling, and we discuss how curve parameters might change …


Movie Recommender System Using Matrix Factorization, Roland Fiagbe May 2023

Data Science and Data Mining

Recommendation systems are a popular and beneficial field that can help people make informed decisions automatically. This technique assists users in selecting relevant information from an overwhelming amount of available data. When it comes to movie recommendations, two common methods are collaborative filtering, which compares similarities between users, and content-based filtering, which takes a user’s specific preferences into account. However, our study focuses on the collaborative filtering approach, specifically matrix factorization. Various similarity metrics are used to identify user similarities for recommendation purposes. Our project aims to predict movie ratings for unwatched movies using the MovieLens rating dataset. We developed …
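
To make the matrix factorization idea concrete, here is a minimal sketch in plain Python. It is not the project's implementation: the abstract does not say how the factors were fitted, so stochastic gradient descent with L2 regularization is an assumption, and `TOY_RATINGS`, `factorize`, `predict`, and `rmse` are hypothetical names with made-up data rather than the MovieLens ratings.

```python
import math
import random

# Hypothetical (user, item, rating) triples; NOT the MovieLens data.
TOY_RATINGS = [
    (0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
    (2, 1, 1), (2, 2, 5), (3, 0, 1), (3, 2, 4),
]


def predict(P, Q, u, i):
    """Predicted rating: dot product of user u's and item i's factor vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))


def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02,
              epochs=500, seed=0):
    """Fit user factors P and item factors Q on observed ratings only.

    Assumed fitting method: SGD on squared error with L2 regularization.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]  # use the pre-update values for both
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(ratings, P, Q):
    """Root mean squared error over the given rating triples."""
    sq = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(sq / len(ratings))


if __name__ == "__main__":
    P, Q = factorize(TOY_RATINGS, n_users=4, n_items=3)
    print("training RMSE:", rmse(TOY_RATINGS, P, Q))
    # User 0 never rated item 2; the learned factors still give a prediction.
    print("predicted rating for user 0, item 2:", predict(P, Q, 0, 2))
```

Predicting a rating for an unwatched movie then amounts to calling `predict` on a (user, item) pair that was absent from the training triples.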


Uconn Baseball Batting Order Optimization, Gavin Rublewski May 2023

Honors Scholar Theses

Challenging conventional wisdom is at the very core of baseball analytics. Using data and statistical analysis, the sets of rules by which coaches make decisions can be justified, or possibly refuted. One of those sets of rules relates to the construction of a batting order. Through data collection, data adjustment, the construction of a baseball simulator, and the use of a Monte Carlo Simulation, I have assessed thousands of possible batting orders to determine the roster-specific strategies that lead to optimal run production for the 2023 UConn baseball team. This paper details a repeatable process in which basic player statistics …


Small But Mighty: Examing The Utility Of Microstatistics In Modeling Ice Hockey, Matt Palmer May 2023

Senior Honors Theses

As research into hockey analytics continues, an increasing number of metrics are being introduced into the knowledge base of the field, creating a need to determine whether various stats are useful or simply add noise to the discussion. This paper examines microstatistics – manually tracked metrics which go beyond the NHL’s publicly released stats – both through the lens of meta-analytics (which attempt to objectively assess how useful a metric is) and modeling game probabilities. Results show that while there is certainly room for improvement in understanding and use of microstats in modeling, the metrics overall represent an area of …


Classification Of Adult Income Using Decision Tree, Roland Fiagbe Jan 2023

Data Science and Data Mining

The decision tree is a commonly used data mining methodology for performing classification tasks. It is a tree-based supervised machine learning algorithm that classifies or makes predictions by following a path determined by how previous questions are answered. Generally, the decision tree algorithm categorizes data into branch-like segments that develop into a tree that contains a root, nodes, and leaves. This project seeks to explore the decision tree methodology and apply it to the Adult Income dataset from the UCI Machine Learning Repository, to determine whether a person makes over 50K per year and determine the necessary factors that improve …


Forecasting Remission Time Of A Treatment Method For Leukemia As An Application To Statistical Inference Approach, Ahmed Galal Atia, Mahmoud Mansour, Rashad Mohamed El-Sagheer, B. S. El-Desouky Jan 2023

Basic Science Engineering

In this paper, we investigate whether the Weibull-Linear Exponential distribution (WLED) fits a real clinical dataset well. These data represent the duration of remission achieved by a certain drug used in the treatment of leukemia for a group of patients. A statistical inference approach is used to estimate the parameters of the WLED from the fitted data. The estimated parameters are used to evaluate the survival and hazard functions and hence to assess the treatment method by forecasting the duration of remission for patients. A two-sample prediction approach has been applied to …


Utilizing Markov Chains To Estimate Allele Progression Through Generations, Ronit Gandhi Jan 2023

Honors Theses

All populations display patterns in allele frequencies over time. Some alleles cease to exist, while some grow to become the norm. These frequencies can shift or stay constant based on the conditions the population lives in. If in Hardy-Weinberg equilibrium, the allele frequencies stay constant. Most populations, however, have bias from environmental factors, sexual preferences, other organisms, etc. We propose a stochastic Markov chain model to study allele progression across generations. In such a model, the allele frequencies in the next generation depend only on the frequencies in the current one.

We use this model to track a recessive allele …
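
The Markov property described above (next-generation frequencies depend only on the current ones) can be sketched with a small transition matrix. The thesis does not specify its transition probabilities here, so this example assumes the textbook Wright-Fisher model with neutral binomial sampling and no selection; `wright_fisher_matrix`, `step`, and `evolve` are hypothetical names.

```python
from math import comb


def wright_fisher_matrix(n_copies):
    """T[i][j] = P(j copies of the allele next generation | i copies now).

    Assumption: neutral Wright-Fisher sampling, i.e. each of the n_copies
    gene copies in the next generation is drawn independently with
    probability i / n_copies of carrying the allele.
    """
    T = []
    for i in range(n_copies + 1):
        p = i / n_copies
        T.append([comb(n_copies, j) * p ** j * (1 - p) ** (n_copies - j)
                  for j in range(n_copies + 1)])
    return T


def step(dist, T):
    """Advance a probability distribution over allele counts by one generation."""
    n = len(dist)
    return [sum(dist[i] * T[i][j] for i in range(n)) for j in range(n)]


def evolve(dist, T, generations):
    """Apply `step` repeatedly; the chain has no memory beyond `dist`."""
    for _ in range(generations):
        dist = step(dist, T)
    return dist


if __name__ == "__main__":
    n_copies = 6                       # e.g. 3 diploid individuals
    T = wright_fisher_matrix(n_copies)
    start = [0.0] * (n_copies + 1)
    start[2] = 1.0                     # exactly 2 copies of the allele today
    later = evolve(start, T, generations=10)
    print("P(allele lost)  =", later[0])
    print("P(allele fixed) =", later[n_copies])
```

States 0 and `n_copies` are absorbing in this sketch, which mirrors the observation that some alleles cease to exist while others become the norm.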


A Bayesian Programming Approach To Car-Following Model Calibration And Validation Using Limited Data, Franklin Abodo Jun 2022

FIU Electronic Theses and Dissertations

Traffic simulation software is used by transportation researchers and engineers to design and evaluate changes to roadway networks. Underlying these simulators are mathematical models of microscopic driver behavior from which macroscopic measures of flow and congestion can be recovered. Many models are intended to apply to only a subset of possible traffic scenarios and roadway configurations, while others do not have any explicit constraint on their applicability. Work zones on highways are one scenario for which no model invented to date has been shown to accurately reproduce realistic driving behavior. This makes it difficult to optimize for safety and other …


The Short-Term Effects Of Fine Airborne Particulate Matter And Climate On Covid-19 Disease Dynamics, El Hussain Shamsa, Kezhong Zhang Jun 2022

Medical Student Research Symposium

Background: Despite more than 60% of the United States population being fully vaccinated, COVID-19 cases continue to spike in a temporal pattern. These patterns in COVID-19 incidence and mortality may be linked to short-term changes in environmental factors.

Methods: Nationwide, county-wise measurements for COVID-19 cases and deaths, fine-airborne particulate matter (PM2.5), and maximum temperature were obtained from March 20, 2020 to March 20, 2021. Multivariate Linear Regression was used to analyze the association between environmental factors and COVID-19 incidence and mortality rates in each season. Negative Binomial Regression was used to analyze daily fluctuations of COVID-19 cases …


A Course In Data Science: R And Prediction Modeling, Adam Kapelner May 2022

Open Educational Resources

This is a self-contained course in data science and machine learning using R. It covers philosophy of modeling with data, prediction via linear models, machine learning including support vector machines and random forests, probability estimation and asymmetric costs using logistic regression and probit regression, underfitting vs. overfitting, model validation, handling missingness and much more. There is formal instruction of data manipulation using dplyr and data.table, visualization using ggplot2 and statistical computing.


Statistical Characteristics Of High-Frequency Gravity Waves Observed By An Airglow Imager At Andes Lidar Observatory, Alan Z. Liu, Bing Cao May 2022

Publications

The long-term statistical characteristics of high-frequency quasi-monochromatic gravity waves are presented using multi-year airglow images observed at Andes Lidar Observatory (ALO, 30.3° S, 70.7° W) in northern Chile. The distribution of primary gravity wave parameters including horizontal wavelength, vertical wavelength, intrinsic wave speed, and intrinsic wave period are obtained and are in the ranges of 20–30 km, 15–25 km, 50–100 m s−1, and 5–10 min, respectively. The duration of persistent gravity wave events captured by the imager approximately follows an exponential distribution with an average duration of 7–9 min. The waves tend to propagate against the local background winds and …


A Monte Carlo Analysis Of Seven Dichotomous Variable Confidence Interval Equations, Morgan Juanita Dubose Apr 2022

Masters Theses & Specialist Projects

Department of Psychological Sciences, Western Kentucky University

There are two options to estimate a range of likely values for the population mean of a continuous variable: one for when the population standard deviation is known and another for when the population standard deviation is unknown. There are seven proposed equations to calculate the confidence interval for the population mean of a dichotomous variable: normal approximation interval, Wilson interval, Jeffreys interval, Clopper-Pearson, Agresti-Coull, arcsine transformation, and logit transformation. In this study, I compared the percent effectiveness of each equation using a Monte Carlo analysis and the interval range over a range …
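
As an illustration of two of the seven intervals named above, the sketch below computes the normal approximation (Wald) interval and the Wilson interval using their standard textbook formulas, then estimates coverage by simulation. It is not the thesis's code: the function names, the trial count, and the chosen `p_true` and `n` are assumptions, and "coverage" here means the fraction of simulated samples whose interval contains the true proportion.

```python
import random
from math import sqrt
from statistics import NormalDist


def wald_interval(successes, n, conf=0.95):
    """Normal approximation interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p = successes / n
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half


def wilson_interval(successes, n, conf=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def coverage(interval_fn, p_true, n, trials=2000, seed=0):
    """Monte Carlo estimate of how often the interval contains p_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        successes = sum(rng.random() < p_true for _ in range(n))
        lo, hi = interval_fn(successes, n)
        hits += lo <= p_true <= hi
    return hits / trials


if __name__ == "__main__":
    print("Wilson 5/10:", wilson_interval(5, 10))
    for name, fn in [("Wald", wald_interval), ("Wilson", wilson_interval)]:
        print(name, "coverage at p=0.05, n=20:", coverage(fn, 0.05, 20))
```

With a small true proportion and a small sample, the Wald interval collapses to a single point whenever no successes are observed, which is one reason comparisons like the one in this study are informative.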


Death-Related Anxiety Associated With Riskier Decision-Making Irrespective Of Framing: A Bayesian Model Comparison, Blaine Tomkins Mar 2022

Psychology Faculty Publications

A commonly reported finding is that anxious individuals are less likely to make risky decisions. However, no studies have examined whether this association extends to death-related anxiety. The present study examined how groups low, moderate, and high in death-related anxiety make decisions with varying levels of risk. Participants completed a series of hypothetical bets in which the probability of a win was systematically manipulated. High-anxiety individuals displayed the greatest risk-taking behavior, followed by the moderate-anxiety group, with the low-anxiety group being most risk-averse. Experiment 2 tested this association further by framing outcomes in terms of losses, rather than gains. A …


A Simple Algorithm For Generating A New Two Sample Type-Ii Progressive Censoring With Applications, E. M. Shokr, Rashad Mohamed El-Sagheer, Mahmoud Mansour, H. M. Faied, B. S. El-Desouky Jan 2022

Basic Science Engineering

In this article, we introduce a simple algorithm for generating a new type-II progressive censoring scheme for two samples. The proposed algorithm can be applied to any continuous probability distribution. Moreover, the model description and necessary assumptions are discussed. In addition, the steps of the simple generation algorithm, along with programming steps, are illustrated on a real example. Inference for two Weibull Frechet populations is discussed under the proposed algorithm. Both classical and Bayesian inferential approaches for the distribution parameters are discussed. Furthermore, approximate confidence intervals are constructed based on the asymptotic distribution of the maximum …


Behavioral Predictive Analytics Towards Personalization For Self-Management – A Use Case On Linking Health-Related Social Needs, Bon Sy, Michael Wassil, Helene Connelly, Alisha Hassan Jan 2022

Publications and Research

The objective of this research is to investigate the feasibility of applying behavioral predictive analytics to optimize patient engagement in diabetes self-management, and to gain insights on the potential of infusing a chatbot with NLP technology for discovering health-related social needs. In the U.S., less than 25% of patients actively engage in self-health management even though self-health management has been reported to be associated with improved health outcomes and reduced healthcare costs. The proposed behavioral predictive analytics relies on manifold clustering to identify subpopulations segmented by behavior readiness characteristics that exhibit non-linear properties. For each subpopulation, an individualized auto-regression model and …


Comparing Machine Learning Techniques With State-Of-The-Art Parametric Prediction Models For Predicting Soybean Traits, Susweta Ray Dec 2021

Department of Statistics: Dissertations, Theses, and Student Work

Soybean is a significant source of protein and oil, and is also widely used as animal feed. Thus, developing lines that are superior in terms of yield, protein and oil content is important to feed the ever-growing population. In contrast to high-cost phenotyping, genotyping is both cost- and time-efficient for breeders, while evaluating new lines in different environments (location-year combinations) can be costly. Several genomic prediction (GP) methods have been developed to use the marker and environment data effectively to predict the yield or other relevant phenotypic traits of crops. Our study compares a conventional GP method (GBLUP), a …


Science Is For Everybody: A Resource For Understanding Glaciers, Climate, And Modeling, Emma Watson Oct 2021

Independent Study Project (ISP) Collection

Climate change threatens the existence of glaciers worldwide. In order to properly interact with these changing systems, we must first understand them. Glacial models provide an excellent way to do this; however, the language and mathematical concepts used in their creation are generally inaccessible to a general audience. This project presents an online resource for a general audience to interact with climate science, glaciology, and glacial modeling. Long-term goals for the project include the incorporation of a glacial model of Drangajökull, Vestfirðir, NW Iceland. As such, the focus of the project includes a literature review of glaciers, Drangajökull in particular, …


Exploring The Relationship Between Mandatory Helmet Use Regulations And Adult Cyclists’ Behavior In California Using Hybrid Machine Learning Models, Fatemeh Davoudi Kakhki, Maria Chierichetti Oct 2021

Mineta Transportation Institute Publications

In California, bike fatalities increased by 8.1% from 2015 to 2016. Even though the benefits of wearing helmets in protecting cyclists against trauma in cycling crashes have been established, the use of helmets is still limited, and there is opposition against mandatory helmet use, particularly for adults. Therefore, exploring perceptions of adult cyclists regarding mandatory helmet use is a key element in understanding cyclists’ behavior, and in determining the impact of mandatory helmet use on their cycling rate. The goal of this research is to identify sociodemographic characteristics and cycling behaviors that are associated with the use and non-use of bicycle …


Spatial Analysis Of Landscape Characteristics, Anthropogenic Factors, And Seasonality Effects On Water Quality In Portland, Oregon, Katherine Gelsey, Daniel Ramirez Aug 2021

REU Final Reports

Urban areas often struggle with deteriorated water quality as a result of complex interactions between landscape factors such as land cover, use, and management as well as climatic variables such as weather, precipitation, and atmospheric conditions. Green stormwater infrastructure (GSI) has been introduced as a strategy to reintroduce pre-development hydrological conditions in cities, but questions remain as to how GSI interacts with other landscape factors to affect water quality. We conducted a statistical analysis of six relevant water quality indicators in 131 water quality stations in four watersheds around Portland, Oregon using data from 2015 to 2021. Indiscriminate of station …


Modeling Covid-19 Spread In Small Colleges, Riti Bahl, Nicole Eikmeier, Alexandra Fraser, Matthew Junge, Felicia Keesing, Kukai Nakahata, Lily Reeves Aug 2021

Publications and Research

We develop an agent-based model on a network meant to capture features unique to COVID-19 spread through a small residential college. We find that a safe reopening requires strong policy from administrators combined with cautious behavior from students. Strong policy includes weekly screening tests with quick turnaround and halving the campus population. Cautious behavior from students means wearing facemasks, socializing less, and showing up for COVID-19 testing. We also find that comprehensive testing and facemasks are the most effective single interventions, building closures can lead to infection spikes in other areas depending on student behavior, and faster return of test …


Application Of Randomness In Finance, Jose Sanchez, Daanial Ahmad, Satyanand Singh May 2021

Publications and Research

Brownian motion, also known as a Wiener process, can be thought of as a random walk. In our project we briefly discuss the fluctuations of financial indices and relate them to Brownian motion and the modeling of stock prices.
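
The link between a random walk and a price model can be sketched as follows. The project does not state which price model it uses, so geometric Brownian motion is an assumption here (it is a common textbook choice); `brownian_path`, `gbm_path`, and all parameter values are hypothetical.

```python
import math
import random


def brownian_path(n_steps, dt, rng):
    """Discretized Wiener process: a random walk with N(0, dt) increments."""
    w = [0.0]
    for _ in range(n_steps):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return w


def gbm_path(s0, mu, sigma, n_steps, dt, rng):
    """Geometric Brownian motion (assumed model for a stock price):

        S(t) = s0 * exp((mu - sigma^2 / 2) * t + sigma * W(t))
    """
    w = brownian_path(n_steps, dt, rng)
    return [s0 * math.exp((mu - 0.5 * sigma ** 2) * k * dt + sigma * w[k])
            for k in range(n_steps + 1)]


if __name__ == "__main__":
    rng = random.Random(42)
    # One simulated year of daily prices with made-up drift and volatility.
    prices = gbm_path(s0=100.0, mu=0.05, sigma=0.2,
                      n_steps=252, dt=1 / 252, rng=rng)
    print("start:", prices[0], "end:", prices[-1])
```

Because the random walk enters through an exponential, the simulated price stays positive, while its logarithm fluctuates like the underlying Brownian motion.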


Species In Vernal Pools: Anova, Lisa Manne May 2021

Open Educational Resources

A one-way analysis of variance exercise using data on species diversities from vernal pools. Data are from vernal pools in Willowbrook Park (adjacent to College of Staten Island's campus) in spring.

The typical ANOVA gives a straightforward result (significant ANOVA, easily interpreted Tukey-Kramer analysis). This data set requires more nuanced interpretation, as the ANOVA is marginally significant, and Tukey-Kramer yields one significant pairwise comparison between groups. Relative lack of variation within groups explains this apparent enigma.
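
For readers who want to see the arithmetic behind a one-way ANOVA, here is a minimal sketch that computes the F statistic from between-group and within-group sums of squares. The numbers are invented for illustration and are not the vernal pool data; `one_way_anova` is a hypothetical helper name.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical species counts for three pools (NOT the Willowbrook data).
    pools = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
    print(one_way_anova(pools))
```

Small within-group variation shrinks the denominator of F, which is how tightly clustered groups can produce a significant pairwise difference even when the overall test is only marginal. Turning F into a p-value requires the F distribution, which is not in the Python standard library; a statistics package would be needed for that step.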


Lecture 04: Spatial Statistics Applications Of Hrl, Trl, And Mixed Precision, David Keyes Apr 2021

Mathematical Sciences Spring Lecture Series

As simulation and analytics enter the exascale era, numerical algorithms, particularly implicit solvers that couple vast numbers of degrees of freedom, must span a widening gap between ambitious applications and austere architectures to support them. We present fifteen universals for researchers in scalable solvers: imperatives from computer architecture that scalable solvers must respect, strategies towards achieving them that are currently well established, and additional strategies currently being developed for an effective and efficient exascale software ecosystem. We consider recent generalizations of what it means to “solve” a computational problem, which suggest that we have often been “oversolving” them at the …


A Probabilistic Approach To Identifying Run Scoring Advantage In The Order Of Playing Cricket, Manar D. Samad, Sumen Sen Mar 2021

Computer Science Faculty Research

In the game of cricket, the decision to bat first after winning the toss is often taken to make the best use of superior pitch conditions and set a big target for the opponent. However, the opponent may fail to show their natural batting performance in the second innings due to several factors, including deteriorated pitch conditions and excessive pressure of chasing a high target score. The advantage of batting first has been highlighted in the literature and expert opinions. However, the effect of batting and bowling order on match outcome has not been investigated well enough to recommend an …


Regression Analyses Assessing The Impact Of Environmental Factors On Covid-19 Transmission And Mortality, El Hussain Shamsa, Kezhong Zhang Feb 2021

Medical Student Research Symposium

No abstract provided.


Novel Statistical Analysis In The Context Of A Comprehensive Needs Assessment For Secondary Stem Recruitment, Norou Diawara, Sarah Ferguson, Melva Grant, Kumer Das Jan 2021

Mathematics & Statistics Faculty Publications

There is a myriad of career opportunities stemming from science, technology, engineering, and mathematics (STEM) disciplines. In addition to careers in corporate settings, teaching is a viable career option for individuals pursuing degrees in STEM disciplines. With national shortages of secondary STEM teachers, efforts to recruit, train, and retain quality STEM teachers are greatly important. Prior to exploring ways to attract potential STEM teacher candidates to pursue teacher training programs, it is important to understand the perceived value that potential recruits place on STEM careers, disciplines, and the teaching profession. The purpose of this study was to explore students’ perceptions …


The Need To Incorporate Communities In Compartmental Models, Michael J. Kane, Owais Gilani Jan 2021

Faculty Journal Articles

Tian et al. provide a framework for assessing population-level interventions of disease outbreaks through the construction of counterfactuals in a large-scale, natural experiment assessing the efficacy of mild, but early interventions compared to delayed interventions. The technique is applied to the recent SARS-CoV-2 outbreak with the population of Shenzhen, China acting as the mild-but-early treatment group and a combination of several US counties resembling Shenzhen but enacting a delayed intervention acting as the control. To help further the development of this framework and identify an avenue for further enhancement, we focus on the use and potential limitations of compartmental …