Stressor: An R Package For Benchmarking Machine Learning Models,
2023
Utah State University
Stressor: An R Package For Benchmarking Machine Learning Models, Samuel A. Haycock
All Graduate Theses and Dissertations
Many discipline-specific researchers need a way to quickly compare the accuracy of their predictive models with alternatives. However, many of these researchers are not experienced with multiple programming languages. Python has recently led the way in machine learning functionality, including the PyCaret library, which allows users to develop high-performing machine learning models with only a few lines of code. The goal of the stressor package is to help users of the R programming language access the advantages of PyCaret without having to learn Python. This allows the user to leverage R’s powerful data analysis workflows while simultaneously …
Statistical Graph Quality Analysis Of Utah State University Master Of Science Thesis Reports,
2023
Utah State University
Statistical Graph Quality Analysis Of Utah State University Master Of Science Thesis Reports, Ragan Astle
All Graduate Theses and Dissertations
Graphical software packages have become increasingly popular in our modern world, but there are concerns within the statistical visualization field about the default settings provided by these packages, which can make it challenging to create good-quality graphs that align with standard graph principles. In this thesis, we investigate whether the quality of graphs from Utah State University (USU) Plan A Master of Science (MS) thesis reports from the years 1930 to 2019 was affected by the rise of graphical software packages. We collected all data stored on the USU Digital Commons website since November 2021 to determine the specific …
On Image Response Regression With High-Dimensional Data,
2023
University of Windsor
On Image Response Regression With High-Dimensional Data, Noah Fuerth
Major Papers
A recent issue in statistical analysis is modelling data in which the effect of a variable changes across locations. This can be difficult when the covariates are very high-dimensional and when the domain of the varying coefficient functions of the predictors is not necessarily regular. This research paper investigates a method that overcomes these challenges by approximating the varying coefficient functions with bivariate splines. We do this by splitting the domain of the varying coefficient functions into a number of triangles and building the bivariate spline functions on this triangulation. This major paper will outline detailed …
Analytical Approach For Monitoring The Behavior Of Patients With Pancreatic Adenocarcinoma At Different Stages As A Function Of Time,
2023
Eastern Virginia Medical School
Analytical Approach For Monitoring The Behavior Of Patients With Pancreatic Adenocarcinoma At Different Stages As A Function Of Time, Aditya Chakaborty Dr, Chris P. Tsokos Dr
Biology and Medicine Through Mathematics Conference
No abstract provided.
Predicting Dengue Incidence In Central Argentina Using Google Trends Data,
2023
Instituto de Investigaciones Biológicas y Tecnológicas, CONICET-Universidad Nacional de Córdoba, Centro de Investigaciones Entomológicas de Córdoba, Córdoba, Argentina
Predicting Dengue Incidence In Central Argentina Using Google Trends Data, Sahil Chindal, Elizabet Estallo, Yanjun Qian, Michael Robert
Biology and Medicine Through Mathematics Conference
No abstract provided.
Public Acceptance Of Guidance And Regulations For Space Flight Participation,
2023
Embry-Riddle Aeronautical University
Public Acceptance Of Guidance And Regulations For Space Flight Participation, Cory Trunkhill, Robert Joslin, Joseph Keebler
Journal of Aviation Technology and Engineering
Space flight participants are not professional astronauts and are not subject to the rules and guidance covering space flight crewmembers. Ordinal logistic regression of survey data was used to explore public acceptance of current medical screening recommendations and of regulations for safety risk and implied liability in civil space flight participation. The independent variables were participant demographic categories, while the dependent variables represented current Federal Aviation Administration guidance and regulations. Odds ratios were derived for the demographic categories to interpret the likelihood of accepting each criterion. A significant likelihood of accepting the guidance and regulations was found for five of twelve demographic variables influencing public …
Evaluating Models Of Scanpath Prediction,
2023
University of Tübingen
Evaluating Models Of Scanpath Prediction, Matthias Kümmerer, Matthias Bethge
MODVIS Workshop
No abstract provided.
Optimizing Tumor Xenograft Experiments Using Bayesian Linear And Nonlinear Mixed Modelling And Reinforcement Learning,
2023
Southern Methodist University
Optimizing Tumor Xenograft Experiments Using Bayesian Linear And Nonlinear Mixed Modelling And Reinforcement Learning, Mary Lena Bleile
Statistical Science Theses and Dissertations
Tumor xenograft experiments are a popular tool of cancer biology research. In a typical experiment of this kind, one implants a set of animals with an aliquot of the human tumor of interest, applies various treatments of interest, and observes the subsequent response. Efficient analysis of the data from these experiments is therefore of utmost importance. This dissertation proposes three methods for optimizing cancer treatment and data analysis in the tumor xenograft context. The first is applicable to tumor xenograft experiments in general, and the other two seek to optimize the combination of radiotherapy with immunotherapy in the tumor xenograft …
Movie Recommender System Using Matrix Factorization,
2023
University of Central Florida
Movie Recommender System Using Matrix Factorization, Roland Fiagbe
Data Science and Data Mining
Recommendation systems are a popular and useful field that helps people make informed decisions automatically. This technique assists users in selecting relevant information from an overwhelming amount of available data. For movie recommendations, two common methods are collaborative filtering, which compares similarities between users, and content-based filtering, which takes a user’s specific preferences into account. Our study focuses on the collaborative filtering approach, specifically matrix factorization. Various similarity metrics are used to identify user similarities for recommendation purposes. Our project aims to predict ratings for unwatched movies using the MovieLens rating dataset. We developed …
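The abstract does not say which optimizer or library the project used, so the following is only a minimal sketch of matrix factorization for rating prediction, assuming plain stochastic gradient descent with L2 regularization on a tiny hypothetical ratings list (not MovieLens). Each user and each movie gets a short latent vector, and a rating is predicted as their dot product, which also yields scores for movies a user has not watched. The names `factorize`, `predict`, and `rmse` are illustrative.

```python
import math
import random

def predict(P, Q, u, i):
    """Predicted rating of movie i by user u (works for unwatched pairs too)."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=500, seed=0):
    """Learn user factors P and movie factors Q so that dot(P[u], Q[i]) ~ rating.

    ratings: list of (user_index, movie_index, rating) for watched movies only.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Stochastic gradient step on squared error plus L2 penalty.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def rmse(P, Q, ratings):
    """Root-mean-squared error over the observed ratings."""
    sq = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(sq / len(ratings))

# Hypothetical toy data: 4 users x 4 movies; missing pairs are "unwatched".
ratings = [(0, 0, 5), (0, 1, 4), (0, 3, 1), (1, 0, 4), (1, 2, 1), (1, 3, 1),
           (2, 0, 1), (2, 1, 1), (2, 3, 5), (3, 1, 1), (3, 2, 5), (3, 3, 4)]
P, Q = factorize(ratings, n_users=4, n_items=4)
print(round(predict(P, Q, 0, 2), 2))  # user 0's predicted rating for unwatched movie 2
```

Practical systems often add per-user and per-movie bias terms and tune `k`, `lr`, and `reg` on held-out ratings; they are omitted here to keep the sketch short.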
A Probabilistic Exploration Of Food Supplementation And Assistance,
2023
Murray State University
A Probabilistic Exploration Of Food Supplementation And Assistance, Logan Mattingly
Honors College Theses
Food insecurity is a stark threat that affects households throughout our country. Dietary insufficiency manifests itself in ways that affect health and public safety. According to researchers, individuals who suffer from food insecurity have a higher risk of aggression, anxiety, suicidal ideation, and depression. These problems are unequally distributed, occurring more often among lower-income households. In this work, an exploratory analysis of these data sets is performed to examine the socio-economic, biographical, nutritional, and geographical principal components of food insecurity among survey participants and how the US Supplemental Nutrition Assistance Program (SNAP) affects …
Hispanic Human Capital And Financial Aid Application In The West Census Region,
2023
California State University, Monterey Bay
Hispanic Human Capital And Financial Aid Application In The West Census Region, Benjamin Lundy-Paine
Capstone Projects and Master's Theses
As of 2021, very few Hispanic residents in the United States held a college degree in comparison to non-Hispanic residents. Research has shown that, particularly for Hispanic students, financial aid increases college persistence. Hispanic Free Application for Federal Student Aid (FAFSA) submission rates rank among the lowest, preventing many Hispanic students from receiving financial assistance. This issue is most prevalent in the West Census Region (WCR), which has the highest concentration of Hispanic residents. To understand what barriers may be preventing Hispanic FAFSA submission in the WCR, this Capstone used logistic regression models to analyze student-level data from the National Center for …
Small But Mighty: Examining The Utility Of Microstatistics In Modeling Ice Hockey,
2023
Liberty University
Small But Mighty: Examining The Utility Of Microstatistics In Modeling Ice Hockey, Matt Palmer
Senior Honors Theses
As research into hockey analytics continues, an increasing number of metrics are being introduced into the knowledge base of the field, creating a need to determine whether various stats are useful or simply add noise to the discussion. This paper examines microstatistics – manually tracked metrics which go beyond the NHL’s publicly released stats – both through the lens of meta-analytics (which attempt to objectively assess how useful a metric is) and modeling game probabilities. Results show that while there is certainly room for improvement in understanding and use of microstats in modeling, the metrics overall represent an area of …
Baseball’s Evolution In The 21st Century, And How It Exemplifies Human Response To Change,
2023
Seattle Pacific University
Baseball’s Evolution In The 21st Century, And How It Exemplifies Human Response To Change, Jonathan Sharpe
Honors Projects
The game of baseball has changed a lot in the past twenty years. This change can be attributed primarily to the explosion in data analytics and how it is used to evaluate baseball players. Analytics led teams to prefer different player profiles and eventually changed how players are developed. As a result, the strategies employed have also evolved, and the game now differs from the one played only a couple of decades ago. This paper will explore the changes that the game has seen. At the same time, Major League Baseball has also implemented its own changes to try and …
Time Series Analysis Of Longitudinally Collected Standard Autoperimetry Data In Glaucoma Patients,
2023
Murray State University
Time Series Analysis Of Longitudinally Collected Standard Autoperimetry Data In Glaucoma Patients, Carlyn Childress
Honors College Theses
Glaucoma is a group of eye diseases in which damage gradually occurs to the optic nerve, often leading to partial or complete loss of vision. Glaucoma is the second leading cause of blindness, and there is no cure. Early detection and tracking of its progression are key to managing its effects. Ordinary Least Squares Regression (OLSR), the most commonly used methodology for tracking glaucoma progression, is inappropriate because the longitudinally collected perimetry data from glaucoma patients appear to be temporally correlated. Time series models, which account for temporal correlation, are better methods to analyze Mean …
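One way to see why temporal correlation undermines OLSR is to fit the OLS trend line and then check the lag-1 autocorrelation of its residuals; OLS inference assumes independent errors, so clearly nonzero autocorrelation signals a violated assumption. The sketch below is a hedged illustration with hypothetical visit times and index values, not the thesis's data or method; `ols_fit` and `lag1_autocorrelation` are illustrative names.

```python
def ols_fit(t, y):
    """Ordinary least squares line y ~ a + b*t; returns (a, b)."""
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    b = sxy / sxx
    return y_bar - b * t_bar, b

def lag1_autocorrelation(e):
    """Lag-1 autocorrelation of a residual series (OLS residuals have mean zero)."""
    num = sum(e[k] * e[k - 1] for k in range(1, len(e)))
    den = sum(x * x for x in e)
    return num / den

# Hypothetical visit times and a summary perimetry index that drifts and recovers.
t = [0, 1, 2, 3, 4, 5]
y = [0.0, 1.0, 2.0, 2.0, 1.0, 0.0]
a, b = ols_fit(t, y)
residuals = [yi - (a + b * ti) for ti, yi in zip(t, y)]
print(b, lag1_autocorrelation(residuals))
```

Here the fitted slope is flat while consecutive residuals share the same sign, giving a positive lag-1 autocorrelation. The Durbin-Watson statistic, roughly 2(1 - r1), is a common formal version of this check; time series models such as ARIMA instead model the correlation directly.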
Employee Attrition: Analyzing Factors Influencing Job Satisfaction Of IBM Data Scientists,
2023
Kennesaw State University
Employee Attrition: Analyzing Factors Influencing Job Satisfaction Of IBM Data Scientists, Graham Nash
Symposium of Student Scholars
Employee attrition is a relevant issue that every employer must consider when gauging the effectiveness of their employees. An employee's decision to leave their job can stem from a multitude of factors. As a result, employers need methods for measuring attrition by assessing various qualities of their employees. Factors such as age, years with the company, department, level of education, job role, and even marital status are all considered by employers to help predict employee attrition. This project analyzes a …
Using A Distributive Approach To Model Insurance Loss,
2023
University of Mary Washington
Using A Distributive Approach To Model Insurance Loss, Kayla Kippes
Student Research Submissions
Insurance loss is an unpredictable event that stands at the forefront of the insurance industry. Loss in insurance represents the costs or expenses incurred due to a claim. An insurance claim is a request for the insurance company to pay for damage caused to an individual’s property. Loss can be measured either by how much money (the dollar amount) the insurance company has paid out to repair the damage or by the number of claims (claim count) made to the insurance company. Insured events include property damage due to fire, theft, flood, a car accident, …
Statistical Approach To Quantifying Interceptability Of Interaction Scenarios For Testing Autonomous Surface Vessels,
2023
Old Dominion University
Statistical Approach To Quantifying Interceptability Of Interaction Scenarios For Testing Autonomous Surface Vessels, Benjamin E. Hargis, Yiannis E. Papelis
Modeling, Simulation and Visualization Student Capstone Conference
This paper presents a probabilistic approach to quantifying the interceptability of an interaction scenario designed to test collision avoidance in autonomous navigation algorithms. Interceptability is one of many measures of the complexity or difficulty of an interaction scenario. This approach combines probability models of capability and intent to create a predicted position probability map for the system under test. Interceptability is then quantified by determining the overlap between the system under test's probability map and the intruder’s capability model. The approach is general; however, a demonstration is provided using kinematic capability models and an odometry-based intent model.
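As a rough illustration of the overlap idea, the sketch below assumes a grid-based probability map for the system under test (SUT) and scores interceptability as the share of that probability mass lying inside the intruder's reachable region. Both models are hypothetical simplifications chosen for the example: the SUT "intent" is a Gaussian blob around a constant-heading, constant-speed prediction, and the intruder "capability" is a disc of radius max speed times horizon. The paper's actual capability, intent, and overlap definitions may differ.

```python
import math

def sut_probability_map(x0, y0, heading, speed, horizon, sigma, grid):
    """Hypothetical predicted-position map for the system under test.

    Assumes the SUT most likely holds its heading and speed (a stand-in for
    an intent model), with isotropic Gaussian uncertainty of std dev sigma.
    grid: list of (x, y) cell centers. Returns {cell: probability} summing to 1.
    """
    mx = x0 + speed * horizon * math.cos(heading)
    my = y0 + speed * horizon * math.sin(heading)
    weights = {
        c: math.exp(-((c[0] - mx) ** 2 + (c[1] - my) ** 2) / (2 * sigma ** 2))
        for c in grid
    }
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

def interceptability(sut_map, intruder_xy, intruder_max_speed, horizon):
    """Probability mass of the SUT map that the intruder can reach in time.

    Hypothetical capability model: every cell within max_speed * horizon of
    the intruder's start position is reachable.
    """
    reach = intruder_max_speed * horizon
    ix, iy = intruder_xy
    return sum(p for (x, y), p in sut_map.items()
               if math.hypot(x - ix, y - iy) <= reach)

grid = [(x, y) for x in range(-20, 21) for y in range(-20, 21)]
sut = sut_probability_map(0, 0, heading=0.0, speed=1.0, horizon=10, sigma=2.0, grid=grid)
print(round(interceptability(sut, (10, 8), intruder_max_speed=1.0, horizon=10), 3))
```

A score near 1 means the intruder can reach almost everywhere the SUT is likely to be, which suggests a harder scenario; a score near 0 means the encounter is unlikely to be forced.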
Bridging The Chasm Between Fundamental, Momentum, And Quantitative Investing,
2023
Southern Methodist University
Bridging The Chasm Between Fundamental, Momentum, And Quantitative Investing, Allen Hoskins, Jeff Reed, Robert Slater
SMU Data Science Review
A chasm exists between the active public equity investment management industry's fundamental, momentum, and quantitative styles. In this study, the researchers explore ways to bridge this gap by leveraging domain knowledge, fundamental analysis, momentum, crowdsourcing, and data science methods. This research also seeks to test the developed tools and strategies during the volatile time period of 2020 and 2021.
Comparison Of Sampling Methods For Predicting Wine Quality Based On Physicochemical Properties,
2023
Southern Methodist University
Comparison Of Sampling Methods For Predicting Wine Quality Based On Physicochemical Properties, Robert Burigo, Scott Frazier, Eli Kravez, Nibhrat Lohia
SMU Data Science Review
Numerous studies have used the physicochemical properties of wine to predict quality. Given the nature of these properties, the data are inherently skewed. Previous works have focused on a handful of sampling techniques to balance the data. This research compares multiple sampling techniques for predicting the target with limited data, using an ensemble model to evaluate the different techniques. This research found no evidence that specific oversampling methods improve a random forest classifier for a multi-class problem.
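The abstract does not list which sampling techniques were compared. As a baseline illustration of what "balancing the data" means, the sketch below shows plain random oversampling, which duplicates randomly chosen rows of each minority class until every class matches the largest one. The rows and labels are hypothetical, and the function name is illustrative. Synthetic methods such as SMOTE instead interpolate new minority rows.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate random minority-class rows until every class has as many
    rows as the largest class (plain random oversampling)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        members = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            j = rng.choice(members)
            X_out.append(X[j])
            y_out.append(label)
    return X_out, y_out

# Hypothetical rows (alcohol, volatile acidity) labelled with quality scores.
X = [(9.4, 0.70), (9.8, 0.88), (10.0, 0.60), (9.5, 0.66),
     (11.2, 0.28), (12.8, 0.30), (13.1, 0.35)]
y = [5, 5, 5, 5, 6, 6, 7]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Oversampling should be applied only to the training portion of each split. If duplicated rows leak into the evaluation data, the classifier's measured performance will be optimistic.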
A New Generalized Gamma-Weibull Distribution And Its Applications,
2023
Department of Mathematics and Statistics, Kwara State University, Malete P.M.B. 1530, Ilorin, Nigeria
A New Generalized Gamma-Weibull Distribution And Its Applications, Nihimat Iyebuhola Aleshinloye, Samuel Adewale Aderoju, Alfred Adewole Abiodun, Bako Lukmon Taiwo
Al-Bahir Journal for Engineering and Pure Sciences
In this paper, a New Generalized Gamma-Weibull (NGGW) distribution is developed by compounding the Weibull and generalized gamma distributions. Some mathematical properties, such as moments, Rényi entropy, and order statistics, are derived and discussed. The maximum likelihood estimation (MLE) method is used to estimate the model parameters. The proposed model is applied to two real-life datasets to illustrate its performance and flexibility compared with other competing distributions. The results show that the new distribution fits each dataset better than the competing distributions.
