Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

11,764 Full-Text Articles 18,434 Authors 6,525,218 Downloads 279 Institutions

All Articles in Statistics and Probability

Using Geographic Information To Explore Player-Specific Movement And Its Effects On Play Success In The NFL, Hayley Horn, Eric Laigaie, Alexander Lopez, Shravan Reddy 2023 Southern Methodist University

SMU Data Science Review

American Football is a billion-dollar industry in the United States. The analytical aspect of the sport is an ever-growing domain, with open-source competitions like the NFL Big Data Bowl accelerating this growth. With the amount of player movement during each play, tracking data can prove valuable in many areas of football analytics. While concussion detection, catch recognition, and completion percentage prediction are all existing use cases for this data, player-specific movement attributes, such as speed and agility, may be helpful in predicting play success. This research calculates player-specific speed and agility attributes from tracking data and supplements them with descriptive …


Traditional Vs Machine Learning Approaches: A Comparison Of Time Series Modeling Methods, Miguel E. Bonilla Jr., Jason McDonald, Tamas Toth, Bivin Sadler 2023 Southern Methodist University

SMU Data Science Review

In recent years, various new Machine Learning and Deep Learning algorithms have been introduced, claiming to offer better performance than traditional statistical approaches when forecasting time series. Studies seeking evidence to support the usage of ML/DL over statistical approaches have been limited to comparing the forecasting performance of univariate, linear time series data. This research compares the performance of traditional statistical-based and ML/DL methods for forecasting multivariate and nonlinear time series.


A Hybrid Ensemble Of Learning Models, Bivin Sadler, Dhruba Dey, Duy Nguyen, Tavin Weeda 2023 Southern Methodist University

SMU Data Science Review

Statistical models for time series forecasting have long faced the challenge of being superseded by deep learning models. This research proposes a new hybrid ensemble of forecasting models that combines the strengths of several strong candidates from these two model types. The proposed ensemble aims to improve the accuracy of forecasts and reduce computational complexity by leveraging the strengths of each candidate model.


Genetic Associations Of Alzheimer’s Disease And Mild Cognitive Impairment, Scott Hebert 2023 University of Massachusetts Amherst

Masters Theses

Over 6 million people are estimated to have been living with Alzheimer’s Disease (AD) in 2020, with another 12 million living with Mild Cognitive Impairment (MCI). Research has been conducted to evaluate genetic links to AD, but more research is needed on the subject. The Alzheimer’s Disease Neuroimaging Initiative (ADNI) has been conducting a longitudinal study of AD and MCI since 2004 and offering their data to research teams around the world. Diagnostic and demographic data was collected from participants, as well as data regarding single nucleotide polymorphisms (SNPs). SNP data was transformed to a binary format regarding whether the …


Forecasting COVID-19 With Temporal Hierarchies And Ensemble Methods, Li Shandross 2023 University of Massachusetts Amherst

Masters Theses

Infectious disease forecasting efforts underwent rapid growth during the COVID-19 pandemic, providing guidance for pandemic response and insight into potential future trends. Yet despite their importance, short-term forecasting models often struggled to produce accurate real-time predictions of this complex and rapidly changing system. This gap in accuracy persisted into the pandemic and warrants the exploration and testing of new methods to glean fresh insights.

In this work, we examined the application of the temporal hierarchical forecasting (THieF) methodology to probabilistic forecasts of COVID-19 incident hospital admissions in the United States. THieF is an innovative forecasting technique that aggregates time-series data into …


Indirect Aggression And Victimization: Investigating Instrument Psychometrics, Gender Differences, And Its Relationship To Social Information Processing, Taylor Steeves 2023 Duquesne University

Electronic Theses and Dissertations

The study of indirect bullying behaviors, relational aggression and social aggression, has been of theoretical importance and interest to researchers and psychologists within the last few decades. In this investigation, using a convenience sample of 451 late adolescents attending a private university in the mid-Atlantic U.S., I examined the factor structure of two measures of indirect bullying, the Young Adult Social Behavior Scale – Victim (YASB-V) and the Young Adult Social Behavior Scale – Perpetrator (YASB-P). Using confirmatory factor analysis (CFA), I found that the YASB-V comprised a four-factor model, differing from the model that had been identified in the …


Comparing Elevator Strategies For A Parking Lot, Naveed Arafat 2023 University of Windsor

Major Papers

In this paper, we compare elevator strategies for a parking garage. It is assumed that the parking garage has several floors and that there is an elevator which can stop on each floor. We begin by considering 4 strategies detailed on page 23. For each strategy, we run the program 100 times and obtain 100 mean values for wait times. Welch's test confirms highly significant differences among the 4 strategies. Repeating the analysis multiple times, we see that the best of the 4 strategies is strategy 2, which places the elevator on floor 2 (the median floor) after use.
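
The abstract above does not say which form of Welch's test was used; with four groups it may well have been Welch's one-way ANOVA. As a minimal sketch of the underlying idea only, the following Python computes the simpler two-sample Welch statistic and its Welch-Satterthwaite degrees of freedom. The function name `welch_t` and the wait-time values are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    # Per-sample variance of the mean (sample variance divided by sample size).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    # Test statistic: difference in means over the unpooled standard error.
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical mean wait times for two strategies (illustrative values only).
strategy_a = [1.0, 2.0, 3.0, 4.0, 5.0]
strategy_b = [2.0, 4.0, 6.0, 8.0, 10.0]

t_stat, dof = welch_t(strategy_a, strategy_b)
print(f"t = {t_stat:.4f}, df = {dof:.4f}")
```

Unlike the pooled-variance t-test, this version does not assume the groups share a common variance, which is a reasonable default when simulated wait times under different strategies may differ in spread.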


Excess Zeros Under GAM: Tweedie Or Two-Part?, Xianming Zeng 2023 University of Windsor

Major Papers

Positive, right-skewed data with excess zeros are encountered in many real-life situations. Two possible techniques for analyzing this type of data are two-part models and Tweedie models. Two-part models assume the existence of a separate zero-generating process, while Tweedie models are based on distributions that allow mass at zero. This paper presents a simulation study investigating the performance of Generalized Additive Models (GAM) under Tweedie and two-part specifications for such data, using MSE (Mean Square Error) and relative bias to compare the two methods. We found …
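
The two comparison criteria named in the abstract above can be illustrated with a minimal Python sketch. It uses the common textbook definitions of mean square error and relative bias for a set of simulated estimates of a known true value; the paper's exact definitions are not given in the excerpt, so the function names and the numbers below are assumptions for illustration.

```python
from statistics import mean


def mse(estimates, truth):
    """Mean square error of the estimates around the known true value."""
    return mean((e - truth) ** 2 for e in estimates)


def relative_bias(estimates, truth):
    """Average estimate minus the true value, as a fraction of the true value."""
    return (mean(estimates) - truth) / truth


# Hypothetical estimates from repeated simulation runs (illustrative only).
true_value = 2.0
estimates = [1.0, 2.0, 3.0]

print(mse(estimates, true_value))            # squared-error criterion
print(relative_bias(estimates, true_value))  # signed, scale-free criterion
```

MSE penalizes both variance and bias, while relative bias isolates systematic over- or under-estimation, so reporting both gives a fuller picture than either alone.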


The "Benfordness" Of Bach Music, Chadrack Bantange, Darby Burgett, Luke Haws, Sybil Prince Nelson 2023 Washington and Lee University

Journal of Humanistic Mathematics

In this paper we analyze the distribution of musical note frequencies in Hertz to see whether they follow the logarithmic Benford distribution. Our results show that the music of Johann Sebastian Bach and Johann Christian Bach is Benford distributed while the computer-generated music is not. We also find that computer-generated music is statistically less Benford distributed than human-composed music.
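
For readers unfamiliar with the Benford distribution mentioned above, the expected share of values whose leading digit is d is log10(1 + 1/d). The following Python is a minimal sketch of how observed leading-digit frequencies might be compared with those expectations. The helper names and the equal-tempered frequency list are assumptions made for illustration; they are not the authors' data or method.

```python
import math
from collections import Counter


def benford_expected():
    """Expected leading-digit proportions under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """First significant digit of a nonzero number."""
    # Scientific notation always starts with the leading significant digit.
    return int(f"{abs(x):.15e}"[0])


def leading_digit_freqs(values):
    """Observed proportion of each leading digit 1-9, ignoring zeros."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


# Hypothetical input: equal-tempered pitches around A4 = 440 Hz.
frequencies = [440.0 * 2 ** (k / 12) for k in range(-48, 40)]

expected = benford_expected()
observed = leading_digit_freqs(frequencies)
for d in range(1, 10):
    print(d, f"expected={expected[d]:.3f}", f"observed={observed[d]:.3f}")
```

A formal comparison would follow this with a goodness-of-fit test on the digit counts; the sketch stops at the raw proportions.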


Math And Democracy, Kimberly A. Roth, Erika L. Ward 2023 Juniata College

Journal of Humanistic Mathematics

Math and Democracy is a math class containing topics such as voting theory, weighted voting, apportionment, and gerrymandering. It was first designed by Erika Ward for math master’s students, mostly educators, but then adapted separately by both Erika Ward and Kim Roth for a general audience of undergraduates. The course contains materials that can be explored in mathematics classes from those for non-majors through graduate students. As such, it serves students from all majors and allows for discussion of fairness, racial justice, and politics while exploring mathematics that non-major students might not otherwise encounter. This article serves as a guide …


Exploring Experimental Design And Multivariate Analysis Techniques For Evaluating Community Structure Of Bacteria In Microbiome Data, Kelsey Karnik 2023 University of Nebraska - Lincoln

Dissertations and Theses in Statistics

The gut microbiome plays a crucial role in human health, and by working collaboratively with microbiologists, we aim to further our understanding of the human gut and its impact on human health. Promoting a diverse microbiome is emphasized throughout microbiology literature, and involving a statistician in designing experiments to relate gut bacteria and some measured health outcome is crucial for ensuring valid and accurate results. By adopting new experimental design and analysis methods, researchers can begin to gain a deeper understanding of how the genetics of our food affect the composition of taxa within the gut microbiome. This dissertation is …


An Interval-Valued Random Forests, Paul Gaona Partida 2023 Utah State University

All Graduate Theses and Dissertations

There is a growing demand for the development of new statistical models and the refinement of established methods to accommodate different data structures. This need arises from the recognition that traditional statistics often assume the value of each observation to be precise, which may not hold true in many real-world scenarios. Factors such as the collection process and technological advancements can introduce imprecision and uncertainty into the data.

For example, consider data collected over a long period of time, where newer measurement tools may offer greater accuracy and provide more information than previous methods. In such cases, it becomes crucial …


Stressor: An R Package For Benchmarking Machine Learning Models, Samuel A. Haycock 2023 Utah State University

All Graduate Theses and Dissertations

Many discipline-specific researchers need a way to quickly compare the accuracy of their predictive models to other alternatives. However, many of these researchers are not experienced with multiple programming languages. Python has recently been the leader in machine learning functionality, which includes the PyCaret library that allows users to develop high-performing machine learning models with only a few lines of code. The goal of the stressor package is to help users of the R programming language access the advantages of PyCaret without having to learn Python. This allows the user to leverage R’s powerful data analysis workflows, while simultaneously …


Geometric Morphometric Analysis Of Modern Viperid Vertebrae Facilitates Identification Of Fossil Specimens, Lance D. Jessee 2023 East Tennessee State University

Electronic Theses and Dissertations

Snake vertebrae are common in the fossil record, whereas cranial remains are generally fragile and rare. Consequently, vertebrae are the most commonly studied fossil element of snakes. However, identification of snake vertebrae can be problematic due to extensive variation. This study utilizes 2-D geometric morphometrics and canonical variates analysis to 1) reveal variation between genera and species and 2) classify vertebrae of modern and fossil eastern North American Agkistrodon and Crotalus. The results show that vertebrae of Agkistrodon and Crotalus can reliably be classified to genus and species using these methods. Based on the statistical analyses, four of the …


Statistical Graph Quality Analysis Of Utah State University Master Of Science Thesis Reports, Ragan Astle 2023 Utah State University

All Graduate Theses and Dissertations

Graphical software packages have become increasingly popular in our modern world, but there are concerns within the statistical visualization field about the default settings provided by these packages, which can make it challenging to create good quality graphs that align with standard graph principles. In this thesis, we investigate whether the quality of graphs from Utah State University (USU) Plan A Master of Science (MS) thesis reports from the years 1930 to 2019 was affected by the rise of graphical software packages. We collected all data stored on the USU Digital Commons website since November 2021 to determine the specific …


Using Natural Language Processing To Quantify The Efficacy Of Language Simplification As A Communication Strategy, Brian Nalley 2023 Utah State University

All Graduate Theses and Dissertations

People with communication disorders often experience difficulties being understood by unfamiliar listeners or in noisy environments. A common strategy for effectively communicating in these scenarios is to use simpler and more predictable language. Despite the prevalence of this strategy, there has been little to no research to date focused on the effectiveness of language simplification as a communication strategy. This study seeks to begin filling that gap by using natural language processing to determine whether speakers with early-stage Parkinson’s disease and age-matched neurotypical speakers are able to successfully simplify their language while still maintaining the original message.

Simplification was measured …


Sentiment Analysis Before And During The Covid-19 Pandemic, Emily Musgrove 2023 Ursinus College

Mathematics Summer Fellows

This study examines the change in connotative language use before and during the Covid-19 pandemic. By analyzing news articles from several major US newspapers, we found that there is a statistically significant correlation between the sentiment of the text and the publication period. Specifically, we document a large, systematic, and statistically significant decline in the overall sentiment of articles published in major news outlets. While our results do not directly gauge the sentiment of the population, our findings have important implications regarding the social responsibility of journalists and media outlets especially in times of crisis.


A Multivariate Investigation Of The Motivational, Academic, And Well-Being Characteristics Of First-Generation And Continuing-Generation College Students, Christopher L. Thomas, Staci Zolkoski 2023 The University of Texas at Tyler

Journal of Research Initiatives

Prior research has noted differences in motivational, academic, and well-being factors between first-generation and continuing-education students. However, past investigations have primarily overlooked the interactive influence of protective and risk factors when comparing the characteristics of first-generation and continuing-education students. Thus, the current study adopted a multivariate approach to gain a more nuanced understanding of the influence of generational status on students' self-regulated learning capabilities, academic anxiety, sense of belonging, academic barriers, mental health concerns, and satisfaction with life. University students (N = 432, 67.46% Caucasian, 87.55% female, Age = 28.10 ± 9.46) completed the Cognitive Test Anxiety Scale-2nd …


Optimal Experimental Planning Of Reliability Experiments Based On Coherent Systems, Yang Yu 2023 Southern Methodist University

Statistical Science Theses and Dissertations

In industrial engineering and manufacturing, assessing the reliability of a product or system is an important topic. Life-testing and reliability experiments are commonly used reliability assessment methods to gain sound knowledge about product or system lifetime distributions. Usually, a sample of items of interest is subjected to stresses and environmental conditions that characterize the normal operating conditions. During the life-test, successive times to failure are recorded and lifetime data are collected. Life-testing is useful in many industrial environments, including the automobile, materials, telecommunications, and electronics industries.

There are different kinds of life-testing experiments that can be applied for different purposes. …


A Comparison Of Confidence Intervals In State Space Models, Jinyu Du 2023 Southern Methodist University

Statistical Science Theses and Dissertations

This thesis develops general procedures for constructing confidence intervals (CIs) for the error disturbance parameters (standard deviations), and for transformations of those parameters, in time-invariant state space models (ssm). With only a set of observations, accurately estimating individual error disturbance parameters in the presence of other unknown parameters in ssm is a very challenging problem. We attempted to construct four different types of confidence intervals (Wald, likelihood ratio, score, and higher-order asymptotic intervals) for both the simple local level model and general time-invariant ssm. We show that for a simple local level model, both the …

