Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Statistics and Probability

Utah State University

Articles 1 - 30 of 273

Full-Text Articles in Physical Sciences and Mathematics

Exploring Application Of The Coordinate Exchange To Generate Optimal Designs Robust To Data Loss, Asher Hanson May 2024

All Graduate Theses and Dissertations, Fall 2023 to Present

The primary objective of this study is to evaluate the efficacy of the coordinate exchange (CEXCH) algorithm in the generation of robust optimal designs. The assessment involves a comparative analysis, wherein designs produced by the Point Exchange (PEXCH) Algorithm are employed as benchmarks for evaluating the efficiency of CEXCH designs. Three modified criteria, selected from the traditional alphabet criteria pool, are utilized to score each algorithm. To enhance the reliability of the comparative analysis, multiple rounds of validation are conducted, focusing on visual assessments, design scores, and criteria efficiencies. The findings from each round of validation contribute to a comprehensive …


Exploring Optimal Design Of Experiments For Random Effects Models, Ryan C. Bushman May 2024

All Graduate Theses and Dissertations, Fall 2023 to Present

The majority of research in the field of optimal design of experiments has focused on producing designs for fixed effects models. The purpose of this thesis is to explore how the optimal design framework applies to nested random effects models. The object that is being optimized is the model information matrix. We explore the full derivation of the random effects information matrix to highlight the complexity of the problem and show how the optimization is a function of the model's parameters. In conjunction with this research, the ODVC (Optimal Design for Variance Components) package was built to provide tools that …


Teaching Reproducibility To First Year College Students: Reflections From An Introductory Data Science Course, Brennan L. Bean Dec 2023

Journal on Empowering Teaching Excellence

Access the online Pressbooks version of this article here.

Modern technology threatens traditional modes of classroom assessment by providing students with automated ways to write essays and take exams. At the same time, modern technology continues to expand the accessibility of computational tools that promise to increase the potential scope and quality of class projects. This paper presents a case study where students are asked to complete a “reproducible” final project in an introductory data science course using the R programming language. A reproducible project is one where an instructor can easily regenerate the results and conclusions from the submitted …


Using Gamification To Foster Student Resilience And Motivation To Learn, And Using Games To Teach Significance Testing Concepts In The Statistics Classroom, Todd Partridge Dec 2023

All Graduate Theses and Dissertations, Fall 2023 to Present

Two studies are outlined in this dissertation.

In the first study, elements of Super Mario Bros. video games were used to change the way college students in a beginners’ statistics course were graded on their work. This was part of an effort to help students remain optimistic in the face of challenging coursework and even failure on assignments and tests. The study shows that the changes made to the grading structure did help students to keep trying and to use the materials given to them by their professor until they achieved their desired grade in the course, and suggests ways …


An Ensemble Approach For Mapping Snow Water Equivalent In Utah, Logan Schneider Dec 2023

All Graduate Theses and Dissertations, Fall 2023 to Present

Mountain snowpack is an important resource for water management planning in Utah. Snow water equivalent (SWE) is the amount of water contained in a snowpack. A few organizations predict SWE throughout the United States but struggle to make accurate predictions in mountainous regions. Weather stations provide accurate measurements of SWE but have limited spatial coverage that hinders the ability to make accurate estimates statewide. This thesis examines the accuracy of current models and proposes using local weather measurements to improve upon national-level predictions. An R statistical software package named rsnodas implements this process while allowing the public access to a …


Using Natural Language Processing To Quantify The Efficacy Of Language Simplification As A Communication Strategy, Brian Nalley Aug 2023

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

People with communication disorders often experience difficulties being understood by unfamiliar listeners or in noisy environments. A common strategy for effectively communicating in these scenarios is to use simpler and more predictable language. Despite the prevalence of this strategy, there has been little to no research to date focused on the effectiveness of language simplification as a communication strategy. This study seeks to begin filling that gap by using natural language processing to determine whether speakers with early-stage Parkinson’s disease and age-matched neurotypical speakers are able to successfully simplify their language while still maintaining the original message.

Simplification was measured …


Statistical Graph Quality Analysis Of Utah State University Master Of Science Thesis Reports, Ragan Astle Aug 2023

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Graphical software packages have become increasingly popular in our modern world, but there are concerns within the statistical visualization field about the default settings provided by these packages, which can make it challenging to create good quality graphs that align with standard graph principles. In this thesis, we investigate whether the quality of graphs from Utah State University (USU) Plan A Master of Science (MS) thesis reports from the years 1930 to 2019 was affected by the rise of graphical software packages. We collected all data stored on the USU Digital Commons website since November 2021 to determine the specific …


Stressor: An R Package For Benchmarking Machine Learning Models, Samuel A. Haycock Aug 2023

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Many discipline-specific researchers need a way to quickly compare the accuracy of their predictive models to other alternatives. However, many of these researchers are not experienced with multiple programming languages. Python has recently been the leader in machine learning functionality, which includes the PyCaret library that allows users to develop high-performing machine learning models with only a few lines of code. The goal of the stressor package is to help users of the R programming language access the advantages of PyCaret without having to learn Python. This allows the user to leverage R’s powerful data analysis workflows, while simultaneously …


An Interval-Valued Random Forests, Paul Gaona Partida Aug 2023

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

There is a growing demand for the development of new statistical models and the refinement of established methods to accommodate different data structures. This need arises from the recognition that traditional statistics often assume the value of each observation to be precise, which may not hold true in many real-world scenarios. Factors such as the collection process and technological advancements can introduce imprecision and uncertainty into the data.

For example, consider data collected over a long period of time, where newer measurement tools may offer greater accuracy and provide more information than previous methods. In such cases, it becomes crucial …


Investigating The Effect Of Greediness On The Coordinate Exchange Algorithm For Generating Optimal Experimental Designs, William Thomas Gullion May 2023

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Design of Experiments (DoE) is the field of statistics concerned with helping researchers maximize the amount of information they gain from their experiments. Recently, researchers have been turning to optimal experimental designs instead of classical/catalog experimental designs. One of the most popular algorithms used today to generate optimal designs is the Coordinate Exchange (CEXCH) Algorithm. CEXCH is known to be a greedy algorithm, which means it tends to favor immediate, locally best designs instead of globally optimal designs. Previous research demonstrated that this tradeoff was efficacious in that it reduced the cost of a single run of CEXCH and allowed …


Examining Model Complexity's Effects When Predicting Continuous Measures From Ordinal Labels, Mckade S. Thomas May 2023

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Many real world problems require the prediction of ordinal variables where the values are a set of categories with an ordering to them. However, in many of these cases the categorical nature of the ordinal data is not a desirable outcome. As such, regression models treat ordinal variables as continuous and do not bind their predictions to discrete categories. Prior research has found that these models are capable of learning useful information between the discrete levels of the ordinal labels they are trained on, but complex models may learn ordinal labels too closely, missing the information between levels. In this …


Examining Political Discourse On Online 8kun And Reddit Forums, Braden Mindrum May 2023

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

A recent example of political violence in the United States was that of the January 6, 2021, Capitol attack in connection with the certification of Joseph R. Biden’s victory over Donald J. Trump in the 2020 US presidential election. This thesis analyzes the events of January 6, 2021, through the lens of social media discourse. This thesis presents a workflow that acquired over 5 million 8kun and Reddit posts from various apolitical and political forums in the three months preceding and following the Capitol attack on January 6, 2021. Techniques from text analysis are then used to group forums according …


Supplementary Files For "Adaptive Mapping Of Design Ground Snow Loads In The Conterminous United States", Jadon Wagstaff, Jesse Wheeler, Brennan Bean, Marc Maguire, Yan Sun Jan 2023

Browse all Datasets

Recent amendments to design ground snow load requirements in ASCE 7-22 have reduced the size of case study regions by 91% from what they were in ASCE 7-16, primarily in western states. This reduction is made possible through the development of highly accurate regional generalized additive regression models (RGAMs), stitched together with a novel smoothing scheme implemented in the R software package remap, to produce the continental-scale maps of reliability-targeted design ground snow loads available in ASCE 7-22. This approach allows for better characterizations of the changing relationship between temperature, elevation, and ground snow loads across the Conterminous United …


Power Approximations For Generalized Linear Mixed Models In R Using Steep Priors On Variance Components, Sydney Geisler Dec 2022

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

When designing an experiment, researchers often want to know how likely they are to detect statistically significant effects in the resulting data, i.e., they want to estimate their statistical power. The probability distribution method is a flexible way to do this, and it is currently implemented in the statistical software package SAS. This method requires a hypothetical data set (showing the magnitude of hypothesized effects) and constant values of variance components, which are critical elements of the statistical models used. The statistical software package R is increasingly popular, but the probability distribution method has not yet been implemented in R, …


Statistical Challenges And Methods For Missing And Imbalanced Data, Rose Adjei Dec 2022

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Missing data remains a prevalent issue in every area of research. The impact of missing data, if not carefully handled, can be detrimental to any statistical analysis. Some statistical challenges associated with missing data include loss of information, reduced statistical power, and non-generalizability of findings in a study. It is therefore crucial that researchers pay close and particular attention when dealing with missing data. This multi-paper dissertation provides insight into missing data across different fields of study and addresses some of the above-mentioned challenges of missing data through simulation studies and application to real datasets. The first paper of …


Quantum Computing Simulation Of The Hydrogen Molecule System With Rigorous Quantum Circuit Derivations, Yili Zhang Aug 2022

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Quantum computing has been an emerging technology in the past few decades. It uses programmable quantum devices to perform computations that can solve, in a feasible time, complex problems that are out of reach for classical computers. Simulating quantum chemical systems using quantum computers is one of the most active research fields in quantum computing. However, due to the novelty of the technology and concept, most materials in the literature are not accessible to newcomers in the field and sometimes can cause ambiguity for practitioners due to missing details.

This report provides a rigorous derivation of simulating quantum chemistry …


A Bayesian Hierarchical Approach For Modeling Virtual Species With Realistic Functional Trait Relationships, Sarah Bogen Aug 2022

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Understanding the spatial and temporal dynamics of plant populations has important implications for the fields of ecology and conservation. A rich body of mathematical modeling approaches, including reaction-diffusion equations and integrodifference equations, have been developed to mechanistically model population spread based on species demography and seed dispersal characteristics. However, with over 390,000 plant species on Earth, it is not feasible to collect complete information on all species for the purpose of drawing generalized conclusions. One means of overcoming such a problem is through trait-based modeling, which seeks to represent realistic combinations of organismal traits rather than focusing on individual species. …


An Introduction To Combinatorics Via Cayley's Theorem, Jaylee Willis Aug 2022

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

In this paper, we explore some of the methods that are often used to solve combinatorial problems by proving Cayley’s theorem on trees in multiple ways. The intended audience of this paper is undergraduate and graduate mathematics students with little to no experience in combinatorics. This paper could also be used as a supplementary text for an undergraduate combinatorics course.
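
As a small illustration of the result named in the title: Cayley's theorem on trees states that there are n^(n-2) labeled trees on n vertices. One classical route to it is the bijection between labeled trees and Prüfer sequences. The sketch below is only an illustration of that bijection, not necessarily one of the proofs given in the paper; the function names `prufer_to_tree` and `count_labeled_trees` are hypothetical.

```python
from itertools import product


def prufer_to_tree(seq, n):
    """Decode a Prüfer sequence (labels 0..n-1, length n-2) into a set of edges."""
    # Each vertex has degree 1 plus the number of times it appears in the sequence.
    degree = [1] * n
    for v in seq:
        degree[v] += 1

    edges = set()
    for v in seq:
        # Attach the smallest-labeled current leaf to v, then remove that leaf.
        leaf = min(u for u in range(n) if degree[u] == 1)
        edges.add(frozenset((leaf, v)))
        degree[leaf] -= 1
        degree[v] -= 1

    # Exactly two vertices of degree 1 remain; they form the last edge.
    u, w = [x for x in range(n) if degree[x] == 1]
    edges.add(frozenset((u, w)))
    return frozenset(edges)


def count_labeled_trees(n):
    """Count distinct labeled trees on n >= 2 vertices by decoding every sequence."""
    trees = {prufer_to_tree(seq, n) for seq in product(range(n), repeat=n - 2)}
    return len(trees)


if __name__ == "__main__":
    for n in range(2, 7):
        print(n, count_labeled_trees(n), n ** (n - 2))
```

Because every sequence decodes to a different tree, the number of distinct trees equals the number of sequences, which is n^(n-2). Brute-force enumeration like this is only practical for small n.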


Contributions To Random Forest Variable Importance With Applications In R, Kelvyn K. Bladen Aug 2022

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

A major focus in statistics is building and improving computational algorithms that can use data to predict a response. Two fundamental camps of research arise from such a goal. The first camp is researching ways to get more accurate predictions. Many sophisticated methods, collectively known as machine learning methods, have been developed for this very purpose. One such method that is widely used across industry and many other areas of investigation is called Random Forests.

The second camp of research is that of improving the interpretability of machine learning methods. This is worthy of attention when analysts desire to optimize …


Geometry- And Accuracy-Preserving Random Forest Proximities With Applications, Jake S. Rhodes Aug 2022

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Many machine learning algorithms use calculated distances or similarities between data observations to make predictions, cluster similar data, visualize patterns, or generally explore the data. Most distances or similarity measures do not incorporate known data labels and are thus considered unsupervised. Supervised methods for measuring distance exist which incorporate data labels and thereby exaggerate separation between data points of different classes. This approach tends to distort the natural structure of the data. Instead of following similar approaches, we leverage a popular algorithm used for making data-driven predictions, known as random forests, to naturally incorporate data labels into similarity measures known …


Redefining NBA Basketball Positions Through Visualization And Mega-Cluster Analysis, Alexander L. Hedquist Aug 2022

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Basketball players have historically been classified based on one of five positions, namely Point Guards, Shooting Guards, Small Forwards, Power Forwards, and Centers. While grouping players into these five categories may provide general descriptions of their perceived role, these standard positions fall short of describing players based on their true abilities and performance. This MS thesis proposes a method to group players of the National Basketball Association (NBA) from the past 20 seasons into more meaningful and specific player positions. We systematically group these players into nine distinct categories, and we draw from a vast array of visualization tools, techniques, and software …


Dynamic System Discovery With Recursive Physics-Informed Neural Networks, Jarrod Mau Aug 2022

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

This thesis presents a novel method, the recursive physics-informed neural network, to learn the right-hand side of differential equations. The neural network takes in data, trains on it, and then acts as a proxy for the differential equation, which can be used for modeling. We show the theoretical superiority of the recursive approach. We also use computer simulations to demonstrate the proved properties.


Defining Areas Of Interest Using Voronoi And Modified Voronoi Tesselations To Analyze Eye-Tracking Data, Joanna D. Coltrin Aug 2022

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Eye tracking is a technology used to track where someone is looking. Eye-tracking technology is often used to study what people focus on when looking at a photo of another person. The eye-tracking technology records points on a photo that a person is looking at. When the photo being looked at shows a person, the points can be categorized by body part such as head, right hand, left hand, and torso. This thesis presents the use of partially circular areas to define the body parts of the person in the photo and therefore categorize the points collected by the eye-tracker. …
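
The categorization idea described above can be sketched in a few lines. In a plain Voronoi tessellation, each gaze point belongs to the area of its nearest anchor point. The version below is a minimal sketch under stated assumptions: the anchor coordinates, the function name `assign_aoi`, and the `max_radius` cutoff (one possible reading of "partially circular areas") are all hypothetical and are not taken from the thesis.

```python
import math


def assign_aoi(point, centers, max_radius=None):
    """Label a gaze point with the nearest body-part anchor (a Voronoi cell).

    point      -- (x, y) gaze coordinate
    centers    -- dict mapping a body-part name to its (x, y) anchor
    max_radius -- assumed cutoff: points farther than this from every anchor
                  are left uncategorized (returns None)
    """
    label, dist = min(
        ((name, math.dist(point, xy)) for name, xy in centers.items()),
        key=lambda pair: pair[1],
    )
    if max_radius is not None and dist > max_radius:
        return None
    return label


if __name__ == "__main__":
    # Hypothetical anchors in pixel coordinates.
    anchors = {"head": (0, 0), "torso": (0, 10)}
    for gaze in [(1, 2), (0, 9), (50, 50)]:
        print(gaze, assign_aoi(gaze, anchors), assign_aoi(gaze, anchors, max_radius=5))
```

Without the cutoff, every point on the photo is assigned to some body part, however far away it is; the radius limit is one simple way to keep distant fixations from being counted.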


Model Averaging In Agriculture And Natural Resources: What Is It? When Is It Useful? When Is It A Distraction?, Philip M. Dixon May 2022

Conference on Applied Statistics in Agriculture and Natural Resources

I use two examples to illustrate three methods for model averaging: using AIC weights, using BIC weights, and fully Bayesian analyses. The first example is a capture-recapture study that estimates the population size by averaging over 4 models for capture probabilities. The second is an analysis of a study of logging impacts on Curculionid weevils using a before-after-control-impact (BACI) study design. The estimated impact is averaged over 4 ecologically relevant models.

Both examples demonstrate the sensitivity of model weights, or posterior model probabilities, to the choice of prior model probabilities and prior distributions for parameters. The model averaged estimates and …
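
For readers unfamiliar with the first two methods, AIC and BIC weights are conventionally computed by rescaling each model's criterion value relative to the best model and normalizing. The sketch below shows that standard calculation with made-up numbers; the function names and the example values are hypothetical and do not come from the paper's capture-recapture or BACI analyses.

```python
import math


def information_criterion_weights(ic_values):
    """Turn AIC (or BIC) values into model weights that sum to 1.

    Each weight is proportional to exp(-delta / 2), where delta is the
    difference between a model's criterion value and the smallest value.
    """
    best = min(ic_values)
    raw = [math.exp(-0.5 * (value - best)) for value in ic_values]
    total = sum(raw)
    return [r / total for r in raw]


def model_averaged_estimate(estimates, weights):
    """Weighted average of the per-model estimates of the same quantity."""
    return sum(w * e for w, e in zip(weights, estimates))


if __name__ == "__main__":
    # Hypothetical AIC values and population-size estimates for four models.
    aic = [100.0, 102.0, 104.5, 110.0]
    estimates = [250.0, 270.0, 240.0, 300.0]
    weights = information_criterion_weights(aic)
    print([round(w, 3) for w in weights])
    print(round(model_averaged_estimate(estimates, weights), 1))
```

Weights computed from BIC in this way correspond approximately to posterior model probabilities under equal prior model probabilities, which is one reason the results can be sensitive to the choice of priors.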


A Robust Clustering Method Using Compositional Data Restrictions: Studying Wood Properties In The Reforestation Of Portugal, Pamela M. Chiroque-Solano, Guido A. Moreira May 2022

Conference on Applied Statistics in Agriculture and Natural Resources

Classification of multivariate observations while preserving the data’s natural restriction is a challenge. Special properties such as identifiability, interpretability, and others need to be cared for to build a new approach. To avoid these complications, many transformation algorithms have been developed to use traditional models. In this context, the aim of this work is to propose a robust probabilistic distance algorithm to classify compositional data. Based on the probabilistic distance (PD) clustering approach, the proposal identifies clusters minimizing a joint distance function, JDF, which is part of a dissimilarity measure. This measure combines the PD clustering approach with the density of …


Random Regression For Modeling Semen Fertility In HF Purebred And Crossbred Bulls Using A Bayesian Framework, Vrinda Ambike, R. Venkataramanan, S. M. K. Karthickeyan, K. G. Tirumurugaan, Kaustubh Bhave, M. Swaminathan May 2022

Conference on Applied Statistics in Agriculture and Natural Resources

Data on insemination records of Holstein Friesian (HF) purebred (n=45,497) and crossbred (n=58,497) bulls collected from the BAIF Research Foundation were utilized. The conception rate was modeled as a binary trait, using linear repeatability models. Random regression models (RRM) were used to obtain the trajectory of variance components across age of the bulls. Legendre polynomials up to an order of fit of 4 were used for the random effects of additive genetic and permanent environmental effects. A total of 200,000 Gibbs samples were generated with a burn-in of 20,000 and a thinning interval of 50 using the THRGIBBS1F90 program. Heritability estimates were very low (0.1) in …


Principal Response Curve Analysis Of Arthropod Community Abundance Data With Sparse Subsets, Changjian Jiang, C. R. Brown, P. Asiimwe, Chen Meng, Adam W. Schapaugh May 2022

Conference on Applied Statistics in Agriculture and Natural Resources

Principal response curve (PRC) analysis was applied to an assessment of the ecological impact of the genetically modified (GM), insect-resistant cotton MON 88702 on predatory Hemiptera communities in the field. The field community was represented by ten taxa collected ten times across the season at six sites, in which individual taxa were not observed at least 25% of the time (unique site × collection combinations). These complete absences and those nearly so, called sparse subsets of the data in this investigation, were the result of geoclimatic and seasonal variations, which are both independent of the treatment effect for which the …


Handling Non-Detects With Imputation In A Nested Design: A Simulation Study, Rose Adjei, John R. Stevens May 2022

Conference on Applied Statistics in Agriculture and Natural Resources

In this paper, a simulation study was conducted to assess whether it is ideal to address the issue of non-detects in data using a traditional substitution approach for non-detects, imputation, or a non-imputation-based approach. The simulated data used were simple nested designs motivated by real-life data from a study of bumble bee activity in a commercial cherry orchard by Kuivila et al. (2021). The simulated data were generated at different thresholds or censoring levels and at different effect sizes. For each simulated data set, seven popular existing techniques to handle non-detects were applied: (i) Zero substitution, (ii) Substitution with half …


Overview Of Optimal Experimental Design And A Survey Of Its Expanse In Application To Agricultural Studies, Stephen J. Walsh May 2022

Conference on Applied Statistics in Agriculture and Natural Resources

Optimal Design of Experiments is currently recognized as the modern dominant approach to planning experiments in industrial engineering and manufacturing applications. This approach to design has gained traction among practitioners in the last two decades on two fronts: 1) optimal designs are the result of a complicated optimization calculation, and recent advances in both computing efficiency and algorithms have enabled this approach in real time for practitioners, and 2) such designs are now popular because they allow the researcher to ‘design for the experiment’ by working constraints, cost, number of experiments, and the model of the intended post-hoc data analysis into …


Measuring Irregularity Via Approximate Entropy: How Does Perceived Human Instability Affect One's Own Stability?, Madi Braunersrither Dec 2021

Fall Student Research Symposium 2021

In a study performed at Utah State University, participants were prompted to evaluate the stability of pictured human postures while standing on a force plate. The force plate was used to collect the center of pressure of the subjects by recording measurements in the vertical and horizontal directions. The way these factors fluctuate over time and the irregularity in this fluctuation, specifically, can give insight into the subject’s postural stability. Rather than working with summary statistics such as means and variances of fitting parameters of a distribution as commonly done in statistics, we want to measure irregularity through analyzing the …
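
Approximate entropy, the measure named in the title, scores how often patterns of length m that match within a tolerance r continue to match when extended by one point. The sketch below follows the standard textbook definition (self-matches included) and is not the study's own code; the defaults `m=2` and `r=0.2` are assumptions, and `r` is treated here as an absolute tolerance, whereas in practice it is often set to a fraction of the series' standard deviation.

```python
import math


def approximate_entropy(series, m=2, r=0.2):
    """Approximate entropy ApEn(m, r) of a numeric sequence.

    Requires len(series) > m. Values near 0 indicate a regular,
    predictable signal; larger values indicate more irregularity.
    """
    n = len(series)

    def phi(dim):
        # All overlapping templates (windows) of length `dim`.
        count = n - dim + 1
        templates = [series[i:i + dim] for i in range(count)]
        total = 0.0
        for a in templates:
            # Templates within tolerance r under the maximum (Chebyshev) distance.
            matches = sum(
                1 for b in templates
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += math.log(matches / count)
        return total / count

    return phi(m) - phi(m + 1)


if __name__ == "__main__":
    constant = [1.0] * 30
    alternating = [0.0, 1.0] * 20
    print(approximate_entropy(constant))
    print(approximate_entropy(alternating))
```

This direct implementation compares every pair of templates, so its cost grows quadratically with the length of the series; that is acceptable for short recordings but slow for long ones.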