Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Full-Text Articles in Physical Sciences and Mathematics

Using Natural Language Processing To Quantify The Efficacy Of Language Simplification As A Communication Strategy, Brian Nalley Aug 2023

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

People with communication disorders often experience difficulties being understood by unfamiliar listeners or in noisy environments. A common strategy for effectively communicating in these scenarios is to use simpler and more predictable language. Despite the prevalence of this strategy, there has been little to no research to date focused on the effectiveness of language simplification as a communication strategy. This study seeks to begin filling that gap by using natural language processing to determine whether speakers with early-stage Parkinson’s disease and age-matched neurotypical speakers are able to successfully simplify their language while still maintaining the original message.

Simplification was measured …


Statistical Graph Quality Analysis Of Utah State University Master Of Science Thesis Reports, Ragan Astle Aug 2023

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Graphical software packages have become increasingly popular in our modern world, but there are concerns within the statistical visualization field about the default settings provided by these packages, which can make it challenging to create good quality graphs that align with standard graph principles. In this thesis, we investigate whether the quality of graphs from Utah State University (USU) Plan A Master of Science (MS) thesis reports from the years 1930 to 2019 was affected by the rise of graphical software packages. We collected all data stored on the USU Digital Commons website since November 2021 to determine the specific …


Stressor: An R Package For Benchmarking Machine Learning Models, Samuel A. Haycock Aug 2023

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Many discipline-specific researchers need a way to quickly compare the accuracy of their predictive models to other alternatives. However, many of these researchers are not experienced with multiple programming languages. Python has recently been the leader in machine learning functionality, which includes the PyCaret library that allows users to develop high-performing machine learning models with only a few lines of code. The goal of the stressor package is to help users of the R programming language access the advantages of PyCaret without having to learn Python. This allows the user to leverage R’s powerful data analysis workflows, while simultaneously …


An Interval-Valued Random Forests, Paul Gaona Partida Aug 2023

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

There is a growing demand for the development of new statistical models and the refinement of established methods to accommodate different data structures. This need arises from the recognition that traditional statistics often assume the value of each observation to be precise, which may not hold true in many real-world scenarios. Factors such as the collection process and technological advancements can introduce imprecision and uncertainty into the data.

For example, consider data collected over a long period of time, where newer measurement tools may offer greater accuracy and provide more information than previous methods. In such cases, it becomes crucial …


Investigating The Effect Of Greediness On The Coordinate Exchange Algorithm For Generating Optimal Experimental Designs, William Thomas Gullion May 2023

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Design of Experiments (DoE) is the field of statistics concerned with helping researchers maximize the amount of information they gain from their experiments. Recently, researchers have been turning to optimal experimental designs instead of classical/catalog experimental designs. One of the most popular algorithms used today to generate optimal designs is the Coordinate Exchange (CEXCH) Algorithm. CEXCH is known to be a greedy algorithm, which means it tends to favor immediate, locally best designs instead of globally optimal designs. Previous research demonstrated that this tradeoff was efficacious in that it reduced the cost of a single run of CEXCH and allowed …


Examining Model Complexity's Effects When Predicting Continuous Measures From Ordinal Labels, Mckade S. Thomas May 2023

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Many real world problems require the prediction of ordinal variables where the values are a set of categories with an ordering to them. However, in many of these cases the categorical nature of the ordinal data is not a desirable outcome. As such, regression models treat ordinal variables as continuous and do not bind their predictions to discrete categories. Prior research has found that these models are capable of learning useful information between the discrete levels of the ordinal labels they are trained on, but complex models may learn ordinal labels too closely, missing the information between levels. In this …


Examining Political Discourse On Online 8kun And Reddit Forums, Braden Mindrum May 2023

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

A recent example of political violence in the United States was that of the January 6, 2021, Capitol attack in connection with the certification of Joseph R. Biden’s victory over Donald J. Trump in the 2020 US presidential election. This thesis analyzes the events of January 6, 2021, through the lens of social media discourse. This thesis presents a workflow that acquired over 5 million 8kun and Reddit posts from various apolitical and political forums in the three months preceding and following the Capitol attack on January 6, 2021. Techniques from text analysis are then used to group forums according …


Power Approximations For Generalized Linear Mixed Models In R Using Steep Priors On Variance Components, Sydney Geisler Dec 2022

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

When designing an experiment, researchers often want to know how likely they are to detect statistically significant effects in the resulting data, i.e., they want to estimate their statistical power. The probability distribution method is a flexible way to do this, and it is currently implemented in the statistical software package SAS. This method requires a hypothetical data set (showing the magnitude of hypothesized effects) and constant values of variance components, which are critical elements of the statistical models used. The statistical software package R is increasingly popular, but the probability distribution method has not yet been implemented in R, …


Statistical Challenges And Methods For Missing And Imbalanced Data, Rose Adjei Dec 2022

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Missing data remains a prevalent issue in every area of research. The impact of missing data, if not carefully handled, can be detrimental to any statistical analysis. Some statistical challenges associated with missing data include loss of information, reduced statistical power, and non-generalizability of findings in a study. It is therefore crucial that researchers pay close attention when dealing with missing data. This multi-paper dissertation provides insight into missing data across different fields of study and addresses some of the above-mentioned challenges of missing data through simulation studies and application to real datasets. The first paper of …


An Introduction To Combinatorics Via Cayley's Theorem, Jaylee Willis Aug 2022

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

In this paper, we explore some of the methods that are often used to solve combinatorial problems by proving Cayley’s theorem on trees in multiple ways. The intended audience of this paper is undergraduate and graduate mathematics students with little to no experience in combinatorics. This paper could also be used as a supplementary text for an undergraduate combinatorics course.
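For readers unfamiliar with the result, Cayley's theorem on trees states that the number of labeled trees on n vertices is n^(n-2). The sketch below is an illustration only and is not taken from the paper: it checks the formula for small n by brute force, counting the (n-1)-edge subsets of the complete graph that contain no cycle. The function name `count_labeled_trees` is a hypothetical helper introduced here.

```python
from itertools import combinations


def count_labeled_trees(n):
    """Count labeled trees on n vertices by brute force (small n only)."""
    # All possible edges of the complete graph on vertices 0..n-1.
    edges = list(combinations(range(n), 2))
    count = 0
    # A set of n-1 edges on n vertices is a tree exactly when it is acyclic.
    for subset in combinations(edges, n - 1):
        parent = list(range(n))  # union-find forest, one root per vertex

        def find(x):
            # Follow parent pointers to the root, compressing the path.
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        acyclic = True
        for u, v in subset:
            ru, rv = find(u), find(v)
            if ru == rv:
                # u and v are already connected, so this edge closes a cycle.
                acyclic = False
                break
            parent[ru] = rv
        if acyclic:
            count += 1
    return count


if __name__ == "__main__":
    for n in range(2, 6):
        print(n, count_labeled_trees(n), n ** (n - 2))
```

The brute-force count grows far too quickly to be useful beyond very small n, which is part of why proofs of the closed form are of interest.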


Contributions To Random Forest Variable Importance With Applications In R, Kelvyn K. Bladen Aug 2022

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

A major focus in statistics is building and improving computational algorithms that can use data to predict a response. Two fundamental camps of research arise from such a goal. The first camp is researching ways to get more accurate predictions. Many sophisticated methods, collectively known as machine learning methods, have been developed for this very purpose. One such method that is widely used across industry and many other areas of investigation is called Random Forests.

The second camp of research is that of improving the interpretability of machine learning methods. This is worthy of attention when analysts desire to optimize …


Geometry- And Accuracy-Preserving Random Forest Proximities With Applications, Jake S. Rhodes Aug 2022

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Many machine learning algorithms use calculated distances or similarities between data observations to make predictions, cluster similar data, visualize patterns, or generally explore the data. Most distances or similarity measures do not incorporate known data labels and are thus considered unsupervised. Supervised methods for measuring distance exist which incorporate data labels and thereby exaggerate separation between data points of different classes. This approach tends to distort the natural structure of the data. Instead of following similar approaches, we leverage a popular algorithm used for making data-driven predictions, known as random forests, to naturally incorporate data labels into similarity measures known …


Redefining NBA Basketball Positions Through Visualization And Mega-Cluster Analysis, Alexander L. Hedquist Aug 2022

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Basketball players have historically been classified based on one of five positions, namely Point Guards, Shooting Guards, Small Forwards, Power Forwards, and Centers. While grouping players into these five categories may provide general descriptions of their perceived role, these standard positions fall short of describing players based on their true abilities and performance. This MS thesis proposes a method to group players of the National Basketball Association (NBA) from the past 20 seasons into more meaningful and specific player positions. We systematically group these players into nine distinct categories, and we draw from a vast array of visualization tools, techniques, and software …


Dynamic System Discovery With Recursive Physics-Informed Neural Networks, Jarrod Mau Aug 2022

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

This thesis presents a novel method, the recursive physics-informed neural network, to learn the right-hand side of differential equations. The neural network takes in data, trains on it, and then acts as a proxy for the differential equation, which can be used for modeling. We show the theoretical superiority of the recursive approach. We also use computer simulations to demonstrate the proved properties.


Defining Areas Of Interest Using Voronoi And Modified Voronoi Tesselations To Analyze Eye-Tracking Data, Joanna D. Coltrin Aug 2022

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Eye tracking is a technology used to track where someone is looking. Eye-tracking technology is often used to study what people focus on when looking at a photo of another person. The eye-tracking technology records points on a photo that a person is looking at. When the photo being looked at shows a person, the points can be categorized by body part such as head, right hand, left hand, and torso. This thesis presents the use of partially circular areas to define the body parts of the person in the photo and therefore categorize the points collected by the eye-tracker. …


GPS-Denied Navigation Using Synthetic Aperture Radar Images And Neural Networks, Teresa White Dec 2021

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Unmanned aerial vehicles (UAV) often rely on GPS for navigation. GPS signals, however, are very low in power and easily jammed or otherwise disrupted. This paper presents a method for determining the navigation errors present at the beginning of a GPS-denied period utilizing data from a synthetic aperture radar (SAR) system. This is accomplished by comparing an online-generated SAR image with a reference image obtained a priori. The distortions relative to the reference image are learned and exploited with a convolutional neural network to recover the initial navigational errors, which can be used to recover the true flight trajectory throughout …


Housing Variables And Immigration: An Exploratory And Predictive Data Analysis In New York City, Jhonatan Medri Cobos Aug 2021

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

The relationship between housing and immigration has become relevant in the U.S., especially in a highly populated metropolis such as New York City (NYC). Determining whether immigration status affects home ownership percentage, household rent, or housing cost percentage could help understand the quality of life of NYC residents. Graphical exploration, spatial dependence tests, and spatial autoregressive models of housing and immigration variables provide some insights about their relationships. Our exploration takes place at some geographic subareas of NYC.

Our results first indicate that the housing and immigration data exhibit spatial dependence; values of a geographic subarea are related to values …


A Phenological Model For A Southern Population Of Mountain Pine Beetle, Catherine E. Wangen Aug 2021

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

The mountain pine beetle (MPB, Dendroctonus ponderosae Hopkins) attacks living Pinus trees across a widespread area of western North America, causing significant ecological and economic damage. The ability to make accurate predictions of how MPB populations across this range will respond to temperatures, which affect MPB progress through life stages, is essential. Northern and southern populations of MPB are genetically different in response to temperature, requiring geographic-specific model parameters. There is not currently a predictive model for the southern MPB life cycle, despite concerns that those populations may be more susceptible to increased numbers of generations per year, which would …


The Effect Of High Elevation Weather Stations On The USDA's Pasture, Rangeland, And Forage Insurance Program, Wyatt Matthew Feuz May 2021

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

This paper examines the effect of high elevation weather stations on the rainfall index used by the Pasture, Rangeland, and Forage insurance program. Weather station data for the state of Utah is used to identify high elevation weather stations and their location. Utilizing the corresponding rainfall index data, the effect of the high elevation weather stations is determined. This paper finds that when high elevation weather stations begin reporting, the rainfall index for the corresponding grid locations jumps by 19.01–27.88 percentage points on average. This indicates the rainfall index may not accurately represent actual precipitation amounts …


Delta Hedging Of Financial Options Using Reinforcement Learning And An Impossibility Hypothesis, Ronak Tali Dec 2020

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

In this thesis we take a fresh perspective on delta hedging of financial options as undertaken by market makers. The current industry standard of delta hedging relies on the famous Black-Scholes formulation that prescribes continuous time hedging in a way that allows the market maker to remain risk neutral at all times. But the Black-Scholes formulation is a deterministic model that comes with several strict assumptions such as zero transaction costs, log normal distribution of the underlying stock prices, etc. In this paper we employ Reinforcement Learning to redesign the delta hedging problem in a way that allows us …


A Bayesian Markov Chain Monte Carlo Approach To Uncertainty Quantification, Matthew Isaac Aug 2020

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Uncertainty quantification (UQ) is a framework used frequently in engineering analyses to understand how uncertainty in system inputs leads to uncertainty in the system output. An instability is observed in a UQ method proposed by Roy and Oberkampf, and a Bayesian Markov Chain Monte Carlo approach to UQ is offered as an alternative. The Bayesian approach allows analysts to incorporate information from various available sources, including observed measurements and expert opinion, and to update the analysis and results as more information becomes available. An illustrative engineering example is provided as a platform to demonstrate the Bayesian UQ approach and to …


Applications Of Machine Learning In High-Frequency Trade Direction Classification, Jared E. Hansen May 2020

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

The correct assignment of trades as buyer-initiated or seller-initiated is paramount in many quantitative finance studies. Simple decision rule methods have been used for signing trades since many data sets available to researchers do not include the sign of each trade executed. By utilizing these decision rule methods, as well as engineering new variables from available data, we have demonstrated that machine learning models outperform prior methods for accurately signing trades as buys and sells, achieving state-of-the-art results. The best model developed was 4.5 percentage points more accurate than older methods when predicting onto unseen data. Since finance and economics …


'Lmshapemaker': Utilizing The 'Rmapshaper' R Package To Modify Shapefiles For Use In Linked Micromap Plots, Braden D. Probst May 2020

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

To create map-based visualizations effectively, some map modifications are needed to ensure the map is readable and interpretable. There are several issues that need to be addressed to achieve this. The boundaries of a country may be overly complex, which is particularly true of coastal areas. Regions may be small and not seen in the final plot, as is the case with many capital cities in the world’s countries such as Washington D.C. and the Federal District of Mexico City. In other countries, regions may geographically lie far away from the rest of the …


Using A Discrete Choice Experiment To Estimate Willingness To Pay For Location Based Housing Attributes, Kristopher C. Toll Dec 2019

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

In 1993, a travel study was conducted along the Wasatch Front in Utah (Research Systems Group INC, 2013). The main purpose of this study was to assess travel behavior to understand the needs for future growth in Utah. Later, in 2012, the Research Service Group (RSG) conducted a new study to understand current travel preferences in Utah. This survey, called the Residential Choice Stated Preference survey, asked respondents to make ten choice comparisons between two hypothetical homes. Each home in the choice comparison was described by different attributes, including type of neighborhood, distance from …


Tuning Hyperparameters In Supervised Learning Models And Applications Of Statistical Learning In Genome-Wide Association Studies With Emphasis On Heritability, Jill F. Lundell Aug 2019

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Machine learning is a buzzword that has inundated popular culture in the last few years. The term refers to computer methods that can automatically learn and improve from data instead of being explicitly programmed at every step. Investigations regarding the best way to create and use these methods are prevalent in research. Machine learning models can be difficult to create because models need to be tuned. This dissertation explores the characteristics of tuning three popular machine learning models and finds a way to automatically select a set of tuning parameters. This information was used to create an …


Comparing Performance Of Gene Set Test Methods Using Biologically Relevant Simulated Data, Richard M. Lambert Dec 2018

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Today we know that there are many genetically driven diseases and health conditions. These problems often manifest only when a set of genes is either active or inactive. Recent technology allows us to measure the activity level of genes in cells, which we call gene expression. It is of great interest to society to be able to statistically compare the gene expression of a large number of genes between two or more groups. For example, we may want to compare the gene expression of a group of cancer patients with a group of non-cancer patients to better understand the genetic …


Statistical Methods To Account For Gene-Level Covariates In Normalization Of High-Dimensional Read-Count Data, Lauren Holt Lenz Dec 2018

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

The goal of genetic-based cancer research is often to identify which genes behave differently in cancerous and healthy tissue. This difference in behavior, referred to as differential expression, may lead researchers to more targeted preventative care and treatment. One way to measure the expression of genes is through a process called RNA-Seq, which takes physical tissue samples and maps gene products and fragments in the sample back to the gene that created them, resulting in a large read-count matrix with genes in the rows and a column for each sample. The read-counts for tumor and normal samples are then compared …


The Power Law Distribution Of Agricultural Land Size, Lauren Chamberlain Dec 2018

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

This paper demonstrates that the distribution of county-level agricultural land size in the United States is best described by a power-law distribution, a distribution that displays extremely heavy tails. This indicates that the majority of farmland exists in the upper tail. Our analysis indicates that the top 5% of agricultural counties accounted for about 25% of agricultural land between 1997 and 2012. The power-law distribution of farm size has important implications for the design of more efficient regional and national agricultural policies, as counties close to the mean account for little of the cumulative distribution of total agricultural land. This has …


Surviving A Civil War: Expanding The Scope Of Survival Analysis In Political Science, Andrew B. Whetten Dec 2018

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Survival Analysis in the context of Political Science is frequently used to study the duration of agreements, political party influence, wars, senator term lengths, etc. This paper surveys a collection of methods implemented on a modified version of the Power-Sharing Event Dataset (which documents civil war peace agreement durations in the Post-Cold War era) in order to identify the research questions that are optimally addressed by each method. A primary comparison will be made between a Cox Proportional Hazards Model using some advanced capabilities in the glmnet package, a Survival Random Forest Model, and a Survival SVM. En route to …


Implementing The Use Of Personal Activity Data In An Introductory Statistics Course, Lacy Christensen Aug 2018

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Integrating real data into a classroom is one of the recommendations in the Guidelines for Assessment and Instruction in Statistics Education (GAISE) college report which lays out guidelines for an introductory statistics course (Committee, GAISE College Report ASA Revision, 2016). In order to assess the effect of using real data in a classroom, the students received physical activity trackers to wear during an undergraduate introductory statistics course taught in the summer. This tracker, a Fitbit, enabled students to monitor and record their steps, calories, and active time throughout the class. Collecting personal activity data (PAD) creates a large database which …