Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Statistics and Probability

Utah State University

Full-Text Articles in Physical Sciences and Mathematics

A Robust Clustering Method Using Compositional Data Restrictions: Studying Wood Properties In The Reforestation Of Portugal, Pamela M. Chiroque-Solano, Guido A. Moreira May 2022

Conference on Applied Statistics in Agriculture and Natural Resources

Classification of multivariate observations while preserving the data’s natural restriction is a challenge. Special properties such as identifiability, interpretability, and others need to be cared for to build a new approach. To avoid these complications, many transformation algorithms have been developed to use traditional models. In this context, the aim of this work is to propose a robust probabilistic distance algorithm to classify compositional data. Based on the probabilistic distance (PD) clustering approach, the proposal identifies clusters minimizing a joint distance function, JDF, which is part of a dissimilarity measure. This measure combines the PD clustering approach with the density of …


Random Regression For Modeling Semen Fertility In HF Purebred And Crossbred Bulls Using A Bayesian Framework, Vrinda Ambike, R. Venkataramanan, S. M. K. Karthickeyan, K. G. Tirumurugaan, Kaustubh Bhave, M. Swaminathan May 2022

Conference on Applied Statistics in Agriculture and Natural Resources

Data on insemination records of Holstein Friesian (HF) purebred (n=45,497) and crossbred (n=58,497) collected from the BAIF Research Foundation were utilized. The conception rate was modeled as a binary trait, using linear repeatability models. Random regression models (RRM) were used to obtain the trajectory of variance components across age of the bulls. Legendre Polynomials up to order of fit of 4 were used for the random effects of additive genetic and permanent environmental effects. 200,000 Gibbs samples were generated with a burn-in of 20,000 and thinning interval of 50 using the THRGIBBS1F90 program. Heritability estimates were very low (0.1) in …
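
As a small worked illustration of the sampling settings quoted above, the sketch below counts how many Gibbs draws would remain for inference. It assumes the burn-in is discarded first and every 50th draw is kept afterwards; the abstract does not state this ordering, and the function name `retained_draws` is hypothetical rather than part of THRGIBBS1F90.

```python
def retained_draws(total_draws: int, burn_in: int, thin: int) -> int:
    """Count draws kept after dropping the burn-in and keeping every `thin`-th draw.

    Assumption: thinning is applied only to the post-burn-in part of the chain.
    """
    if thin < 1 or burn_in < 0 or burn_in > total_draws:
        raise ValueError("need thin >= 1 and 0 <= burn_in <= total_draws")
    return (total_draws - burn_in) // thin


# Settings quoted in the abstract: 200,000 draws, burn-in 20,000, thinning 50.
print(retained_draws(200_000, 20_000, 50))  # 3600
```

Under that assumption, posterior summaries would rest on 3,600 retained draws.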


Principal Response Curve Analysis Of Arthropod Community Abundance Data With Sparse Subsets, Changjian Jiang, C. R. Brown, P. Asiimwe, Chen Meng, Adam W. Schapaugh May 2022

Conference on Applied Statistics in Agriculture and Natural Resources

Principal response curve (PRC) analysis was applied to an assessment of the ecological impact of the genetically-modified (GM), insect-resistant, cotton MON 88702 on predatory Hemiptera communities in the field. The field community was represented by ten taxa collected ten times across the season at six sites, in which individual taxa were not observed at least 25% of the time (unique site x collection combinations). These complete absences and those nearly so, called sparse subsets of the data in this investigation, were the result of geoclimatic and seasonal variations, which are both independent of the treatment effect for which the …


Handling Non-Detects With Imputation In A Nested Design: A Simulation Study, Rose Adjei, John R. Stevens May 2022

Conference on Applied Statistics in Agriculture and Natural Resources

In this paper, a simulation study was conducted to assess whether it is ideal to address the issue of non-detects in data using a traditional substitution approach for non-detects, imputation, or a non-imputation based approach. The simulated data were simple nested designs motivated by real-life data from a study of bumble bee activity in a commercial cherry orchard by Kuivila et al. (2021). The simulated data were generated at different thresholds or censoring levels and at different effect sizes. For each simulated data set, seven popular existing techniques to handle non-detects were applied: (i) Zero substitution, (ii) Substitution with half …


Overview Of Optimal Experimental Design And A Survey Of Its Expanse In Application To Agricultural Studies, Stephen J. Walsh May 2022

Conference on Applied Statistics in Agriculture and Natural Resources

Optimal Design of Experiments is currently recognized as the modern dominant approach to planning experiments in industrial engineering and manufacturing applications. This approach to design has gained traction among practitioners in the last two decades on two fronts: 1) optimal designs are the result of a complicated optimization calculation and recent advances in both computing efficiency and algorithms have enabled this approach in real time for practitioners, and 2) such designs are now popular because they allow the researcher to ‘design for the experiment’ by working constraints, cost, number of experiments, and the model of the intended post-hoc data analysis into …


Measuring Irregularity Via Approximate Entropy: How Does Perceived Human Instability Affect One's Own Stability?, Madi Braunersrither Dec 2021

Fall Student Research Symposium 2021

In a study performed at Utah State University, participants were prompted to evaluate the stability of pictured human postures while standing on a force plate. The force plate was used to collect the center of pressure of the subjects by recording measurements in the vertical and horizontal directions. The way these factors fluctuate over time and the irregularity in this fluctuation, specifically, can give insight into the subject’s postural stability. Rather than working with summary statistics such as means and variances of fitting parameters of a distribution as commonly done in statistics, we want to measure irregularity through analyzing the …
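
For readers unfamiliar with the measure named in the title, the following is a minimal sketch of approximate entropy in its textbook form, written in plain Python. It is not the author's code: the function name, the embedding dimension `m = 2`, the tolerance `r = 0.2`, and the toy signals are all assumptions chosen for illustration.

```python
import math
import random


def approximate_entropy(series, m=2, r=0.2):
    """Approximate entropy ApEn(m, r) of a 1-D sequence (standard definition).

    Lower values indicate a more regular, predictable signal.
    `m` (template length) and `r` (tolerance) are illustrative defaults.
    """
    n = len(series)

    def phi(length):
        # All overlapping templates of the given length.
        templates = [series[i:i + length] for i in range(n - length + 1)]
        total = 0.0
        for a in templates:
            # Count templates within Chebyshev distance r (self-match included).
            matches = sum(
                1 for b in templates
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += math.log(matches / len(templates))
        return total / len(templates)

    return phi(m) - phi(m + 1)


# Toy signals (hypothetical, not study data): a strictly alternating
# sequence versus uniform noise from a seeded generator.
regular = [0.0, 1.0] * 30
rng = random.Random(0)
noisy = [rng.random() for _ in range(60)]

print(approximate_entropy(regular))
print(approximate_entropy(noisy))
```

The alternating signal yields a value close to zero, while the noise yields a clearly larger one, which is the sense in which the measure captures irregularity in a fluctuating series.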


Creating Transparent And Accessible Methods For Approximating The Composite Strength Of Concrete Sandwich Wall Panels, Ruth Taylor Dec 2021

Fall Student Research Symposium 2021

Background: The method of designing partially composite sandwich wall panels (SWPs) relies strongly on the use of percent of composite action. Calculating these values proves to be a complex and virtually inaccessible process for practicing engineers, resulting in the reliance on proprietary software or connector-system manufacturers for the necessary values. We simulated percent composite action data, including several relevant variables, to examine the relationship and determine if simple and accessible methods of calculation could be created. Methods: Code from collaborating engineers used to calculate percent composite action with the Iterative Sandwich Beam Theory (ISBT) method was translated into R, a …


GPS-Denied Navigation Using Synthetic Aperture Radar Images And Neural Networks, Teresa White Dec 2021

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Unmanned aerial vehicles (UAV) often rely on GPS for navigation. GPS signals, however, are very low in power and easily jammed or otherwise disrupted. This paper presents a method for determining the navigation errors present at the beginning of a GPS-denied period utilizing data from a synthetic aperture radar (SAR) system. This is accomplished by comparing an online-generated SAR image with a reference image obtained a priori. The distortions relative to the reference image are learned and exploited with a convolutional neural network to recover the initial navigational errors, which can be used to recover the true flight trajectory throughout …


A Phenological Model For A Southern Population Of Mountain Pine Beetle, Catherine E. Wangen Aug 2021

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

The mountain pine beetle (MPB, Dendroctonus ponderosae Hopkins) attacks living Pinus trees across a widespread area of western North America, causing significant ecological and economic damage. The ability to make accurate predictions of how MPB populations across this range will respond to temperatures, which affect MPB progress through life stages, is essential. Northern and southern populations of MPB are genetically different in response to temperature, requiring geographic-specific model parameters. There is not currently a predictive model for the southern MPB life cycle, despite concerns that those populations may be more susceptible to increased numbers of generations per year, which would …


Housing Variables And Immigration: An Exploratory And Predictive Data Analysis In New York City, Jhonatan Medri Cobos Aug 2021

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

The relationship between housing and immigration has become relevant in the U.S., especially in a highly populated metropolis such as New York City (NYC). Determining whether immigration status affects home ownership percentage, household rent, or housing cost percentage could help understand the quality of life of NYC residents. Graphical exploration, spatial dependence tests, and spatial autoregressive models of housing and immigration variables provide some insights about their relationships. Our exploration takes place at some geographic subareas of NYC.

Our results first indicate that the housing and immigration data exhibit spatial dependence; values of a geographic subarea are related to values …


Retail Trading And Stock Volatility: The Case Of Robinhood, Cooper Jones May 2021

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

We examine the relation between Robinhood usership and stock market volatility. We show that daily fluctuations in Robinhood usership, which is used to proxy retail trading, significantly influence various measures of volatility. These results might suggest that Robinhood users contribute to noise trading as they are generally individuals trading on name recognition, media coverage, popularity, and familiarity of products, rather than on fundamental values. In our empirical approach, we find that the percentage increase in Robinhood usership Granger causes increases in daily stock volatility.


The Effect Of High Elevation Weather Stations On The USDA's Pasture, Rangeland, And Forage Insurance Program, Wyatt Matthew Feuz May 2021

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

This paper examines the effect of high elevation weather stations on the rainfall index used by the Pasture, Rangeland, and Forage insurance program. Weather station data for the state of Utah is used to identify high elevation weather stations and their location. Utilizing the corresponding rainfall index data, the effect of the high elevation weather stations is determined. This paper finds when high elevation weather stations begin reporting there is a jump up of 19.01–27.88 percentage points on average in the rainfall index for the corresponding grid locations. This indicates the rainfall index may not accurately represent actual precipitation amounts …


Survival Analysis: An Exact Method For Rare Events, Kristina Reutzel Dec 2020

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Conventional asymptotic methods for survival analysis work well when sample sizes are at least moderately sufficient. When dealing with small sample sizes or rare events, the results from these methods have the potential to be inaccurate or misleading. To handle such data, an exact method is proposed and compared against two other methods: 1) the Cox proportional hazards model and 2) stratified logistic regression for discrete survival analysis data.


Delta Hedging Of Financial Options Using Reinforcement Learning And An Impossibility Hypothesis, Ronak Tali Dec 2020

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

In this thesis we take a fresh perspective on delta hedging of financial options as undertaken by market makers. The current industry standard of delta hedging relies on the famous Black Scholes formulation that prescribes continuous time hedging in a way that allows the market maker to remain risk neutral at all times. But the Black Scholes formulation is a deterministic model that comes with several strict assumptions such as zero transaction costs, log normal distribution of the underlying stock prices, etc. In this paper we employ Reinforcement Learning to redesign the delta hedging problem in a way that allows us …


A Bayesian Markov Chain Monte Carlo Approach To Uncertainty Quantification, Matthew Isaac Aug 2020

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Uncertainty quantification (UQ) is a framework used frequently in engineering analyses to understand how uncertainty in system inputs leads to uncertainty in the system output. An instability is observed in a UQ method proposed by Roy and Oberkampf and a Bayesian Markov Chain Monte Carlo approach to UQ is offered as an alternative. The Bayesian approach allows analysts to incorporate information from various available sources including observed measurements and expert opinion and to update the analysis and results as more information becomes available. An illustrative engineering example is provided as a platform to demonstrate the Bayesian UQ approach and to …


'Lmshapemaker': Utilizing The 'Rmapshaper' R Package To Modify Shapefiles For Use In Linked Micromap Plots, Braden D. Probst May 2020

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

In order to effectively create map-based visualizations, some map modifications need to be conducted to ensure the map is readable and interpretable. There are several issues that need to be addressed to achieve this. The boundaries of a country may be overly complex which is particularly true with coastal areas of countries. Regions may be small and not seen in the final plot, as is the case with many capital cities in the world’s countries such as Washington D.C. and the Federal District of Mexico City. In other countries, regions may geographically lie far away from the rest of the …


Simplicity As A New Environmental Virtue, Justin Wheeler May 2020

Undergraduate Honors Capstone Projects

This paper argues for the addition of a new environmentally focused virtue, simplicity, to the virtue ethical framework developed by Aristotle. First, relevant background from Aristotle’s virtue ethics is developed, including the crucial “doctrine of the mean,” a balance between excess and deficiency of a specified character trait. The tenets of the new virtue simplicity are developed with practical examples based on Aristotle’s method of developing a virtue of character. Simplicity is proposed as a desire to take the appropriate amount from the natural world and an acceptance of one’s circumstances. Those possessing simplicity will not fall victim to the …


The Two Types Of Society: Computationally Revealing Recurrent Social Formations And Their Evolutionary Trajectories, Lux Miranda May 2020

Undergraduate Honors Capstone Projects

Comparative social science has a long history of attempts to classify societies and cultures in terms of shared characteristics. However, only recently has it become feasible to conduct quantitative analysis of large historical datasets to mathematically approach the study of social complexity and classify shared societal characteristics. Such methods have the potential to identify recurrent social formations in human societies and contribute to social evolutionary theory. However, in order to achieve this potential, repeated studies are needed to assess the robustness of results to changing methods and data sets. Using an improved derivative of the Seshat: Global History Databank, we …


Demystification Of Graph And Information Entropy, Bryce Frederickson May 2020

Undergraduate Honors Capstone Projects

Shannon entropy is an information-theoretic measure of unpredictability in probabilistic models. Recently, it has been used to form a tool, called the von Neumann entropy, to study quantum mechanics and network flows by appealing to algebraic properties of graph matrices. But still, little is known about what the von Neumann entropy says about the combinatorial structure of the graphs themselves. This paper gives a new formulation of the von Neumann entropy that describes it as a rate at which random movement settles down in a graph. At the same time, this new perspective gives rise to a generalization of von …
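
To make the opening definition concrete, here is a minimal sketch of Shannon entropy for a discrete distribution, in bits. It illustrates only the classical quantity, not the paper's new formulation of von Neumann entropy; the function name and example distributions are assumptions.

```python
import math


def shannon_entropy(probabilities):
    """Shannon entropy H(p) = -sum(p_i * log2(p_i)) in bits.

    Zero-probability outcomes contribute nothing, by the usual convention.
    """
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


print(shannon_entropy([0.5, 0.5]))   # fair coin: 1.0 bit
print(shannon_entropy([0.25] * 4))   # uniform over four outcomes: 2.0 bits
```

A uniform distribution maximizes the entropy for a given number of outcomes, while a distribution concentrated on a single outcome has entropy zero.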


Analysis Of SAT And ISAT Scores For Madison School District In Rexburg, Idaho, Holly Dawn Palmer May 2020

Undergraduate Honors Capstone Projects

Testing is an integral part of measuring education. If used properly, SAT scores can be compared across the nation, and statewide tests can compare school districts to one another, provided certain pitfalls are avoided (Fetler, 1991). However, if tests do not have a significant impact on a student, their motivation to take the test will be low and test quality cannot be assumed. When the state funds two separate tests for their students but only one has a significant impact on the student, how should the scores for each test be used, and is it okay to …


Applications Of Machine Learning In High-Frequency Trade Direction Classification, Jared E. Hansen May 2020

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

The correct assignment of trades as buyer-initiated or seller-initiated is paramount in many quantitative finance studies. Simple decision rule methods have been used for signing trades since many data sets available to researchers do not include the sign of each trade executed. By utilizing these decision rule methods, as well as engineering new variables from available data, we have demonstrated that machine learning models outperform prior methods for accurately signing trades as buys and sells, achieving state-of-the-art results. The best model developed was 4.5 percentage points more accurate than older methods when predicting onto unseen data. Since finance and economics …


Using A Discrete Choice Experiment To Estimate Willingness To Pay For Location Based Housing Attributes, Kristopher C. Toll Dec 2019

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

In 1993, a travel study was conducted along the Wasatch front in Utah (Research Systems Group INC, 2013). The main purpose of this study was to assess travel behavior to understand the needs for future growth in Utah. Since then, the Research Service Group (RSG) conducted a new study in 2012 to understand current travel preferences in Utah. This survey, called the Residential Choice Stated Preference survey, asked respondents to make ten choice comparisons between two hypothetical homes. Each home in the choice comparison was described by different attributes: type of neighborhood, distance from …


Student Insights Report, Fall 2019, The Center For Student Analytics Sep 2019

Publications

For the past three years, the staff of the Center for Student Analytics have worked to discover and expose meaningful, data-informed insights into what helps students succeed at Utah State University. The following pages highlight 20 of the most useful insights we found, provided here in small sets that will be useful to students, faculty, staff, university leadership, parents, and even prospective students. As you explore this report, we encourage you to see the student data as a window into USU itself. While big data helps us understand how individual students are performing, it tells us a great deal more …


Tuning Hyperparameters In Supervised Learning Models And Applications Of Statistical Learning In Genome-Wide Association Studies With Emphasis On Heritability, Jill F. Lundell Aug 2019

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Machine learning is a buzz word that has inundated popular culture in the last few years. This is a term for a computer method that can automatically learn and improve from data instead of being explicitly programmed at every step. Investigations regarding the best way to create and use these methods are prevalent in research. Machine learning models can be difficult to create because models need to be tuned. This dissertation explores the characteristics of tuning three popular machine learning models and finds a way to automatically select a set of tuning parameters. This information was used to create an …


Predictive Distributions Via Filtered Historical Simulation For Financial Risk Management, Tyson Clark May 2019

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Filtered historical simulation with an underlying GARCH process can be used as a valuable tool in VaR analysis, as it derives risk estimates that are sensitive to the distributional properties of the historical data of the produced predictive density. I examine the applications to risk analysis that filtered historical simulation can provide, as well as an interpretation of the predictive density as a poor man’s Bayesian posterior distribution. The predictive density allows us to make associated probabilistic statements regarding the results for VaR analysis, giving greater measurement of risk and the ability to maintain the optimal level of risk per …


Feasibility Of Multi-Year Forecast For The Colorado River Water Supply: Time Series Modeling, Brian Plucinski May 2019

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

The Colorado River is one of the largest resources for water in the United States, as well as being an important asset to the economy. Previous studies have shown a connection between the Great Salt Lake and the Colorado River. This study used time series analysis to build models to predict the water supply of the Colorado River ten years out. These models used data from the Colorado River in addition to Great Salt Lake water elevation. Several models suggest a decline in water supply from 2013 – 2020, before starting to increase. These predictions differ from predictions published by …


Drones And “Ghost Guns”: Unregulated Legal Space, Tori Bodine Mar 2019

Research on Capitol Hill

Law enforcement agencies are fighting a two-pronged battle when it comes to emerging technologies: keeping up with new ways criminals are using technology and developing effective ways to combat these innovations, while balancing these challenges against preserving the individual liberties of law-abiding citizens. This conflict is especially apparent with regard to criminal use of commercial drones and the developing fringe market surrounding homemade untraceable firearms (“ghost guns”).


Power In Pairs: Assessing The Statistical Value Of Paired Samples In Tests For Differential Expression, John R. Stevens, Jennifer S. Herrick, Roger K. Wolff, Martha L. Slattery Dec 2018

Mathematics and Statistics Faculty Publications

Background: When genomics researchers design a high-throughput study to test for differential expression, some biological systems and research questions provide opportunities to use paired samples from subjects, and researchers can plan for a certain proportion of subjects to have paired samples. We consider the effect of this paired samples proportion on the statistical power of the study, using characteristics of both count (RNA-Seq) and continuous (microarray) expression data from a colorectal cancer study.

Results: We demonstrate that a higher proportion of subjects with paired samples yields higher statistical power, for various total numbers of samples, and for various strengths of …


Rfviz: An Interactive Visualization Package For Random Forests In R, Christopher Beckett Dec 2018

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Random forests are very popular tools for predictive analysis and data science. They work for both classification (where there is a categorical response variable) and regression (where the response is continuous). Random forests provide proximities, and both local and global measures of variable importance. However, these quantities require special tools to be effectively used to interpret the forest. Rfviz is a sophisticated interactive visualization package and toolkit in R, specially designed for interpreting the results of a random forest in a user-friendly way. Rfviz uses a recently developed R package (loon) from the Comprehensive R Archive Network (CRAN) to create …


Comparing Performance Of Gene Set Test Methods Using Biologically Relevant Simulated Data, Richard M. Lambert Dec 2018

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Today we know that there are many genetically driven diseases and health conditions. These problems often manifest only when a set of genes are either active or inactive. Recent technology allows us to measure the activity level of genes in cells, which we call gene expression. It is of great interest to society to be able to statistically compare the gene expression of a large number of genes between two or more groups. For example, we may want to compare the gene expression of a group of cancer patients with a group of non-cancer patients to better understand the genetic …