Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons



Full-Text Articles in Physical Sciences and Mathematics

Quantum Computing Simulation Of The Hydrogen Molecule System With Rigorous Quantum Circuit Derivations, Yili Zhang Aug 2022


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Quantum computing has been an emerging technology in the past few decades. It utilizes the power of programmable quantum devices to perform computation, and it can solve in feasible time complex problems that cannot feasibly be solved with classical computers. Simulating quantum chemical systems using quantum computers is one of the most active research fields in quantum computing. However, due to the novelty of the technology and its concepts, most materials in the literature are not accessible to newcomers to the field and, because of missing details, can sometimes be ambiguous even for practitioners.

This report provides a rigorous derivation of simulating quantum chemistry …


A Bayesian Hierarchical Approach For Modeling Virtual Species With Realistic Functional Trait Relationships, Sarah Bogen Aug 2022


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Understanding the spatial and temporal dynamics of plant populations has important implications for the fields of ecology and conservation. A rich body of mathematical modeling approaches, including reaction-diffusion equations and integrodifference equations, has been developed to mechanistically model population spread based on species demography and seed dispersal characteristics. However, with over 390,000 plant species on Earth, it is not feasible to collect complete information on all species for the purpose of drawing generalized conclusions. One means of overcoming such a problem is through trait-based modeling, which seeks to represent realistic combinations of organismal traits rather than focusing on individual species. …


Retail Trading And Stock Volatility: The Case Of Robinhood, Cooper Jones May 2021


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

We examine the relation between Robinhood usership and stock market volatility. We show that daily fluctuations in Robinhood usership, which is used to proxy retail trading, significantly influence various measures of volatility. These results might suggest that Robinhood users contribute to noise trading as they are generally individuals trading on name recognition, media coverage, popularity, and familiarity of products, rather than on fundamental values. In our empirical approach, we find that the percentage increase in Robinhood usership Granger causes increases in daily stock volatility.


Survival Analysis: An Exact Method For Rare Events, Kristina Reutzel Dec 2020


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Conventional asymptotic methods for survival analysis work well when sample sizes are at least moderately sufficient. When dealing with small sample sizes or rare events, the results from these methods have the potential to be inaccurate or misleading. To handle such data, an exact method is proposed and compared against two other methods: 1) the Cox proportional hazards model and 2) stratified logistic regression for discrete survival analysis data.


Predictive Distributions Via Filtered Historical Simulation For Financial Risk Management, Tyson Clark May 2019


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Filtered historical simulation with an underlying GARCH process can be used as a valuable tool in VaR analysis, as it derives risk estimates that are sensitive to the distributional properties of the historical data of the produced predictive density. I examine the applications to risk analysis that filtered historical simulation can provide, as well as an interpretation of the predictive density as a poor man’s Bayesian posterior distribution. The predictive density allows us to make associated probabilistic statements regarding the results for VaR analysis, giving greater measurement of risk and the ability to maintain the optimal level of risk per …


Feasibility Of Multi-Year Forecast For The Colorado River Water Supply: Time Series Modeling, Brian Plucinski May 2019


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

The Colorado River is one of the largest resources for water in the United States, as well as being an important asset to the economy. Previous studies have shown a connection between the Great Salt Lake and the Colorado River. This study used time series analysis to build models to predict the water supply of the Colorado River ten years out. These models used data from the Colorado River in addition to Great Salt Lake water elevation. Several models suggest a decline in water supply from 2013 – 2020, before starting to increase. These predictions differ from predictions published by …


Rfviz: An Interactive Visualization Package For Random Forests In R, Christopher Beckett Dec 2018


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Random forests are very popular tools for predictive analysis and data science. They work for both classification (where there is a categorical response variable) and regression (where the response is continuous). Random forests provide proximities, and both local and global measures of variable importance. However, these quantities require special tools to be effectively used to interpret the forest. Rfviz is a sophisticated interactive visualization package and toolkit in R, specially designed for interpreting the results of a random forest in a user-friendly way. Rfviz uses a recently developed R package (loon) from the Comprehensive R Archive Network (CRAN) to create …


A Comparison Of R, Sas, And Python Implementations Of Random Forests, Breckell Soifua Aug 2018


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

The Random Forest method is a useful machine learning tool developed by Leo Breiman. There are many existing implementations across different programming languages, the most popular of which are in R, SAS, and Python. In this paper, we conduct a comprehensive comparison of these implementations with regard to accuracy, variable importance measurements, and timing. This comparison was done on a variety of real and simulated data with different classification difficulty levels, numbers of predictors, and sample sizes. The comparison shows unexpectedly different results between the three implementations.


Examining Quadratic Relationships Between Traits And Methods In Two Multitrait-Multimethod Models, Fredric A. Hintz May 2018


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Psychological researchers are interested in the validity of the measures they use, and the multitrait-multimethod design is one of the most frequently employed methods to examine validity. Confirmatory factor analysis is now a commonly used analytic tool for examining multitrait-multimethod data, where an underlying mathematical model is fit to data and the amount of variance due to the trait and method factors is estimated. While most contemporary confirmatory factor analysis methods for examining multitrait-multimethod data do not allow relationships between the trait and method factors, a few recently proposed models allow for the examination of linear relationships between traits …


Seasonal Resource Selection And Habitat Treatment Use By A Fringe Population Of Greater Sage-Grouse, Rhett Boswell Dec 2017


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Movement and habitat selection by Greater Sage-grouse (Centrocercus urophasianus) are of great interest to wildlife managers tasked with applying conservation measures for this iconic western species. Current technology has created small and lightweight Global Positioning System (GPS) transmitters that can be attached to sage-grouse. Using GIS software and statistical programs such as Program R, land managers can analyze GPS location data to assess how sage-grouse are geospatially interacting with their habitats. Within the Panguitch Sage-Grouse Management Area (SGMA) thousands of acres of land have been restored or manipulated to enhance sage-grouse habitat; this usually involves removal of pinyon pine …


Imputation For Random Forests, Joshua Young Aug 2017


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

This project introduces two new methods for imputation of missing data in random forests. The new methods are compared against other frequently used imputation methods, including those used in the randomForest package in R. To test the effectiveness of these methods, missing data are imputed into datasets that contain two missing data mechanisms including missing at random and missing completely at random. After imputation, random forests are run on the data and accuracies for the predictions are obtained. Speed is an important aspect in computing; the speeds for all the tested methods are also compared.

One of the new methods …


Tree-Based Regression For Interval-Valued Data, Chih-Ching Yeh Aug 2017


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Regression methods for interval-valued data have been increasingly studied in recent years. As most of the existing works focus on linear models, it is important to note that many problems in practice are nonlinear in nature and therefore development of nonlinear regression tools for interval-valued data is crucial. In this project, we propose a tree-based regression method for interval-valued data, which is well applicable to both linear and nonlinear problems. Unlike linear regression models that usually require additional constraints to ensure positivity of the predicted interval length, the proposed method estimates the regression function in a nonparametric way, so the …


Prediction Of Stress Increase In Unbonded Tendons Using Sparse Principal Component Analysis, Eric Mckinney Aug 2017


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

While internal and external unbonded tendons are widely utilized in concrete structures, the analytic solution for the increase in unbonded tendon stress, Δf_ps, is challenging due to the lack of bond between strand and concrete. Moreover, most analysis methods do not provide high correlation due to the limited available test data. In this thesis, Principal Component Analysis (PCA) and Sparse Principal Component Analysis (SPCA) are employed on different sets of candidate variables, amongst the material and sectional properties from the database compiled by Maguire et al. [18]. Predictions of Δf_ps are made via Principal Component Regression models, and the method …


Statistical Methods For Assessing Individual Oocyte Viability Through Gene Expression Profiles, Michael O. Bishop May 2017


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Utah State University, 2017. Major Professor: Dr. John R. Stevens. Department: Mathematics and Statistics.

Oocytes are the precursor cells to the female gamete, or egg. While reproduction may vary from species to species, within humans and most domesticated animals, the oocyte maturation process is fairly similar. As an oocyte matures, there are various processes that take place, all of which have an effect on the viability of the individual oocyte. Barring outside damage that may come to the oocyte, one of the primary reasons …


Comparison Of Survival Curves Between Cox Proportional Hazards, Random Forests, And Conditional Inference Forests In Survival Analysis, Brandon Weathers May 2017


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Survival analysis methods are a mainstay of the biomedical fields but are finding increasing use in other disciplines including finance and engineering. A widely used tool in survival analysis is the Cox proportional hazards regression model. For this model, all the predicted survivor curves have the same basic shape, which may not be a good approximation to reality. In contrast, the Random Survival Forests method does not make the proportional hazards assumption and has the flexibility to model survivor curves that are of quite different shapes for different groups of subjects. We applied both techniques to a number of publicly available …


A Comparison Of Statistical Methods Relating Pairwise Distance To A Binary Subject-Level Covariate, Rachael Stone May 2017


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

A community ecologist provided a motivating data set involving a certain animal species with two behavior groups, along with a pairwise genetic distance matrix among individuals. Many community ecologists have analyzed similar data sets with a method known as the Hopkins method, testing for an association between the subject-level covariate (behavior group) and the pairwise distance. This community ecologist wanted to know whether results obtained with the Hopkins method would be meaningful. Their question inspired this thesis work, where a different data set was used for confidentiality reasons. Multiple methods (Hopkins method, ADONIS, ANOSIM, and Distance Regression) were used …


Tutorial For Using The Center For High Performance Computing At The University Of Utah And An Example Using Random Forest, Stephen Barton Dec 2016


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Random Forests are very memory-intensive machine learning algorithms, and most computers would fail at building models from datasets with millions of observations. Using the Center for High Performance Computing (CHPC) at the University of Utah and an airline on-time arrival dataset with 7 million observations from the U.S. Department of Transportation Bureau of Transportation Statistics, we built 316 models by adjusting the depth of the trees and randomness of each forest, and compared the accuracy and time each took. Using this dataset, we discovered that substantial restrictions to the size of trees, observations allowed for each tree, and variables …


The Eyes Have It: Eye Tracking Data Visualizations Of Viewing Patterns Of Statistical Graphics, Trent Fawcett May 2016


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

As statistical graphics continue to expand to manage an ever growing amount of diverse data, a need to evaluate the effectiveness of graphics, both basic and complex, has arisen. Technological advancements have given a means to evaluate the effectiveness of graphs and graphical components through eye tracking systems. Eye tracking systems are likewise in need of software that will enable easy evaluation and exploration of data. The focus of this Master's Report is to evaluate the dual solution. An exploration of an eye tracker setup is made, with extensive consideration of testing statistical graphics providing a basis for continued research …


Comparing Linear Mixed Models To Meta-Regression Analysis In The Greenville Air Quality Study, Lynsie M. Daley May 2015


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

The effect of air quality on public health is an important issue in need of better understanding. There are many stakeholders, especially in Utah and Cache Valley, where the poor air quality as measured by PM 2.5 levels and consequent inversions can sometimes be the very worst in the nation. This project focuses on comparing two statistical methods used to analyze an important air quality data set from the Greenville Air Quality Study, focusing on a lung function response variable. A linear mixed model, with a random factor for subject, gives slope estimates and their significance for predictor variables of …


Survival Analysis For Truncated Data And Competing Risks, Michael Steelman May 2015


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

The purpose of this project is to consider the problems of left truncation and competing risks in analyzing censored survival data, and to compare and contrast various approaches for handling these problems. The motivation for this work comes from an analysis of data from the Cache County Memory Study. Study investigators were interested in the association between early-life psychologically stressful events (e.g., parental or sibling death, or parental divorce, among others) and late-life risk of Alzheimer's disease (AD). While conventional methods for censored survival data can be applied, the presence of left truncation and competing risks (i.e., other adverse events …


Visualizing And Forecasting Box-Office Revenues: A Case Study Of The James Bond Movie Series, Vahan Petrosyan Aug 2014


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

This Master's report deals with the visualization and forecasting of the box-office revenues and some related variables from the James Bond movie series. Visualization techniques such as time series plots, scatterplot matrices, dotplots, boxplots, histograms, normal quantile plots, parallel coordinates plots, heatmaps, mosaic plots, association plots, and choropleth maps are used to provide some deeper insights into the given dataset. Additionally, the results from an article published in 1997, which modeled the box-office revenues of the James Bond movie series, are reproduced and extended. Numerous statistical models were examined to obtain the models that are closest to the original …


Mvgst: Tools For Multivariate And Directional Gene Set Testing, Dennis S. Mecham May 2014


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

There are many platforms available for simultaneously measuring the relative activity, or expression, levels of all genes in an organism. Genes that have systematically different expression levels between experimental factor levels are called "differentially expressed". Because genes are annotated based on their known roles in biological processes (BP), molecular functions (MF) and cellular components (CC), gene expression levels can be used to determine relative activity levels of individual BP, MF, or CC between experimental factor levels (this is called gene set testing). Often multiple experimental differences are of interest simultaneously, which necessitates multivariate gene set testing. Only …


Modeling Asset Volatility Using Various Resources, Isaac G. Blackhurst May 2014


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Volatility is of central interest in modern financial econometrics. This thesis evaluates three different methods of measuring volatility.


Lifetime Modeling Of Deficient Bridges In New York, Levi Phippen May 2014


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Given the importance of bridges to a state's economy and strength, and the costs involved in building and maintaining bridges, maximizing their service life is essential. In order to safely extend a bridge's utility as long as possible, an understanding of its lifetime processes is needed. This paper attempts to model the lifetime of a bridge in New York once it has become deficient. Lifetime is defined to be the length of time between deficiency classification and failure. A bridge is considered deficient when certain structural components receive a poor rating in the National Bridge Inventory, which is compiled annually …


Probability Estimation In Random Forests, Chunyang Li May 2013


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Random Forests is a useful ensemble approach that provides accurate predictions for classification, regression and many different machine learning problems. Classification has been a very useful and popular application for Random Forests. However, it is preferable to have the probability of group membership rather than simply knowing which group an observation is assigned to. Votes and the regression method are current probability estimation methods that have been developed in Random Forests. In this thesis, we introduce two new methods, proximity weighting and the out-of-bag method, that aim to improve on the current methods. Several different simulations are designed to evaluate the new …


Assessing Changes In The Abundance Of The Continental Population Of Scaup Using A Hierarchical Spatio-Temporal Model, Beth E. Ross May 2012


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

In ecological studies, the goal is often to describe and gain further insight into ecological processes underlying the data collected during observational studies. Because of the nature of observational data, it can often be difficult to separate the variation in the data from the underlying process or 'state dynamics.' In order to better address this issue, it is becoming increasingly common for researchers to use hierarchical models. Hierarchical spatial, temporal, and spatio-temporal models allow for the simultaneous modeling of both first and second order processes, thus accounting for underlying autocorrelation in the system while still providing insight into overall spatial …


Interactive Random Forests Plots, Anna T. Quach May 2012


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Random Forests is a useful data mining tool that is quite popular in finding variable importance. However, many people don’t make use of the Random Forests results in interactive graphs. Partly, this is because software packages that can do interactive graphs can’t handle large data sets and those that use Random Forests have large data sets or many variables. A new software package in R, known as iPlots eXtreme, that is still in development makes it simple to explore large data sets interactively. I have created a function, called irfplot (interactive random forests plot) that specifically uses Random Forests to …


Simulating Loan Repayment By The Sinking Fund Method (Sinking Fund Governed By A Sequence Of Interest Rates), Placede Judicaelle Gangnang Fosso May 2012


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

The sinking fund method is a way to repay a loan where the borrower pays the amount of interest accrued by the principal at the end of each time period and puts a certain amount in a sinking fund in order to repay the principal at the end of the loan. Usually, we assume that the interest rate on the sinking fund is the same during the entire time of the loan. In the study, we will depart from the usual assumptions and will look at different scenarios, including when changes of the interest rate on the sinking fund follow …
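
The mechanics described in this abstract can be illustrated with a minimal Python sketch. It is not taken from the report: the function names, the choice of a level end-of-period deposit, and the example rates are all assumptions made for illustration. The sketch assumes the borrower pays interest on the full principal each period and makes equal deposits into a fund whose rate may differ from period to period.

```python
def level_deposit(principal, fund_rates):
    """Level end-of-period deposit that grows to `principal` by the last period.

    `fund_rates[t]` is the (assumed) sinking fund rate earned during period t+1.
    """
    factor_sum = 0.0
    growth = 1.0  # accumulation factor from a deposit date to the end of the loan
    for rate in reversed(fund_rates):
        factor_sum += growth   # the deposit made at the end of this period
        growth *= 1.0 + rate   # earlier deposits also earn this period's rate
    return principal / factor_sum


def simulate_sinking_fund(principal, loan_rate, fund_rates):
    """Return a list of (period, interest_paid, deposit, fund_balance) rows."""
    deposit = level_deposit(principal, fund_rates)
    interest = principal * loan_rate  # interest on the full principal each period
    balance = 0.0
    schedule = []
    for period, rate in enumerate(fund_rates, start=1):
        balance = balance * (1.0 + rate) + deposit
        schedule.append((period, interest, deposit, balance))
    return schedule


if __name__ == "__main__":
    # Hypothetical loan: 1000 at 6% per period, fund rates changing each period.
    for row in simulate_sinking_fund(1000.0, 0.06, [0.03, 0.04, 0.05]):
        print(row)
```

With a constant fund rate this reduces to the usual textbook case; passing a different sequence of rates gives one simple way to explore the varying-rate scenarios the abstract mentions.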


Ignoring The Spatial Context In Intro Statistics Classes - And Some Simple Graphical Remedies, Nathan Voge May 2012


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Statistical data often have a spatial (geographic) context, be it countries of the world, states in the US, counties within a state, cities across the globe, or locations where measurements have been taken. However, most introductory statistics books do not even suggest that such data often are not independent from location, but rather are affected by some spatial association. Remedies are simple: Display data via various map views and briefly discuss which additional information can be extracted from such a graphical representation. In this report, we will visit a variety of popular introductory statistics textbooks and show how some …


Dietary Patterns And Cognitive Decline In Aged Populations, Austin J. Bowles Aug 2011


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

In this paper, we discuss distinctive features of longitudinal studies, and illustrate two regression-based methods for the analysis of longitudinal data. A study of dietary patterns and cognitive decline (Cache County Memory Study) is used to motivate our discussion and analysis. Cognitive decline is a risk factor for Alzheimer’s disease, the sixth leading cause of all deaths among Americans. The study attempted to identify dietary patterns associated with reduced risk of age-related cognitive decline in elderly populations. Higher levels of adherence to the Dietary Approaches to Stop Hypertension (DASH) and/or Mediterranean diets were found to be associated with increased cognitive …