Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

Statistics

Theses/Dissertations

2018


Full-Text Articles in Statistics and Probability

Rfviz: An Interactive Visualization Package For Random Forests In R, Christopher Beckett Dec 2018

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Random forests are very popular tools for predictive analysis and data science. They work for both classification (where there is a categorical response variable) and regression (where the response is continuous). Random forests provide proximities, and both local and global measures of variable importance. However, these quantities require special tools to be effectively used to interpret the forest. Rfviz is a sophisticated interactive visualization package and toolkit in R, specially designed for interpreting the results of a random forest in a user-friendly way. Rfviz uses a recently developed R package (loon) from the Comprehensive R Archive Network (CRAN) to create …


Comparing Performance Of Gene Set Test Methods Using Biologically Relevant Simulated Data, Richard M. Lambert Dec 2018

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Today we know that there are many genetically driven diseases and health conditions. These problems often manifest only when a set of genes is either active or inactive. Recent technology allows us to measure the activity level of genes in cells, which we call gene expression. It is of great interest to society to be able to statistically compare the gene expression of a large number of genes between two or more groups. For example, we may want to compare the gene expression of a group of cancer patients with a group of non-cancer patients to better understand the genetic …


Statistical Design Of Experiment Techniques In Manufacturing, Caroline M. Kerfonta Oct 2018

Senior Theses

There are many statistical techniques used to design experiments. These techniques are used in many different fields. This thesis will focus on the use of the three most common techniques used to design statistical experiments in manufacturing.

The three techniques that will be investigated are completely randomized design, randomized block design, and factorial design. These techniques will be compared, contrasted, and explained. Research examples will be presented along with sample R code for each technique. These examples will be accompanied by analysis of the techniques as well as an overview of the uses and history of experiments in manufacturing.
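
As a rough illustration of how the three designs named above differ, the following Python sketch generates a treatment plan for each one. It is not taken from the thesis (which uses R); the function names, the machine and shift labels, and the factor levels are all hypothetical and chosen only for the example.

```python
import random
from itertools import product


def completely_randomized(units, treatments, rng):
    """Completely randomized design: shuffle all units, then deal the
    treatments out in turn so each treatment gets an equal share."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {u: treatments[i % len(treatments)] for i, u in enumerate(shuffled)}


def randomized_block(blocks, treatments, rng):
    """Randomized block design: every treatment appears once per block,
    in an order randomized separately within each block."""
    plan = {}
    for units in blocks.values():
        order = list(treatments)
        rng.shuffle(order)
        for unit, treatment in zip(units, order):
            plan[unit] = treatment
    return plan


def full_factorial(factors):
    """Factorial design: one run for every combination of factor levels."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


rng = random.Random(1)  # fixed seed so the plan is reproducible
treatments = ["A", "B", "C"]

# Hypothetical experimental units: six machines, grouped into two shifts.
crd_plan = completely_randomized(["m1", "m2", "m3", "m4", "m5", "m6"], treatments, rng)
blocks = {"shift1": ["m1", "m2", "m3"], "shift2": ["m4", "m5", "m6"]}
rbd_plan = randomized_block(blocks, treatments, rng)
runs = full_factorial({"temperature": [150, 200], "pressure": ["low", "high"]})

print(crd_plan)
print(rbd_plan)
print(runs)
```

The block design differs from the completely randomized one only in where the shuffle happens: randomization is restricted to within each block, so every block sees every treatment.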


Implementing The Use Of Personal Activity Data In An Introductory Statistics Course, Lacy Christensen Aug 2018

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Integrating real data into a classroom is one of the recommendations in the Guidelines for Assessment and Instruction in Statistics Education (GAISE) college report which lays out guidelines for an introductory statistics course (Committee, GAISE College Report ASA Revision, 2016). In order to assess the effect of using real data in a classroom, the students received physical activity trackers to wear during an undergraduate introductory statistics course taught in the summer. This tracker, a Fitbit, enabled students to monitor and record their steps, calories, and active time throughout the class. Collecting personal activity data (PAD) creates a large database which …


Bayesian Analytical Approaches For Metabolomics : A Novel Method For Molecular Structure-Informed Metabolite Interaction Modeling, A Novel Diagnostic Model For Differentiating Myocardial Infarction Type, And Approaches For Compound Identification Given Mass Spectrometry Data., Patrick J. Trainor Aug 2018

Electronic Theses and Dissertations

Metabolomics, the study of small molecules in biological systems, has enjoyed great success in enabling researchers to examine disease-associated metabolic dysregulation and has been utilized for the discovery of biomarkers of disease and phenotypic states. In spite of recent technological advances in the analytical platforms utilized in metabolomics and the proliferation of tools for the analysis of metabolomics data, significant challenges in metabolomics data analyses remain. In this dissertation, we present three of these challenges and Bayesian methodological solutions for each. In the first part we develop a new methodology to serve as a basis for making higher order inferences in metabolomics, …


Hierarchical Bayesian Data Fusion Using Autoencoders, Yevgeniy Vladimirovich Reznichenko Jul 2018

Master's Theses (2009 -)

In this thesis, a novel method for tracker fusion is proposed and evaluated for vision-based tracking. This work combines three distinct popular techniques into a recursive Bayesian estimation algorithm. First, semi-supervised learning approaches are used to partition data and to train a deep neural network that is capable of capturing normal visual tracking operation and is able to detect anomalous data. We compare various methods by examining their respective receiver operating characteristic (ROC) curves, which represent the trade-off between specificity and sensitivity for various detection threshold levels. Next, we incorporate the trained neural networks into an existing data …
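
To make the ROC comparison mentioned above concrete, here is a minimal Python sketch of how an ROC curve is traced by sweeping a detection threshold. The scores and labels are made-up illustrative values, not data from the thesis, and `roc_points` and `auc` are hypothetical helper names.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs.

    scores: detector outputs, where higher means "more anomalous".
    labels: 1 for a truly anomalous sample, 0 for a normal one.
    The true positive rate is the sensitivity; the false positive
    rate is 1 minus the specificity.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold step by step, from the highest score down.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up detector scores and ground-truth labels for six samples.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

curve = roc_points(scores, labels)
print(curve)
print(auc(curve))
```

Each threshold yields one point on the curve, so two detectors can be compared either point by point or through a single summary such as the area under the curve.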


Association Tests For Genetic Effect And Its Interaction With Environmental Factors, Zhengyang Zhou Jul 2018

Statistical Science Theses and Dissertations

My research is in the area of statistical genetics, and it contains three projects: (1) Differentiating the Cochran-Armitage (CA) trend test and Pearson’s chi-square test: location and dispersion; (2) Decomposing Pearson’s chi-square test: a linear regression and its departure from linearity; (3) Testing nonlinear gene-environment (GxE) interaction through varying coefficient and linear mixed models.

(1) In genetic case-control association studies, a standard practice is to perform the CA trend test with 1 degree-of-freedom (df) under the assumption of an additive model. However, when the true genetic model is recessive or near recessive, it is outperformed by Pearson’s chi-square test with …


Pseudo Power Law Statistics In A Jammed, Amorphous Solid, Jacob Brian Hass Jun 2018

Physics

Simulations have shown that in many solid materials, rearrangements within the solid obey power-law statistics. A connection has been proposed between these statistics and the ability of a system to reach a limit cycle under cyclic driving. We study experimentally a 2D jammed solid that reaches such a limit cycle. Our solid consists of microscopic plastic beads adsorbed at an oil-water interface and cyclically sheared by a magnetically driven needle. We track each particle's trajectory in the solid to identify rearrangements. By associating particles both spatially and temporally, we can measure the extent of each rearrangement. We study specifically the …


A 3d Characteristics Database Of Land Engraved Areas With Known Subclass, Entni Lin Jun 2018

Student Theses

Subclass characteristics on bullets may mislead firearm examiners when they rely on traditional 2D images. In order to provide indelible examples for training and help avoid identification errors, 3D topography surface maps and statistical methods of pattern recognition are applied to toolmarks on bullets containing known subclass characteristics. This research was conducted by collecting 3D topography surface map data from land engraved areas of bullets fired through known barrels. This data was processed and used to train the statistical algorithms to predict their origin. The results from the algorithm are compared with the “right answers” (i.e. correct IDs) of the …


Discrete Ranked Set Sampling, Heng Cui May 2018

Statistical Science Theses and Dissertations

Ranked set sampling (RSS) is an efficient data collection framework compared to simple random sampling (SRS). It is widely used in various application areas such as agriculture, environment, sociology, and medicine, especially in situations where measurement is expensive but ranking is less costly. Most past research in RSS focused on situations where the underlying distribution is continuous. However, it is not unusual to have a discrete data generation mechanism. Estimating statistical functionals is challenging as ties may truly exist in discrete RSS. In this thesis, we started with estimating the cumulative distribution function (CDF) in discrete RSS. We proposed two …
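
For readers unfamiliar with the sampling scheme, the following Python sketch shows a balanced ranked set sample drawn from a discrete population, followed by the plain empirical CDF. It is only an illustration under simplifying assumptions: ranking is assumed to be perfect, ties are left in whatever order the sort produces, the population (integers 0 to 4, equally likely) is invented, and the estimator shown is the ordinary empirical CDF rather than either of the estimators proposed in the thesis.

```python
import random


def ranked_set_sample(draw, set_size, cycles, rng):
    """Balanced RSS: in each cycle, for every rank r, draw a fresh set of
    `set_size` units, rank them, and measure only the r-th smallest."""
    sample = []
    for _ in range(cycles):
        for rank in range(set_size):
            units = sorted(draw(rng) for _ in range(set_size))
            sample.append(units[rank])
    return sample


def empirical_cdf(sample, x):
    """Fraction of measured values that are less than or equal to x."""
    return sum(1 for v in sample if v <= x) / len(sample)


def draw_unit(rng):
    """Hypothetical discrete population: integers 0..4, equally likely."""
    return rng.randint(0, 4)


rng = random.Random(0)  # fixed seed for reproducibility
rss = ranked_set_sample(draw_unit, set_size=3, cycles=200, rng=rng)

print(len(rss))
print(empirical_cdf(rss, 2))
```

With a set size of 3 and 200 cycles, 1800 units are ranked but only 600 are measured, which is the cost saving the abstract refers to. In this toy population the true value of the CDF at 2 is 0.6.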


Analysis Of 2016-17 Major League Soccer Season Data Using Poisson Regression With R, Ian D. Campbell May 2018

Undergraduate Theses and Capstone Projects

To the outside observer, soccer is chaotic with no given pattern or scheme to follow, a random conglomeration of passes and shots that go on for 90 minutes. Yet what if there were a pattern to the chaos, or a way to describe the events that occur in the game quantifiably? Sports statistics is a critical part of baseball and a variety of other sports today, but we see very little statistics and data analysis done on soccer. The research that exists has looked into the effect of possession time on the outcome of a game, the difference …


Mindset, Attitudes, And Success In Statistics, Matthew Isaac May 2018

Undergraduate Honors Capstone Projects

Students in many disciplines are required to take an introductory statistics course while pursuing a college education. Despite the utility of statistical methods in future research and career pursuits, many students have negative views of statistics. We are interested in how students' mindsets and attitudes towards statistics impact their performance in an undergraduate statistics course. We administered a survey to students in several undergraduate statistics courses at Utah State University. This survey included questions addressing mathematics experience, attitudes towards statistics, mindset, and course performance. We observed that the majority of students indicated the presence of a growth mindset and positive …


Developing Statistical Methods For Data From Platforms Measuring Gene Expression, Gaoxiang Jia Apr 2018

Statistical Science Theses and Dissertations

This research contains two topics: (1) PBNPA: a permutation-based non-parametric analysis of CRISPR screen data; (2) RCRnorm: an integrated system of random-coefficient hierarchical regression models for normalizing NanoString nCounter data from FFPE samples.

Clustered regularly-interspaced short palindromic repeats (CRISPR) screens are usually implemented in cultured cells to identify genes with critical functions. Although several methods have been developed or adapted to analyze CRISPR screening data, no single specific algorithm has gained popularity. Thus, rigorous procedures are needed to overcome the shortcomings of existing algorithms. We developed a Permutation-Based Non-Parametric Analysis (PBNPA) algorithm, which computes p-values at the gene level …


Modeling Multimodal Failure Effects Of Complex Systems Using Polyweibull Distribution, Daniel A. Timme Mar 2018

Theses and Dissertations

The Department of Defense (DoD) enlists multiple complex systems across each of their departments. Between the aging systems going through an overhaul and emerging new systems, quality assurance to complete the mission and secure the nation’s objectives is an absolute necessity. The U.S. Air Force’s increased interest in Remotely Piloted Aircraft (RPA) and the Space Warfighting domain are current examples of complex systems that must maintain high reliability and sustainability in order to complete missions moving forward. DoD systems continue to grow in complexity with an increasing number of components and parts in more complex arrangements. Bathtub-shaped hazard functions arise …


Students’ Interpretations Of Categorical Data Using Dynamic Graphical Representations, Adam Eide Jan 2018

Master's Theses and Doctoral Dissertations

Statistical association is an important concept in statistics. An exploratory study examined how students reason about statistical association utilizing graphical representations constructed with CODAP, a dynamic statistical graphing software. Task-based interviews were conducted with three 6th grade students prior to formal instruction. Students’ conceptions of a statistical relationship, proportional reasoning skill level, ability to interpret bivariate categorical graphs (particularly segmented bar graphs and two-way binned plots), and ability to identify association of two categorical variables were all investigated through interview tasks and responses to inquiry. Students were found to have developing proportional reasoning skills and struggled to correctly define and …


Decision Trees: Predicting Future Losses For Insurance Data, Amanda Lahrmann Jan 2018

Williams Honors College, Honors Research Projects

Big data is a term that has come to the spotlight for companies within recent years. Data analysis and business intelligence have become prominent sectors of companies and agencies. But what is big data? How has it impacted large companies and agencies? Why must it be embraced?

The best way to approach utilizing a big data set is to establish a question to answer. For this data set, the question that must be answered is “What variables cause a loss to occur?” To answer this question, first, we must understand what is meant by a “loss”, and take a look …


Analyzing Sensor Based Human Activity Data Using Time Series Segmentation To Determine Sleep Duration, Yogesh Deepak Lad Jan 2018

Masters Theses

"Sleep is the most important thing to rest our brain and body. A lack of sleep has adverse effects on overall personal health and may lead to a variety of health disorders. According to data from the Centers for Disease Control and Prevention in the United States of America, there is a formidable increase in the number of people suffering from sleep disorders like insomnia, sleep apnea, hypersomnia and many more. Sleep disorders can be avoided by assessing an individual's activity over a period of time to determine the sleep pattern and duration. The sleep pattern and duration can be …