Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Open Access. Powered by Scholars. Published by Universities.®

Statistics

Theses/Dissertations

Applied Statistics

Institution
Publication Year
Publication

Articles 1 - 30 of 58

Full-Text Articles in Physical Sciences and Mathematics

Bayesian Statistical Modeling Of Spatially Resolved Transcriptomics Data, Xi Jiang Oct 2023

Bayesian Statistical Modeling Of Spatially Resolved Transcriptomics Data, Xi Jiang

Statistical Science Theses and Dissertations

Spatially resolved transcriptomics (SRT) quantifies expression levels at different spatial locations, providing a new and powerful tool to investigate novel biological insights. As experimental technologies enhance both in capacity and efficiency, there arises a growing demand for the development of analytical methodologies.

One question in SRT data analysis is to identify genes whose expressions exhibit spatially correlated patterns, called spatially variable (SV) genes. Most current methods to identify SV genes are built upon the geostatistical model with Gaussian process, which could limit the models' ability to identify complex spatial patterns. In order to overcome this challenge and capture more types …


A Comparison Of Confidence Intervals In State Space Models, Jinyu Du Jul 2023

A Comparison Of Confidence Intervals In State Space Models, Jinyu Du

Statistical Science Theses and Dissertations

This thesis develops general procedures for constructing confidence intervals (CIs) of the error disturbance parameters (standard deviations) and transformations of the error disturbance parameters in time-invariant state space models (ssm). With only a set of observations, estimating individual error disturbance parameters accurately in the presence of other unknown parameters in ssm is a very challenging problem. We attempted to construct four different types of confidence intervals, Wald, likelihood ratio, score, and higher-order asymptotic intervals for both the simple local level model and the general time-invariant state space models (ssm). We show that for a simple local level model, both the …


Optimizing Tumor Xenograft Experiments Using Bayesian Linear And Nonlinear Mixed Modelling And Reinforcement Learning, Mary Lena Bleile May 2023

Optimizing Tumor Xenograft Experiments Using Bayesian Linear And Nonlinear Mixed Modelling And Reinforcement Learning, Mary Lena Bleile

Statistical Science Theses and Dissertations

Tumor xenograft experiments are a popular tool of cancer biology research. In a typical such experiment, one implants a set of animals with an aliquot of the human tumor of interest, applies various treatments of interest, and observes the subsequent response. Efficient analysis of the data from these experiments is therefore of utmost importance. This dissertation proposes three methods for optimizing cancer treatment and data analysis in the tumor xenograft context. The first of these is applicable to tumor xenograft experiments in general, and the second two seek to optimize the combination of radiotherapy with immunotherapy in the tumor xenograft …


Modeling The Probability Of A Successful Stolen Base Attempt In Major League Baseball, Cade Stanley Apr 2023

Modeling The Probability Of A Successful Stolen Base Attempt In Major League Baseball, Cade Stanley

Senior Theses

In Major League Baseball (MLB), the outcome of a stolen base attempt has important implications. Success moves the runner closer to scoring, while failure records an out and removes the runner from the basepaths altogether. Therefore, it is important that the decision by a coach or player to steal a base is well-informed. In this thesis, I explore a statistical approach to making this decision. I train logistic regression and random forest models, using data about the game situation and about the runner, pitcher, and catcher involved in the stolen base attempt, to estimate the probability that a stolen base …


Dynamic Prediction For Alternating Recurrent Events Using A Semiparametric Joint Frailty Model, Jaehyeon Yun Aug 2022

Dynamic Prediction For Alternating Recurrent Events Using A Semiparametric Joint Frailty Model, Jaehyeon Yun

Statistical Science Theses and Dissertations

Alternating recurrent events data arise commonly in health research; examples include hospital admissions and discharges of diabetes patients; exacerbations and remissions of chronic bronchitis; and quitting and restarting smoking. Recent work has involved formulating and estimating joint models for the recurrent event times considering non-negligible event durations. However, prediction models for transition between recurrent events are lacking. We consider the development and evaluation of methods for predicting future events within these models. Specifically, we propose a tool for dynamically predicting transition between alternating recurrent events in real time. Under a flexible joint frailty model, we derive the predictive probability of …


Generating A Dataset For Comparing Linear Vs. Non-Linear Prediction Methods In Education Research, Jack Mauro, Elena Martinez, Anna Bargagliotti May 2022

Generating A Dataset For Comparing Linear Vs. Non-Linear Prediction Methods In Education Research, Jack Mauro, Elena Martinez, Anna Bargagliotti

Honors Thesis

Machine learning is often used to build predictive models by extracting patterns from large data sets. Such techniques are increasingly being utilized to predict outcomes in the social sciences. One such application is predicting student success. Machine learning can be applied to predicting student acceptance and success in academia. Using these tools for education-related data analysis, may enable the evaluation of programs, resources and curriculum. Currently, research is needed to examine application, admissions, and retention data in order to address equity in college computer science programs. However, most student-level data sets contain sensitive data that cannot be made public. To …


Understanding And Improving The System: The Effects Of Weighting On The Accuracy Of Political Polling In Arkansas, Beck Williams May 2022

Understanding And Improving The System: The Effects Of Weighting On The Accuracy Of Political Polling In Arkansas, Beck Williams

Political Science Undergraduate Honors Theses

In an effort to increase the accuracy of statewide political polling in Arkansas, we explore the statistical strategy of weighting with a focus on one yearly opinion poll: The Arkansas Poll. We conduct over 70 weighting experiments on the 2016 and 2020 Arkansas Polls using a variety of variables and opinion questions. From these experiments, we find that while some weighted variables tend to create larger changes, weighting typically results in a single-digit percentage change that does not substantially shift or “flip” the majorities. Due to a greater rate of change through weighting in the 2020 Poll compared to the …


Finding The Best Predictors For Foot Traffic In Us Seafood Restaurants, Isabel Paige Beaulieu Jan 2022

Finding The Best Predictors For Foot Traffic In Us Seafood Restaurants, Isabel Paige Beaulieu

Honors Theses and Capstones

COVID-19 caused state and nation-wide lockdowns, which altered human foot traffic, especially in restaurants. The seafood sector in particular suffered greatly as there was an increase in illegal fishing, it is made up of perishable goods, it is seasonal in some places, and imports and exports were slowed. Foot traffic data is useful for business owners to have to know how much to order, how many employees to schedule, etc. One issue is that the data is very expensive, hard to get, and not available until months after it is recorded. Our goal is to not only find covariates that …


Statistical Theory For Specialized Linear Regression Adjustment Methods Compared To Multiple Linear Regression In The Presence And Absence Of Interaction Effects, Leon Su Jan 2022

Statistical Theory For Specialized Linear Regression Adjustment Methods Compared To Multiple Linear Regression In The Presence And Absence Of Interaction Effects, Leon Su

Theses and Dissertations--Statistics

When building models to investigate outcomes and variables of interest, researchers often want to adjust for other variables. There is a variety of ways that these adjustments are performed. In this work, we will consider four approaches to adjustment utilized by researchers in various fields. We will compare the efficacy of these methods to what we call the ”true model method”, fitting a multiple linear regression model in which adjustment variables are model covariates. Our goal is to show that these adjustment methods have inferior performance to the true model method by comparing model parameter estimates, power, type I error, …


Data Analysis And Visualization To Dismantle Gender Discrimination In The Field Of Technology, Quinn Bolewicki Jun 2021

Data Analysis And Visualization To Dismantle Gender Discrimination In The Field Of Technology, Quinn Bolewicki

Dissertations, Theses, and Capstone Projects

In the United States, a significant population is facing an uphill battle trying to thrive in an industry that has seen exponential growth in recent years. Women, who account for approximately 50.8% of the U.S. population are statistically underpaid and underrepresented in science, technology, engineering, and mathematics (STEM). Despite women-led technology teams establishing a 21% greater return on investment than teams who don’t, and young women largely outperforming men in math according to a 2015 study, there are only three fortune 500 companies led by women, and they comprise only 10% of internet entrepreneurs. Research generates hundreds of articles, infographics, …


Ensemble Protein Inference Evaluation, Kyle Lee Lucke Jan 2021

Ensemble Protein Inference Evaluation, Kyle Lee Lucke

Graduate Student Theses, Dissertations, & Professional Papers

The Protein inference problem is becoming an increasingly important tool that aids in the characterization of complex proteomes and analysis of complex protein samples. In bottom-up shotgun proteomics experiments the metrics for evaluation (like AUC and calibration error) are based on an often imperfect target-decoy database. These metrics make the inherent assumption that all of the proteins in the target set are present in the sample being analyzed. In general, this is not the case, they are typically a mix of present and absent proteins. To objectively evaluate inference methods, protein standard datasets are used. These datasets are special in …


Bayesian Semi-Supervised Keyphrase Extraction And Jackknife Empirical Likelihood For Assessing Heterogeneity In Meta-Analysis, Guanshen Wang Dec 2020

Bayesian Semi-Supervised Keyphrase Extraction And Jackknife Empirical Likelihood For Assessing Heterogeneity In Meta-Analysis, Guanshen Wang

Statistical Science Theses and Dissertations

This dissertation investigates: (1) A Bayesian Semi-supervised Approach to Keyphrase Extraction with Only Positive and Unlabeled Data, (2) Jackknife Empirical Likelihood Confidence Intervals for Assessing Heterogeneity in Meta-analysis of Rare Binary Events.

In the big data era, people are blessed with a huge amount of information. However, the availability of information may also pose great challenges. One big challenge is how to extract useful yet succinct information in an automated fashion. As one of the first few efforts, keyphrase extraction methods summarize an article by identifying a list of keyphrases. Many existing keyphrase extraction methods focus on the unsupervised setting, …


Improved Statistical Methods For Time-Series And Lifetime Data, Xiaojie Zhu Dec 2020

Improved Statistical Methods For Time-Series And Lifetime Data, Xiaojie Zhu

Statistical Science Theses and Dissertations

In this dissertation, improved statistical methods for time-series and lifetime data are developed. First, an improved trend test for time series data is presented. Then, robust parametric estimation methods based on system lifetime data with known system signatures are developed.

In the first part of this dissertation, we consider a test for the monotonic trend in time series data proposed by Brillinger (1989). It has been shown that when there are highly correlated residuals or short record lengths, Brillinger’s test procedure tends to have significance level much higher than the nominal level. This could be related to the discrepancy between …


Bayesian Topological Machine Learning, Christopher A. Oballe Aug 2020

Bayesian Topological Machine Learning, Christopher A. Oballe

Doctoral Dissertations

Topological data analysis encompasses a broad set of ideas and techniques that address 1) how to rigorously define and summarize the shape of data, and 2) use these constructs for inference. This dissertation addresses the second problem by developing new inferential tools for topological data analysis and applying them to solve real-world data problems. First, a Bayesian framework to approximate probability distributions of persistence diagrams is established. The key insight underpinning this framework is that persistence diagrams may be viewed as Poisson point processes with prior intensities. With this assumption in hand, one may compute posterior intensities by adopting techniques …


Spatio-Temporal Analysis Of Tree Ring Chronology And Precipitation, Ruizhe Yin Aug 2019

Spatio-Temporal Analysis Of Tree Ring Chronology And Precipitation, Ruizhe Yin

Graduate Theses and Dissertations

Tree ring chronology data is known to reflect regional climate due to the strong impact of rainfall and temperature. Therefore, tree ring data can be used to reconstruct historical climate in order to understand how climate changed in the past and make prediction about the future behavior of the climate. For simplicity, this research only considers the influence of precipitation on tree ring growth within the New England area. A total of 94 measurement sites are used to record tree ring width over 881 years and corresponding precipitation data are given at some locations for 121 years. We developed a …


Samples, Unite! Understanding The Effects Of Matching Errors On Estimation Of Total When Combining Data Sources, Benjamin Williams May 2019

Samples, Unite! Understanding The Effects Of Matching Errors On Estimation Of Total When Combining Data Sources, Benjamin Williams

Statistical Science Theses and Dissertations

Much recent research has focused on methods for combining a probability sample with a non-probability sample to improve estimation by making use of information from both sources. If units exist in both samples, it becomes necessary to link the information from the two samples for these units. Record linkage is a technique to link records from two lists that refer to the same unit but lack a unique identifier across both lists. Record linkage assigns a probability to each potential pair of records from the lists so that principled matching decisions can be made. Because record linkage is a probabilistic …


Market Research On Student Concert Attendance At Bgsu's College Of Musical Arts, Mary Solomon May 2019

Market Research On Student Concert Attendance At Bgsu's College Of Musical Arts, Mary Solomon

Honors Projects

Bowling Green State University boasts a well established College of Musical Arts which holds concerts performed by esteemed faculty, prestigious guest artists, and students. The school hosts these events in Kobacker Hall and Bryan Recital Hall which can accommodate up to 800 and 250 audience members, respectively. However, performances in Kobacker hall only fill one- fourth of the 800 seats, on average. Why is this so? This project aims to investigate the factors that influence students’ decisions to attend concerts at the College of Musical Arts (CMA). By methodology of survey research and statistical analysis, this project will look into …


Advanced Statistics In Arkansas Sports Reporting, Andrew Lee Epperson May 2019

Advanced Statistics In Arkansas Sports Reporting, Andrew Lee Epperson

Graduate Theses and Dissertations

This study seeks to analyze how Arkansas’ sports journalists are adapting to the recent surge in available advanced statistics that are being used by certain national news organizations. Using in-depth qualitative research that includes in-depth interviews with a number of individuals in the print, broadcast, and athletics side of sports coverage, we discover how journalists and coaches use these next-generation analytics, what they fundamentally mean for the evolution of each respective path, and why so few Arkansas reporters and writers use them at the time of this paper’s defense. We see how budgets and deadlines restrict the use of these …


Daily And Seasonal Variability Of Offshore Wind Power On The Central California Coast And Statewide Demand, Matthew Douglas Kehrli Apr 2019

Daily And Seasonal Variability Of Offshore Wind Power On The Central California Coast And Statewide Demand, Matthew Douglas Kehrli

Physics

No abstract provided.


Modeling Stochastically Intransitive Relationships In Paired Comparison Data, Ryan Patrick Alexander Mcshane Jan 2019

Modeling Stochastically Intransitive Relationships In Paired Comparison Data, Ryan Patrick Alexander Mcshane

Statistical Science Theses and Dissertations

If the Warriors beat the Rockets and the Rockets beat the Spurs, does that mean that the Warriors are better than the Spurs? Sophisticated fans would argue that the Warriors are better by the transitive property, but could Spurs fans make a legitimate argument that their team is better despite this chain of evidence?

We first explore the nature of intransitive (rock-scissors-paper) relationships with a graph theoretic approach to the method of paired comparisons framework popularized by Kendall and Smith (1940). Then, we focus on the setting where all pairs of items, teams, players, or objects have been compared to …


Bayesian Hierarchical Meta-Analysis Of Asymptomatic Ebola Seroprevalence, Peter Brody-Moore Jan 2019

Bayesian Hierarchical Meta-Analysis Of Asymptomatic Ebola Seroprevalence, Peter Brody-Moore

CMC Senior Theses

The continued study of asymptomatic Ebolavirus infection is necessary to develop a more complete understanding of Ebola transmission dynamics. This paper conducts a meta-analysis of eight studies that measure seroprevalence (the number of subjects that test positive for anti-Ebolavirus antibodies in their blood) in subjects with household exposure or known case-contact with Ebola, but that have shown no symptoms. In our two random effects Bayesian hierarchical models, we find estimated seroprevalences of 8.76% and 9.72%, significantly higher than the 3.3% found by a previous meta-analysis of these eight studies. We also produce a variation of this meta-analysis where we exclude …


Comparing Performance Of Gene Set Test Methods Using Biologically Relevant Simulated Data, Richard M. Lambert Dec 2018

Comparing Performance Of Gene Set Test Methods Using Biologically Relevant Simulated Data, Richard M. Lambert

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Today we know that there are many genetically driven diseases and health conditions. These problems often manifest only when a set of genes are either active or inactive. Recent technology allows us to measure the activity level of genes in cells, which we call gene expression. It is of great interest to society to be able to statistically compare the gene expression of a large number of genes between two or more groups. For example, we may want to compare the gene expression of a group of cancer patients with a group of non-cancer patients to better understand the genetic …


Statistical Design Of Experiment Techniques In Manufacturing, Caroline M. Kerfonta Oct 2018

Statistical Design Of Experiment Techniques In Manufacturing, Caroline M. Kerfonta

Senior Theses

There are many statistical techniques used to design experiments. These techniques are used in many different fields. This thesis will focus on the use of the three most common techniques used to design statistical experiments in manufacturing.

The three techniques that will be investigated are completely randomized design, randomized block design, and factorial design. These techniques will be compared, contrasted, and explained. Research examples will be presented along with sample R code for each technique. These examples will be accompanied by analysis of the techniques as well as an overview of the uses and history of experiments in manufacturing


Bayesian Analytical Approaches For Metabolomics : A Novel Method For Molecular Structure-Informed Metabolite Interaction Modeling, A Novel Diagnostic Model For Differentiating Myocardial Infarction Type, And Approaches For Compound Identification Given Mass Spectrometry Data., Patrick J. Trainor Aug 2018

Bayesian Analytical Approaches For Metabolomics : A Novel Method For Molecular Structure-Informed Metabolite Interaction Modeling, A Novel Diagnostic Model For Differentiating Myocardial Infarction Type, And Approaches For Compound Identification Given Mass Spectrometry Data., Patrick J. Trainor

Electronic Theses and Dissertations

Metabolomics, the study of small molecules in biological systems, has enjoyed great success in enabling researchers to examine disease-associated metabolic dysregulation and has been utilized for the discovery biomarkers of disease and phenotypic states. In spite of recent technological advances in the analytical platforms utilized in metabolomics and the proliferation of tools for the analysis of metabolomics data, significant challenges in metabolomics data analyses remain. In this dissertation, we present three of these challenges and Bayesian methodological solutions for each. In the first part we develop a new methodology to serve a basis for making higher order inferences in metabolomics, …


Pseudo Power Law Statistics In A Jammed, Amorphous Solid, Jacob Brian Hass Jun 2018

Pseudo Power Law Statistics In A Jammed, Amorphous Solid, Jacob Brian Hass

Physics

Simulations have shown that in many solid materials, rearrangements within the solid obey power-law statistics. A connection has been proposed between these statistics and the ability of a system to reach a limit cycle under cyclic driving. We study experimentally a 2D jammed solid that reaches such a limit cycle. Our solid consists of microscopic plastic beads adsorbed at an oil-water interface and cyclically sheared by a magnetically driven needle. We track each particles trajectory in the solid to identify rearrangements. By associating particles both spatially and temporally, we can measure the extent of each rearrangement. We study specifically the …


Analysis Of 2016-17 Major League Soccer Season Data Using Poisson Regression With R, Ian D. Campbell May 2018

Analysis Of 2016-17 Major League Soccer Season Data Using Poisson Regression With R, Ian D. Campbell

Undergraduate Theses and Capstone Projects

To the outside observer, soccer is chaotic with no given pattern or scheme to follow, a random conglomeration of passes and shots that go on for 90 minutes. Yet, what if there was a pattern to the chaos, or a way to describe the events that occur in the game quantifiably. Sports statistics is a critical part of baseball and a variety of other of today’s sports, but we see very little statistics and data analysis done on soccer. Of this research, there has been looks into the effect of possession time on the outcome of a game, the difference …


Developing Statistical Methods For Data From Platforms Measuring Gene Expression, Gaoxiang Jia Apr 2018

Developing Statistical Methods For Data From Platforms Measuring Gene Expression, Gaoxiang Jia

Statistical Science Theses and Dissertations

This research contains two topics: (1) PBNPA: a permutation-based non-parametric analysis of CRISPR screen data; (2) RCRnorm: an integrated system of random-coefficient hierarchical regression models for normalizing NanoString nCounter data from FFPE samples.

Clustered regularly-interspaced short palindromic repeats (CRISPR) screens are usually implemented in cultured cells to identify genes with critical functions. Although several methods have been developed or adapted to analyze CRISPR screening data, no single spe- cific algorithm has gained popularity. Thus, rigorous procedures are needed to overcome the shortcomings of existing algorithms. We developed a Permutation-Based Non-Parametric Analysis (PBNPA) algorithm, which computes p-values at the gene level …


Decision Trees: Predicting Future Losses For Insurance Data, Amanda Lahrmann Jan 2018

Decision Trees: Predicting Future Losses For Insurance Data, Amanda Lahrmann

Williams Honors College, Honors Research Projects

Big data is a term that has come to the spotlight for companies within recent years. Data analysis and business intelligence have become prominent sectors of companies and agencies. But what is big data? How has it impacted large companies and agencies? Why must it be embraced?

The best way to approach utilizing a big data set is to establish a question to answer. For this data set, the question that must be answered is “What variables cause a loss to occur?” To answer this question, first, we must understand what is meant by a “loss”, and take a look …


Analyzing Baseball Data With R, Claudia Sison Jun 2017

Analyzing Baseball Data With R, Claudia Sison

Statistics

No abstract provided.


Statistical Methods For Assessing Individual Oocyte Viability Through Gene Expression Profiles, Michael O. Bishop May 2017

Statistical Methods For Assessing Individual Oocyte Viability Through Gene Expression Profiles, Michael O. Bishop

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Abstract

Statistical Methods for Assessing Individual Oocyte Viability Through Gene Expression Profiles

By

Michael O. Bishop

Utah State University, 2017

Major Professor: Dr. John R. Stevens

Department: Mathematics and Statistics

Oocytes are the precursor cells to the female gamete, or egg. While reproduction may vary from species to species, within humans and most domesticated animals, the oocyte maturation process is fairly similar. As an oocyte matures, there are various processes that take place, all of which have an effect on the viability of the individual oocyte. Barring outside damage that may come to the oocyte, one of the primary reasons …