Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Statistics

Articles 1 - 30 of 555

Full-Text Articles in Physical Sciences and Mathematics

Regression Modeling Of Complex Survival Data Based On Pseudo-Observations, Rong Rong Dec 2022

Statistical Science Theses and Dissertations

The restricted mean survival time (RMST) is a clinically meaningful summary measure in studies with survival outcomes. Statistical methods have been developed for regression analysis of RMST to investigate impacts of covariates on RMST, which is a useful alternative to the Cox regression analysis. However, existing methods for regression modeling of RMST are not applicable to left-truncated right-censored data that arise frequently in prevalent cohort studies, for which the sampling bias due to left truncation and informative censoring induced by the prevalent sampling scheme must be properly addressed. Meanwhile, statistical methods have been developed for regression modeling of the cumulative …


Bayesian Estimation Of The Intensity Function Of A Non-Homogeneous Poisson Process, James Jensen Oct 2022

Theses

In this paper we explore Bayesian inference and its application to the problem of estimating the intensity function of a non-homogeneous Poisson process. These processes model the behavior of phenomena in which one or more events, known as arrivals, occur independently of one another over a certain period of time. We are concerned with the number of events occurring during particular time intervals across several realizations of the process. We show that given sufficient data, we are able to construct a piecewise-constant function which accurately estimates the mean rates on particular intervals. Further, we show that as we reduce these …
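The piecewise-constant idea in this abstract can be illustrated with a minimal sketch. It is not the author's method, which the truncated abstract does not spell out. It assumes a conjugate Gamma prior on the rate in each time bin and pools arrival counts across realizations. The function name `posterior_mean_rates`, the prior parameters `alpha` and `beta`, and the data are all hypothetical.

```python
from typing import List, Sequence


def posterior_mean_rates(
    arrivals: Sequence[Sequence[float]],  # one list of arrival times per realization
    edges: Sequence[float],               # bin edges, strictly increasing
    alpha: float = 1.0,                   # Gamma prior shape (assumed value)
    beta: float = 1.0,                    # Gamma prior rate (assumed value)
) -> List[float]:
    """Posterior mean of a piecewise-constant intensity, one value per bin."""
    n_real = len(arrivals)
    rates = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Total arrivals falling in [lo, hi) across all realizations.
        count = sum(1 for real in arrivals for t in real if lo <= t < hi)
        # Total observation time spent in this bin across realizations.
        exposure = n_real * (hi - lo)
        # Gamma(alpha, beta) prior with Poisson counts gives a
        # Gamma(alpha + count, beta + exposure) posterior; return its mean.
        rates.append((alpha + count) / (beta + exposure))
    return rates


# Hypothetical data: three realizations observed on [0, 2).
data = [[0.1, 0.4, 1.5], [0.2, 0.9], [0.3, 0.5, 0.7, 1.8]]
rates = posterior_mean_rates(data, edges=[0.0, 1.0, 2.0])
```

With more realizations the exposure term grows and the prior's influence on each bin shrinks, which matches the abstract's remark that sufficient data yields accurate estimates of the mean rates.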


Distance Based Image Classification: A Solution To Generative Classification’s Conundrum?, Wen-Yan Lin, Siying Liu, Bing Tian Dai, Hongdong Li Sep 2022

Research Collection School Of Computing and Information Systems

Most classifiers rely on discriminative boundaries that separate instances of each class from everything else. We argue that discriminative boundaries are counter-intuitive as they define semantics by what-they-are-not; and should be replaced by generative classifiers which define semantics by what-they-are. Unfortunately, generative classifiers are significantly less accurate. This may be caused by the tendency of generative models to focus on easy-to-model semantic generative factors and ignore non-semantic factors that are important but difficult to model. We propose a new generative model in which semantic factors are accommodated by shell theory’s [25] hierarchical generative process and non-semantic factors by …


The Microscopical Evidence Traces Analysis Of Household Dust And Its Statistical Significance As A Definitive Identification Technique, Stephanie Polifroni Sep 2022

Dissertations, Theses, and Capstone Projects

Evidence found at crime scenes is used to help link the suspect, the victim, and the scene. As stated by Locard’s Principle, every contact leaves a trace, and that evidence can be used to link together an investigation. Traces are collected in hopes that they can be identified and associated with an individual or individuals to help solve that particular crime. However, the strongest conclusion for evidence traces is an association to a source, and unless a physical match of some kind is found, an individualization cannot be established even when a known sample is available. However, having …


Dynamic Prediction For Alternating Recurrent Events Using A Semiparametric Joint Frailty Model, Jaehyeon Yun Aug 2022

Statistical Science Theses and Dissertations

Alternating recurrent events data arise commonly in health research; examples include hospital admissions and discharges of diabetes patients; exacerbations and remissions of chronic bronchitis; and quitting and restarting smoking. Recent work has involved formulating and estimating joint models for the recurrent event times considering non-negligible event durations. However, prediction models for transition between recurrent events are lacking. We consider the development and evaluation of methods for predicting future events within these models. Specifically, we propose a tool for dynamically predicting transition between alternating recurrent events in real time. Under a flexible joint frailty model, we derive the predictive probability of …


Efficient Approaches To Steady State Detection In Multivariate Systems, Honglun Xu Aug 2022

Open Access Theses & Dissertations

Steady state detection is critically important in many engineering fields such as fault detection and diagnosis, process monitoring and control. However, most of the existing methods are designed for univariate signals. In this dissertation, we propose an efficient online steady state detection method for multivariate systems through a sequential Bayesian partitioning approach. The signal is modeled by a Bayesian piecewise constant mean and covariance model, and a recursive updating method is developed to calculate the posterior distributions analytically. The duration of the current segment is utilized to test the steady state. Insightful guidance is provided for hyperparameter selection. The effectiveness …


Characterizing Wildfire In The Frank Church Wilderness, Idaho, Between 1972-2012, Abigail Christine Axness Aug 2022

Boise State University Theses and Dissertations

I examined wildfire characteristics in the Frank Church Wilderness, central Idaho, from 1972 to 2012. Studying fire characteristics in the Frank Church Wilderness provides an opportunity to understand the history of wildfires in a federally designated wilderness area, largely devoid of management impacts with limited human access and activity. The ~958,000-hectare Frank Church Wilderness area encompasses the Middle Fork Salmon River. Vegetation cover ranges from high-elevation (~2500-3200 meters) mixed conifer forests in the headwaters to low-elevation (~600-1000 meters) sagebrush-steppe and ponderosa pine (Pinus ponderosa) forests. The Frank Church Wilderness is defined as unmanaged because effective fire suppression (e.g., vehicle …


Spurious Correlation Sestina, Jules Nyquist Jul 2022

Journal of Humanistic Mathematics

This is a sestina poem about Spurious Correlations with a magical realism angle for beginning students learning statistics for the first time during the COVID pandemic.


The Efficacy Of The COVID-19 Vaccine In Mississippi, Ilyse Miriam Levy May 2022

Honors Theses

(Under the direction of Dr. Xin Dang)

By tracking and analyzing fifty-three weeks of COVID-19 data, this thesis analyzes the efficacy of the COVID-19 vaccine within the State of Mississippi. Over the course of these fifty-three weeks, I have also been able to calculate the confidence intervals for vaccination efficacy and the risk reduction due to vaccination by using data regarding the correlations between deaths and vaccination status, provided to me by the Mississippi Office of Epidemiology. My analysis demonstrates that the COVID-19 vaccine is effective not only in Mississippi but also …


Forecasting Razorback Baseball Game Outcomes, Austin Raabe May 2022

Information Systems Undergraduate Honors Theses

Despite the disappointing end to the 2021 Arkansas Razorback baseball year, the team’s success provided hog fans something to look forward to next season. While they will be without the 2021 Golden Spikes Award winner, Kevin Kopps, and four All-SEC team selections, the 2022 roster has promising new and returning talent. With fifty percent of the players who played significant time last year coming back (minimum ten hits or ten innings pitched), the arrival of several impact transfers from major conferences, and a recruiting class ranked in the top five according to Perfect Game, there is reason to believe that …


Understanding And Improving The System: The Effects Of Weighting On The Accuracy Of Political Polling In Arkansas, Beck Williams May 2022

Political Science Undergraduate Honors Theses

In an effort to increase the accuracy of statewide political polling in Arkansas, we explore the statistical strategy of weighting with a focus on one yearly opinion poll: The Arkansas Poll. We conduct over 70 weighting experiments on the 2016 and 2020 Arkansas Polls using a variety of variables and opinion questions. From these experiments, we find that while some weighted variables tend to create larger changes, weighting typically results in a single-digit percentage change that does not substantially shift or “flip” the majorities. Due to a greater rate of change through weighting in the 2020 Poll compared to the …


An Examination Of The Statistics And Risk Management Concepts Behind The Patient Protection And Affordable Care Act (PPACA) Of 2010, Scott Sinclair May 2022

Undergraduate Honors Thesis Collection

The Patient Protection and Affordable Care Act (PPACA) is the overarching federal law that has impacted the intricacies of the health insurance market for more than a decade. Using the supervised learning method of multiple linear regression, the relationship between the medical loss ratio rebates and predictor variables such as the state, health insurance market, and the number of insurance companies owing rebates will be analyzed, along with the actuarial value of metal tiers and geographic rating area factors in terms of their relationship to the insurance premium for a standard family of four, defined as a forty-year-old couple with …


On Misuses Of The Kolmogorov–Smirnov Test For One-Sample Goodness-Of-Fit, Anthony Zeimbekakis Apr 2022

Honors Scholar Theses

The Kolmogorov–Smirnov (KS) test is one of the most popular goodness-of-fit tests for comparing a sample with a hypothesized parametric distribution. Nevertheless, it has often been misused. The standard one-sample KS test applies to independent, continuous data with a hypothesized distribution that is completely specified. It is not uncommon, however, to see in the literature that it was applied to dependent, discrete, or rounded data, with hypothesized distributions containing estimated parameters. For example, it has been "discovered" multiple times that the test is too conservative when the parameters are estimated. We demonstrate misuses of the one-sample KS test in three …
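The conservativeness mentioned in this abstract can be seen in a small simulation. The sketch below is an illustration under stated assumptions, not the thesis's own experiment: it draws standard normal samples, applies the one-sample KS test once against the fully specified N(0, 1) and once against a normal with parameters estimated from the same sample, and uses the asymptotic 5% critical value 1.358/√n in both cases. The function names, sample size, replication count, and seed are hypothetical choices.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def ks_statistic(sample, dist):
    """One-sample KS statistic D_n against a continuous distribution."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = dist.cdf(x)
        # Largest gap between the empirical CDF (just before and at x) and F(x).
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def rejection_rates(n=20, reps=2000, seed=0):
    """Rejection rates under H0 with specified vs. estimated parameters."""
    rng = random.Random(seed)
    crit = 1.358 / sqrt(n)  # asymptotic 5% critical value (an approximation for small n)
    rej_specified = rej_estimated = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Correct use: the hypothesized distribution is completely specified.
        if ks_statistic(sample, NormalDist(0.0, 1.0)) > crit:
            rej_specified += 1
        # Misuse: parameters are estimated from the very sample being tested.
        fitted = NormalDist(mean(sample), stdev(sample))
        if ks_statistic(sample, fitted) > crit:
            rej_estimated += 1
    return rej_specified / reps, rej_estimated / reps


specified_rate, estimated_rate = rejection_rates()
```

Because the fitted distribution is pulled toward the sample, the statistic is systematically smaller, so the test with estimated parameters rejects far less often than its nominal level suggests.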


Analytical Study To Determine Significant Causes Of Increased No-Hitters In The 2021 Major League Baseball Season, Joel Robison Apr 2022

Honors Projects

Why were there so many no-hitters in the 2021 MLB season? This project focuses on possible significant causes of the record-breaking number of no-hitters pitched in the 2021 Major League Baseball season. Specifically, this project takes an analytical look at the recent trends in launch angles and spin rates to determine if there are any significant causes of the increased number of no-hitters in baseball. The random nature and unpredictability of the game of baseball make it almost impossible to come to any solid conclusions.


Einstein-Roscoe Regression For The Slag Viscosity Prediction Problem In Steelmaking, Hiroto Saigo, Dukka Kc, Noritaka Saito Apr 2022

Michigan Tech Publications

In classical machine learning, regressors are trained without attempting to gain insight into the mechanism connecting inputs and outputs. Natural sciences, however, are interested in finding a robust interpretable function for the target phenomenon that can return predictions even outside of the training domains. This paper focuses on the viscosity prediction problem in steelmaking, and proposes Einstein-Roscoe regression (ERR), which learns the coefficients of the Einstein-Roscoe equation, and is able to extrapolate to unseen domains. Besides, it is often the case in the natural sciences that some measurements are unavailable or more expensive than others due to physical constraints. To this …


A Monte Carlo Analysis Of Seven Dichotomous Variable Confidence Interval Equations, Morgan Juanita Dubose Apr 2022

Masters Theses & Specialist Projects

Department of Psychological Sciences, Western Kentucky University

There are two options to estimate a range of likely values for the population mean of a continuous variable: one for when the population standard deviation is known and another for when the population standard deviation is unknown. There are seven proposed equations to calculate the confidence interval for the population mean of a dichotomous variable: normal approximation interval, Wilson interval, Jeffreys interval, Clopper-Pearson, Agresti-Coull, arcsine transformation, and logit transformation. In this study, I compared the percent effectiveness of each equation using a Monte Carlo analysis and the interval range over a range …
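To make the comparison concrete, here is a minimal sketch of two of the seven intervals named above (normal approximation and Wilson) together with a simple Monte Carlo coverage check. It is an illustration only and does not reproduce the thesis's design. The function names, the true proportion 0.1, the sample size 20, the replication count, and the seed are all assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist


def wald_interval(successes, n, conf=0.95):
    """Normal approximation (Wald) interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    p = successes / n
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half


def wilson_interval(successes, n, conf=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def coverage(interval, p, n, reps=2000, seed=0):
    """Fraction of simulated samples whose interval contains the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Draw one binomial(n, p) count as a sum of Bernoulli trials.
        x = sum(rng.random() < p for _ in range(n))
        lo, hi = interval(x, n)
        hits += lo <= p <= hi
    return hits / reps


# Hypothetical scenario: a rare outcome and a small sample.
wald_cov = coverage(wald_interval, p=0.1, n=20)
wilson_cov = coverage(wilson_interval, p=0.1, n=20)
```

When a sample contains no successes, the normal approximation interval collapses to a single point at zero and cannot cover any positive proportion, which is one reason its coverage falls below nominal in small samples with rare outcomes.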


Mixture Models In Machine Learning, Soumyabrata Pal Mar 2022

Doctoral Dissertations

Modeling with mixtures is a powerful method in the statistical toolkit that can be used for representing the presence of sub-populations within an overall population. In many applications ranging from financial models to genetics, a mixture model is used to fit the data. The primary difficulty in learning mixture models is that the observed data set does not identify the sub-population to which an individual observation belongs. Despite being studied for more than a century, the theoretical guarantees of mixture models remain unknown for several important settings.

In this thesis, we look at three groups of problems. The first part …


Split Classification Model For Complex Clustered Data, Katherine Gerot Mar 2022

Honors Theses, University of Nebraska-Lincoln

Classification in high-dimensional data has generated tremendous interest in a multitude of fields. Data in higher dimensions often tend to reside in non-Euclidean metric space. This prevents Euclidean-based classification methodologies, such as regression, from reliably modeling the data. Many proposed models rely on computationally-complex embedding to convert the data to a more usable format. Others, namely the Support Vector Machine, rely on kernel manipulation to implicitly describe the "feature space" to arrive at a non-linear decision boundary. The proposed methodology in this paper seeks to classify complex data in a relatively computationally-simple and explainable manner.


So Long My Friend, Bryan Mcnair Jan 2022

Journal of Humanistic Mathematics

No abstract provided.


A Monte Carlo Simulation Of Rat Choice Behavior With Interdependent Outcomes, Michelle A. Frankot Jan 2022

Graduate Theses, Dissertations, and Problem Reports

Preclinical behavioral neuroscience often uses choice paradigms to capture psychiatric symptoms. In particular, the subfield of operant research produces nested datasets with many discrete choices in a session. The standard analytic practice is to aggregate choice into a continuous variable and analyze using ANOVA or linear regression. However, choice data often have multiple interdependent outcomes of interest, violating an assumption of general linear models. The aim of the current study was to quantify the accuracy of linear mixed-effects regression (LMER) for analyzing data from a 4-choice operant task called the Rodent Gambling Task (RGT), which measures decision-making in the context …


Using Deep Neural Networks To Analyze Precision Agriculture Data, Stephanie Liebl Jan 2022

Electronic Theses and Dissertations

As the population of the Earth increases, there is a growing need for food to feed the inhabitants. Precision agriculture offers techniques and tools that can be used to help accommodate the growing population. One specific precision agriculture tool is remote sensing data, which can be used to image fields as an effort to better predict or understand the crops. In this thesis, deep neural networks are used to evaluate various spatial, spectral, and temporal resolutions of three different satellite images to determine which best predicts corn yield. The main metrics we used to evaluate the models were R-squared (R²), …


Statistical Methods For The Analysis And Development Of Quantitative Imaging Biomarkers, Carolyn Lou Jan 2022

Publicly Accessible Penn Dissertations

The field of neuroimaging statistics is concerned with elucidating meaningful conclusions from high-dimensional imaging objects, often in the form of single-dimensioned summary statistics. Ideally, these summaries should provide interpretable biomarker measurements that can guide patient diagnoses or treatment decisions while minimizing information loss associated with dimension reduction. This dissertation is focused on (1) exploring methods for analyzing previously developed imaging biomarkers and (2) developing new imaging biomarkers using both well-established and novel imaging analysis techniques. We approach this problem in three ways: in our first project, we assess how previously developed imaging biomarkers can best be incorporated into downstream analyses …


Has Winter Weather In Southwest Ohio Been Affected By The El Niño Southern Oscillation, The North Atlantic Oscillation, The Pacific Decadal Oscillation, And The Atlantic Multidecadal Oscillation?, John A. Blue Jan 2022

Browse all Theses and Dissertations

Winter temperature and precipitation in Southwest Ohio over the last century were examined for anomalies attributable to teleconnections with large-scale atmospheric perturbations caused by the El Niño Southern Oscillation (ENSO), the North Atlantic Oscillation (NAO), the Pacific Decadal Oscillation (PDO), and the Atlantic Multidecadal Oscillation (AMO). The record of temperature gives evidence of a teleconnection with the NAO, ENSO, and PDO, with the strongest link being for phases of the NAO. Most winters during positive NAO phases had mean monthly temperature warmer than the century long mean, and the majority of negative NAO phase winters had colder temperatures. The difference …


Deep Learning And Uncertainty Quantification: Methodologies And Applications, Yibo Yang Jan 2022

Publicly Accessible Penn Dissertations

Uncertainty quantification is a recently emerging interdisciplinary area that leverages the power of statistical methods, machine learning models, numerical methods and data-driven approaches to provide reliable inference for quantities of interest in natural science and engineering problems. In practice, the sources of uncertainty come from different aspects such as: aleatoric uncertainty, where the uncertainty comes from the observations or is due to the stochastic nature of the problem; and epistemic uncertainty, where the uncertainty comes from inaccurate mathematical models, computational methods or model parametrization. To cope with the above different types of uncertainty, a successful and scalable model for uncertainty quantification requires …


Finding The Best Predictors For Foot Traffic In US Seafood Restaurants, Isabel Paige Beaulieu Jan 2022

Honors Theses and Capstones

COVID-19 caused state and nation-wide lockdowns, which altered human foot traffic, especially in restaurants. The seafood sector in particular suffered greatly: illegal fishing increased, its goods are perishable, it is seasonal in some places, and imports and exports were slowed. Foot traffic data helps business owners decide how much to order, how many employees to schedule, etc. One issue is that the data is very expensive, hard to get, and not available until months after it is recorded. Our goal is to not only find covariates that …


Analyzing Marriage Statistics As Recorded In The Journal Of The American Statistical Association From 1889 To 2012, Annalee Soohoo Jan 2022

CMC Senior Theses

The United States has been tracking American marriage statistics since its founding. According to the United States Census Bureau, “marital status and marital history data help federal agencies understand marriage trends, forecast future needs of programs that have spousal benefits, and measure the effects of policies and programs that focus on the well-being of families, including tax policies and financial assistance programs.”[1] With such a wide scope of applications, it is understandable why marriage statistics are so highly studied and well-documented.

This thesis will analyze American marriage patterns over the past 100 years as documented in the Journal of …


Mary Eleanor Spear's Importance To The History Of Statistical Visualization, Melanie Williams Jan 2022

CMC Senior Theses

This paper will demonstrate why Mary Eleanor Spear (1897-1986) is an important figure in the history of statistical visualization. She led an impressive career working in the federal government as a data analyst before "data analyst" became a thing. She wrote and illustrated two comprehensive textbooks which furthered the art of statistical visualization. Her textbooks cover extensive graphing knowledge still valuable to statisticians and viewers today. Most notable of her works is her development of the box plot. In addition to Spear's career and contributions, this paper will also address the lack of female representation in science, technology, engineering, and …


Exploring Improvements To The Convergence Of Reconstructing Historical Destructive Earthquakes, Kameron Lightheart Nov 2021

Theses and Dissertations

Determining risk to human populations due to natural disasters has been a topic of interest in the STEM fields for centuries. Earthquakes and the tsunamis they cause are of particular interest due to their repetition cycles. These cycles can last hundreds of years but we have only had modern measuring instruments for the last century or so which makes analysis difficult. In this document, we explore ways to improve upon an existing method for reconstructing earthquakes from historical accounts of tsunamis. This method was designed and implemented by Jared P Whitehead's research group over the last 5 years. The issue …


The Classification Of Basket Neural Cells In The Mammalian Neocortex, Sreya Pudi Oct 2021

Senior Theses

Basket neuronal cells of the mammalian neocortex have been classically categorized into two or more groups. Originally, it was thought that the large and small types are the naturally occurring groups that emerge from reasons that relate to neurobiological function and anatomical position. Later, a study based on anatomical and physiological features of these neurons introduced a third type, the net basket cell which is intermediate in size as compared to the large and small types. In this study, multivariate analysis was used to test the hypothesis that the large and small types are morphologically distinct groups. The results of …


Trade Bait: Season 3, Ben Bagley Oct 2021

WWU Honors College Senior Projects

A 5-episode podcast series dissecting the use of statistics in the NFL and NFL Media.