Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Statistics

Theses/Dissertations

Articles 1 - 30 of 299

Full-Text Articles in Physical Sciences and Mathematics

Interpretable Word-Level Sentiment Analysis With Attention-Based Multiple Instance Classification Models, Chenyu Yang Dec 2023

Statistical Science Theses and Dissertations

In this study, our main objective is to tackle the black-box nature of popular machine learning models in sentiment analysis and enhance model interpretability. We aim to gain more insight into the decision-making process of sentiment analysis models, which is often obscure in those complex models. To achieve this goal, we introduce two word-level sentiment analysis models.

The first model is called the attention-based multiple instance classification (AMIC) model. It combines the transparent model structure of multiple instance classification and the self-attention mechanism in deep learning to incorporate the contextual information from documents. As demonstrated by a wine review dataset …


Translation Speed Influence On Tropical Cyclone Storm Tide And Surge Generation Along The Gulf Of Mexico Coast, Samantha L. Camarda Nov 2023

LSU Master's Theses

This research examines tropical cyclone translation speed as a factor in storm tide and surge height upon landfall on the United States Gulf Coast. Understanding the effect of translation speed on peak storm tide/surge height is needed to better prepare for and predict future damage from tropical cyclone events. Tropical cyclone data are taken from hourly interpolated best-track HURDAT2 data for 1970–2021. This study uses the HURDAT2 hourly interpolated observation data points from 24 hours pre-landfall to landfall. Translation speed is calculated based on the distance traversed between hourly points. Peak storm tide and storm surge data are taken from SURGEDAT from …
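As a rough illustration of the translation-speed calculation described above, the following sketch computes speeds from consecutive hourly track points. The haversine great-circle formula, the mean Earth radius, and the function names are assumptions made for this example; the abstract does not state which distance formula the thesis uses.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption of this sketch)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def translation_speeds_kmh(track, hours_between_points=1.0):
    """Speeds (km/h) between consecutive (lat, lon) fixes of a storm track."""
    return [
        haversine_km(*p, *q) / hours_between_points
        for p, q in zip(track, track[1:])
    ]


# Hypothetical hourly positions (lat, lon); not real HURDAT2 values.
track = [(25.0, -90.0), (25.0, -90.0), (26.0, -90.0)]
speeds = translation_speeds_kmh(track)
```

A stationary hour yields a speed of zero, and one degree of latitude covered in one hour yields roughly 111 km/h.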


Bayesian Statistical Modeling Of Spatially Resolved Transcriptomics Data, Xi Jiang Oct 2023

Statistical Science Theses and Dissertations

Spatially resolved transcriptomics (SRT) quantifies expression levels at different spatial locations, providing a new and powerful tool for generating novel biological insights. As experimental technologies improve in both capacity and efficiency, there is a growing demand for the development of analytical methodologies.

One question in SRT data analysis is how to identify genes whose expression exhibits spatially correlated patterns, called spatially variable (SV) genes. Most current methods to identify SV genes are built upon geostatistical models with Gaussian processes, which could limit the models' ability to identify complex spatial patterns. In order to overcome this challenge and capture more types …


A Comparison Of Confidence Intervals In State Space Models, Jinyu Du Jul 2023

Statistical Science Theses and Dissertations

This thesis develops general procedures for constructing confidence intervals (CIs) for the error disturbance parameters (standard deviations) and transformations of the error disturbance parameters in time-invariant state space models (SSMs). With only a set of observations, accurately estimating individual error disturbance parameters in the presence of other unknown parameters in SSMs is a very challenging problem. We attempted to construct four different types of confidence intervals (Wald, likelihood ratio, score, and higher-order asymptotic intervals) for both the simple local level model and general time-invariant SSMs. We show that for a simple local level model, both the …


Optimal Experimental Planning Of Reliability Experiments Based On Coherent Systems, Yang Yu Jul 2023

Statistical Science Theses and Dissertations

In industrial engineering and manufacturing, assessing the reliability of a product or system is an important topic. Life-testing and reliability experiments are commonly used reliability assessment methods to gain sound knowledge about product or system lifetime distributions. Usually, a sample of items of interest is subjected to stresses and environmental conditions that characterize the normal operating conditions. During the life-test, successive times to failure are recorded and lifetime data are collected. Life-testing is useful in many industrial environments, including the automobile, materials, telecommunications, and electronics industries.

There are different kinds of life-testing experiments that can be applied for different purposes. …


Optimizing Tumor Xenograft Experiments Using Bayesian Linear And Nonlinear Mixed Modelling And Reinforcement Learning, Mary Lena Bleile May 2023

Statistical Science Theses and Dissertations

Tumor xenograft experiments are a popular tool of cancer biology research. In a typical such experiment, one implants a set of animals with an aliquot of the human tumor of interest, applies various treatments of interest, and observes the subsequent response. Efficient analysis of the data from these experiments is therefore of utmost importance. This dissertation proposes three methods for optimizing cancer treatment and data analysis in the tumor xenograft context. The first of these is applicable to tumor xenograft experiments in general, and the second two seek to optimize the combination of radiotherapy with immunotherapy in the tumor xenograft …


Development Of Bayesian Hierarchical Methods Involving Meta-Analysis, Jackson Barth May 2023

Statistical Science Theses and Dissertations

When conducting statistical analysis in the Bayesian paradigm, the most critical decision made by the researcher is the identification of a prior distribution for a parameter. Despite the mathematical soundness of the Bayesian approach, a wrongly specified prior can lead to biased and incorrect results. To avoid this, prior distributions should be based on real data, which are easily accessible in the "big data" era. This dissertation explores two applications of Bayesian hierarchical modelling that incorporate information obtained from a meta-analysis.

The first of these applications is in the normalization of genomics data, specifically for nanostring nCounter datasets. A meta-analysis …


Empirical Likelihood Ratio Tests For Homogeneity Of Distributions Of Component Lifetimes From System Lifetime Data With Known System Structures, Jingjing Qu May 2023

Statistical Science Theses and Dissertations

In system reliability, practitioners may be interested in testing the homogeneity of the component lifetime distributions based on system lifetimes from multiple data sources for various reasons, such as identifying the component supplier that provides the most reliable components.

In the first part of the dissertation, we develop distribution-free hypothesis testing procedures for the homogeneity of the component lifetime distributions based on system lifetime data when the system structures are known. Several nonparametric testing statistics based on the empirical likelihood method are proposed for testing the homogeneity of two or more component lifetime distributions. The computational approaches to obtain the …


Using A Distributive Approach To Model Insurance Loss, Kayla Kippes Apr 2023

Student Research Submissions

Insurance loss is an unpredicted event that stands at the forefront of the insurance industry. Loss in insurance represents the costs or expenses incurred due to a claim. An insurance claim is a request for the insurance company to pay for damage caused to an individual’s property. Loss can be measured by how much money (the dollar amount) has been paid out by the insurance company to repair the damage or it can be measured by the number of claims (claim count) made to the insurance company. Insured events include property damage due to fire, theft, flood, a car accident, …


Length Bias Estimation Of Small Businesses Lifetime, Simeng Li Apr 2023

Honors Theses

Small businesses, particularly restaurants, play a crucial role in the economy by generating employment opportunities, boosting tourism, and contributing to the local economy. However, accurately estimating their lifetimes can be challenging due to the presence of length bias, which occurs when the likelihood of sampling any particular restaurant's closure is influenced by its duration in operation. To address the issue, this study conducts goodness-of-fit tests on exponential/gamma family distributions and employs the Kaplan-Meier method to more accurately estimate the average lifetime of restaurants in Carytown. By providing insights into the challenges of estimating the lifetimes of small businesses, this study …
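For readers unfamiliar with the Kaplan-Meier method mentioned above, here is a minimal sketch of the standard product-limit estimator. It applies no length-bias adjustment, and the durations, censoring flags, and function name are hypothetical; the thesis's actual procedure is not shown in this excerpt.

```python
def kaplan_meier(durations, observed):
    """Product-limit survival estimate.

    durations: time until closure or until observation ended.
    observed:  True if the closure was observed, False if censored.
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(durations, observed))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        removed = 0
        # Group all records that share the same time t.
        while i < len(data) and data[i][0] == t:
            events += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if events:
            survival *= 1 - events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


# Hypothetical restaurant lifetimes in years; False marks a censored record.
curve = kaplan_meier([2, 3, 3, 5, 8], [True, True, False, True, False])
```

Censored records reduce the number at risk without lowering the survival estimate, which is why the last (censored) time contributes no step to the curve.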


The Commutant Of The Fourier–Plancherel Transform, Brianna Cantrall Apr 2023

Honors Theses

One can see that this matrix is unitary and has eigenvalues {1, −i, −1, i}, each of infinite multiplicity. Throughout the remainder of this thesis, we will convince the reader that the above linear transformation is actually the Fourier transform. We will compute the commutant, as well as its invariant subspaces. The key to doing this lies in the Hermite polynomials. Why do we recast the Fourier transform from its well-known and well-studied integral form to the matrix form shown above? As we will see, the matrix form allows us to efficiently discover the operator theory of the Fourier transform obfuscated …
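The matrix referred to in the abstract is not reproduced in this listing. As a hedged reminder of the standard fact behind it, assuming the unitary normalization of the Fourier transform on $L^2(\mathbb{R})$ and the usual normalized Hermite functions $h_n$ (notation chosen here, not taken from the thesis):

```latex
\[
(\mathcal{F}f)(\xi) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x)\, e^{-i x \xi}\, dx,
\qquad
h_n(x) = \frac{H_n(x)\, e^{-x^2/2}}{\sqrt{2^n\, n!\, \sqrt{\pi}}},
\]
\[
\mathcal{F} h_n = (-i)^n h_n, \qquad n = 0, 1, 2, \dots
\]
```

Since $(h_n)$ is an orthonormal basis of $L^2(\mathbb{R})$, the transform is diagonal in this basis with entries $1, -i, -1, i, 1, \dots$, which accounts for the four eigenvalues, each of infinite multiplicity.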


Influence Diagnostics For Generalized Estimating Equations Applied To Correlated Categorical Data, Louis Vazquez Apr 2023

Statistical Science Theses and Dissertations

Influence diagnostics in regression analysis allow analysts to identify observations that have a strong influence on model fitted probabilities and parameter estimates. The most common influence diagnostics, such as Cook’s Distance for linear regression, are based on a deletion approach where the results of a model with and without observations of interest are compared. Here, deletion-based influence diagnostics are proposed for generalized estimating equations (GEE) for correlated, or clustered, nominal multinomial responses. The proposed influence diagnostics focus on GEEs with the baseline-category logit link function and a local odds ratio parameterization of the association structure. Formulas for both observation- and …


High-Dimensional Variable Selection Via Knockoffs Using Gradient Boosting, Amr Essam Mohamed Apr 2023

Dissertations

As data continue to grow rapidly in size and complexity, efficient and effective statistical methods are needed to detect the important variables/features. Variable selection is one of the most crucial problems in statistical applications. This problem arises when one wants to model the relationship between the response and the predictors. The goal is to reduce the number of variables to a minimal set of explanatory variables that are truly associated with the response of interest to improve the model accuracy. Effectively choosing the true influential variables and controlling the False Discovery Rate (FDR) without sacrificing power has been a challenge …


Modeling The Probability Of A Successful Stolen Base Attempt In Major League Baseball, Cade Stanley Apr 2023

Senior Theses

In Major League Baseball (MLB), the outcome of a stolen base attempt has important implications. Success moves the runner closer to scoring, while failure records an out and removes the runner from the basepaths altogether. Therefore, it is important that the decision by a coach or player to steal a base is well-informed. In this thesis, I explore a statistical approach to making this decision. I train logistic regression and random forest models, using data about the game situation and about the runner, pitcher, and catcher involved in the stolen base attempt, to estimate the probability that a stolen base …


Regression Modeling Of Complex Survival Data Based On Pseudo-Observations, Rong Rong Dec 2022

Statistical Science Theses and Dissertations

The restricted mean survival time (RMST) is a clinically meaningful summary measure in studies with survival outcomes. Statistical methods have been developed for regression analysis of RMST to investigate impacts of covariates on RMST, which is a useful alternative to the Cox regression analysis. However, existing methods for regression modeling of RMST are not applicable to left-truncated right-censored data that arise frequently in prevalent cohort studies, for which the sampling bias due to left truncation and informative censoring induced by the prevalent sampling scheme must be properly addressed. Meanwhile, statistical methods have been developed for regression modeling of the cumulative …


Bayesian Estimation Of The Intensity Function Of A Non-Homogeneous Poisson Process, James Jensen Oct 2022

Theses

In this paper we explore Bayesian inference and its application to the problem of estimating the intensity function of a non-homogeneous Poisson process. These processes model the behavior of phenomena in which one or more events, known as arrivals, occur independently of one another over a certain period of time. We are concerned with the number of events occurring during particular time intervals across several realizations of the process. We show that given sufficient data, we are able to construct a piecewise-constant function which accurately estimates the mean rates on particular intervals. Further, we show that as we reduce these …
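One simple way to obtain a piecewise-constant Bayesian estimate of the kind described above is to place an independent Gamma prior on the rate in each interval, which is conjugate to the Poisson counts. The Gamma prior, its hyperparameters, and the function name below are assumptions of this sketch; the excerpt does not say which prior the thesis uses.

```python
def posterior_mean_rates(event_times_per_run, edges, alpha=1.0, beta=1.0):
    """Posterior mean of a piecewise-constant intensity.

    event_times_per_run: one list of arrival times per realization.
    edges: increasing interval boundaries [t0, t1, ..., tk].
    alpha, beta: shape and rate of the Gamma prior on each interval's rate
                 (illustrative choice).
    """
    n_runs = len(event_times_per_run)
    means = []
    for lo, hi in zip(edges, edges[1:]):
        # Total number of arrivals in [lo, hi) across all realizations.
        count = sum(
            1 for run in event_times_per_run for t in run if lo <= t < hi
        )
        post_shape = alpha + count
        post_rate = beta + n_runs * (hi - lo)  # total exposure time
        means.append(post_shape / post_rate)
    return means


# Two hypothetical realizations observed on [0, 2).
runs = [[0.1, 0.5, 0.7, 1.2], [0.3, 0.9, 1.5]]
rates = posterior_mean_rates(runs, edges=[0.0, 1.0, 2.0])
```

With more realizations the exposure term dominates the prior, so each interval's estimate approaches the observed count per unit time.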


The Microscopical Evidence Traces Analysis Of Household Dust And Its Statistical Significance As A Definitive Identification Technique, Stephanie Polifroni Sep 2022

Dissertations, Theses, and Capstone Projects

Evidence found at crime scenes is used to assist in creating a link between the suspect, the victim, and the scene. As stated by Locard’s Principle, every contact leaves a trace, and that evidence can be used to link together an investigation. Traces are collected in hopes that they can be identified and associated with an individual or individuals to help solve that particular crime. However, the strongest conclusion for evidence traces is an association to a source, and unless a physical match of some kind is found, an individualization cannot be established even when a known sample is available. However, having …


Dynamic Prediction For Alternating Recurrent Events Using A Semiparametric Joint Frailty Model, Jaehyeon Yun Aug 2022

Statistical Science Theses and Dissertations

Alternating recurrent events data arise commonly in health research; examples include hospital admissions and discharges of diabetes patients; exacerbations and remissions of chronic bronchitis; and quitting and restarting smoking. Recent work has involved formulating and estimating joint models for the recurrent event times considering non-negligible event durations. However, prediction models for transition between recurrent events are lacking. We consider the development and evaluation of methods for predicting future events within these models. Specifically, we propose a tool for dynamically predicting transition between alternating recurrent events in real time. Under a flexible joint frailty model, we derive the predictive probability of …


Efficient Approaches To Steady State Detection In Multivariate Systems, Honglun Xu Aug 2022

Open Access Theses & Dissertations

Steady state detection is critically important in many engineering fields such as fault detection and diagnosis, process monitoring and control. However, most of the existing methods are designed for univariate signals. In this dissertation, we proposed an efficient online steady state detection method for multivariate systems through a sequential Bayesian partitioning approach. The signal is modeled by a Bayesian piecewise constant mean and covariance model, and a recursive updating method is developed to calculate the posterior distributions analytically. The duration of the current segment is utilized to test the steady state. Insightful guidance is provided for hyperparameter selection. The effectiveness …


Characterizing Wildfire In The Frank Church Wilderness, Idaho, Between 1972-2012, Abigail Christine Axness Aug 2022

Boise State University Theses and Dissertations

I examined wildfire characteristics in the Frank Church Wilderness, central Idaho, between 1972 and 2012. Studying fire characteristics in the Frank Church Wilderness provides an opportunity to understand the history of wildfires in a federally designated wilderness area, largely devoid of management impacts with limited human access and activity. The ~958,000-hectare Frank Church Wilderness area encompasses the Middle Fork Salmon River. Vegetation cover ranges from high-elevation (~2500-3200 meters) mixed conifer forests in the headwaters to low-elevation (~600-1000 meters) sagebrush-steppe and ponderosa pine (Pinus ponderosa) forests. The Frank Church Wilderness is defined as unmanaged because effective fire suppression (e.g., vehicle …


Reconstructing Historical Earthquake-Induced Tsunamis: Case Study Of 1820 Event Near South Sulawesi, Indonesia, Taylor Jole Paskett Jul 2022

Theses and Dissertations

We build on the method introduced by Ringer, et al., applying it to an 1820 event that happened near South Sulawesi, Indonesia. We utilize other statistical models to aid our Metropolis-Hastings sampler, including a Gaussian process which informs the prior. We apply the method to multiple possible fault zones to determine which fault is the most likely source of the earthquake and tsunami. After collecting nearly 80,000 samples, we find that between the two most likely fault zones, the Walanae fault zone matches the anecdotal accounts much better than Flores. However, to support the anecdotal data, both samplers tend toward …


Applications Of Machine Learning Algorithms In Materials Science And Bioinformatics, Mohammed Quazi Jun 2022

Mathematics & Statistics ETDs

The piezoelectric response has been a measure of interest in density functional theory (DFT) for micro-electromechanical systems (MEMS) since the inception of MEMS technology. Piezoelectric-based MEMS devices find wide applications in automobiles, mobile phones, healthcare devices, and silicon chips for computers, to name a few. Piezoelectric properties of doped aluminum nitride (AlN) have been under investigation in materials science for piezoelectric thin films because of their wide range of device applicability. In this research, high-throughput ab initio simulations for 23 AlN alloys are generated using rigorous DFT calculations.

This research is the first to report strong enhancements of piezoelectric properties …


Generating A Dataset For Comparing Linear Vs. Non-Linear Prediction Methods In Education Research, Jack Mauro, Elena Martinez, Anna Bargagliotti May 2022

Honors Thesis

Machine learning is often used to build predictive models by extracting patterns from large data sets. Such techniques are increasingly being utilized to predict outcomes in the social sciences. One such application is predicting student success. Machine learning can be applied to predicting student acceptance and success in academia. Using these tools for education-related data analysis may enable the evaluation of programs, resources, and curriculum. Currently, research is needed to examine application, admissions, and retention data in order to address equity in college computer science programs. However, most student-level data sets contain sensitive data that cannot be made public. To …


The Efficacy Of The COVID-19 Vaccine In Mississippi, Ilyse Miriam Levy May 2022

Honors Theses

(Under the direction of Dr. Xin Dang)

By tracking and analyzing fifty-three weeks of COVID-19 data, this thesis analyzes the efficacy of the COVID-19 vaccine within the State of Mississippi. Over the course of these fifty-three weeks, I have also been able to calculate the confidence intervals for vaccination efficacy and the risk reduction due to vaccination by using data regarding the correlations between deaths and vaccination status, provided to me by the Mississippi Office of Epidemiology. My analysis demonstrates that the COVID-19 vaccine is effective not only in Mississippi but also …


An Examination Of The Statistics And Risk Management Concepts Behind The Patient Protection And Affordable Care Act (PPACA) Of 2010, Scott Sinclair May 2022

Undergraduate Honors Thesis Collection

The Patient Protection and Affordable Care Act (PPACA) is the overarching federal law that has impacted the intricacies of the health insurance market for more than a decade. Using the supervised learning method of multiple linear regression, the relationship between the medical loss ratio rebates and predictor variables such as the state, health insurance market, and the number of insurance companies owing rebates will be analyzed, along with the actuarial value of metal tiers and geographic rating area factors in terms of their relationship to the insurance premium for a standard family of four, defined as a forty-year-old couple with …


Forecasting Razorback Baseball Game Outcomes, Austin Raabe May 2022

Information Systems Undergraduate Honors Theses

Despite the disappointing end to the 2021 Arkansas Razorback baseball year, the team’s success provided hog fans something to look forward to next season. While they will be without the 2021 Golden Spikes Award winner, Kevin Kopps, and four All-SEC team selections, the 2022 roster has promising new and returning talent. With fifty percent of the players who played significant time last year coming back (minimum ten hits or ten innings pitched), the arrival of several impact transfers from major conferences, and a recruiting class ranked in the top five according to Perfect Game, there is reason to believe that …


Understanding And Improving The System: The Effects Of Weighting On The Accuracy Of Political Polling In Arkansas, Beck Williams May 2022

Political Science Undergraduate Honors Theses

In an effort to increase the accuracy of statewide political polling in Arkansas, we explore the statistical strategy of weighting with a focus on one yearly opinion poll: The Arkansas Poll. We conduct over 70 weighting experiments on the 2016 and 2020 Arkansas Polls using a variety of variables and opinion questions. From these experiments, we find that while some weighted variables tend to create larger changes, weighting typically results in a single-digit percentage change that does not substantially shift or “flip” the majorities. Due to a greater rate of change through weighting in the 2020 Poll compared to the …
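To make the idea of weighting concrete, the sketch below applies simple post-stratification on a single variable: each respondent is weighted by the ratio of their group's population share to its sample share. The groups, shares, responses, and function names are invented for illustration and are not taken from the Arkansas Poll or the thesis's experiments.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Weight per respondent = population share / sample share of their group."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return [population_shares[g] / (counts[g] / n) for g in sample_groups]


def weighted_share(responses, weights, answer):
    """Weighted proportion of respondents giving `answer`."""
    total = sum(weights)
    return sum(w for r, w in zip(responses, weights) if r == answer) / total


# Hypothetical sample: urban respondents are over-represented (60% vs 50%).
groups = ["urban"] * 6 + ["rural"] * 4
responses = ["yes"] * 4 + ["no"] * 2 + ["yes"] * 1 + ["no"] * 3
weights = poststratification_weights(groups, {"urban": 0.5, "rural": 0.5})

unweighted = responses.count("yes") / len(responses)
weighted = weighted_share(responses, weights, "yes")
```

In this toy sample the weighted share of "yes" drops from 50% to about 46%, a single-digit shift of the kind the abstract describes.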


Analytical Study To Determine Significant Causes Of Increased No-Hitters In The 2021 Major League Baseball Season, Joel Robison Apr 2022

Honors Projects

Why were there so many no-hitters in the 2021 MLB season? This project focuses on possible significant causes of the record-breaking number of no-hitters pitched in the 2021 Major League Baseball season. Specifically, this project takes an analytical look at the recent trends in launch angles and spin rates to determine if there are any significant causes of the increased number of no-hitters in baseball. The random nature and unpredictability of the game of baseball make it almost impossible to come to any solid conclusions.


Mixture Models In Machine Learning, Soumyabrata Pal Mar 2022

Doctoral Dissertations

Modeling with mixtures is a powerful method in the statistical toolkit that can be used for representing the presence of sub-populations within an overall population. In many applications ranging from financial models to genetics, a mixture model is used to fit the data. The primary difficulty in learning mixture models is that the observed data set does not identify the sub-population to which an individual observation belongs. Despite being studied for more than a century, the theoretical guarantees of mixture models remain unknown for several important settings. In this thesis, we look at three groups of problems. The first part …


Many-Objective Evolutionary Algorithms: Objective Reduction, Decomposition And Multi-Modality., Monalisa Pal Dr. Jan 2022

Doctoral Theses

Evolutionary Algorithms (EAs) for Many-Objective Optimization (MaOO) problems are challenging in nature due to the requirement of a large population size, the difficulty of maintaining selection pressure towards global optima, and the inability to accurately visualize the high-dimensional Pareto-optimal Set (in decision space) and Pareto-Front (in objective space). The quality of the estimated set of Pareto-optimal solutions, resulting from the EAs for MaOO problems, is assessed in terms of proximity to the true surface (convergence) and uniformity and coverage of the estimated set over the true surface (diversity). With a larger number of objectives, the challenges become more profound. Thus, better strategies have …