Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Theses/Dissertations

Statistics

Articles 1 - 30 of 555

Full-Text Articles in Entire DC Network

Interpretable Word-Level Sentiment Analysis With Attention-Based Multiple Instance Classification Models, Chenyu Yang Dec 2023

Statistical Science Theses and Dissertations

In this study, our main objective is to tackle the black-box nature of popular machine learning models in sentiment analysis and enhance model interpretability. We aim to gain more insight into the decision-making process of sentiment analysis models, which is often obscure in those complex models. To achieve this goal, we introduce two word-level sentiment analysis models.

The first model is called the attention-based multiple instance classification (AMIC) model. It combines the transparent model structure of multiple instance classification and the self-attention mechanism in deep learning to incorporate the contextual information from documents. As demonstrated by a wine review dataset …


Translation Speed Influence On Tropical Cyclone Storm Tide And Surge Generation Along The Gulf Of Mexico Coast, Samantha L. Camarda Nov 2023

LSU Master's Theses

This research examines tropical cyclone translation speed as a factor in storm tide and surge height upon landfall on the United States Gulf Coast. Understanding the effect of translation speed on peak storm tide/surge height is needed to better prepare for and predict future damage from tropical cyclone events. Tropical cyclone data are taken from hourly interpolated best-track HURDAT2 data from 1970–2021. This study uses the HURDAT2 hourly interpolated observation data points from 24 hours pre-landfall to landfall. Translation speed is calculated based on the distance traversed between hourly points. Peak storm tide and storm surge data are taken from SURGEDAT from …
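
As a rough illustration of computing translation speed from the distance between hourly track points, the sketch below uses the haversine great-circle distance. The function names, the spherical-Earth radius, and the sample coordinates are assumptions made for this example; the abstract does not say which distance formula the thesis uses.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def translation_speeds_kmh(track, hours_between=1.0):
    """Speed (km/h) between consecutive (lat, lon) points spaced hours_between apart."""
    return [
        haversine_km(a[0], a[1], b[0], b[1]) / hours_between
        for a, b in zip(track, track[1:])
    ]


# Hypothetical hourly positions (lat, lon); not taken from HURDAT2.
example_track = [(27.0, -90.0), (27.2, -90.1), (27.4, -90.2)]
example_speeds = translation_speeds_kmh(example_track)
```

A track of n hourly points yields n - 1 speeds, one per hourly segment.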


Bayesian Statistical Modeling Of Spatially Resolved Transcriptomics Data, Xi Jiang Oct 2023

Statistical Science Theses and Dissertations

Spatially resolved transcriptomics (SRT) quantifies expression levels at different spatial locations, providing a new and powerful tool for gaining novel biological insights. As experimental technologies improve in both capacity and efficiency, demand grows for the development of analytical methodologies.

One question in SRT data analysis is to identify genes whose expressions exhibit spatially correlated patterns, called spatially variable (SV) genes. Most current methods to identify SV genes are built upon the geostatistical model with Gaussian process, which could limit the models' ability to identify complex spatial patterns. In order to overcome this challenge and capture more types …


Towards Interpretable Machine Reading Comprehension With Mixed Effects Regression And Exploratory Prompt Analysis, Luca Del Signore Sep 2023

Dissertations, Theses, and Capstone Projects

We investigate the properties of natural language prompts that determine their difficulty in machine reading comprehension tasks. While much work has been done benchmarking language model performance at the task level, there is considerably less literature focused on how individual task items can contribute to interpretable evaluations of natural language understanding. Such work is essential to deepening our understanding of language models and ensuring their responsible use as a key tool in human-machine communication. We perform an in-depth mixed effects analysis on the behavior of three major generative language models, comparing their performance on a large reading comprehension …


A Comparison Of Confidence Intervals In State Space Models, Jinyu Du Jul 2023

Statistical Science Theses and Dissertations

This thesis develops general procedures for constructing confidence intervals (CIs) of the error disturbance parameters (standard deviations) and transformations of the error disturbance parameters in time-invariant state space models (SSMs). With only a set of observations, estimating individual error disturbance parameters accurately in the presence of other unknown parameters in SSMs is a very challenging problem. We attempted to construct four different types of confidence intervals (Wald, likelihood ratio, score, and higher-order asymptotic intervals) for both the simple local level model and the general time-invariant SSMs. We show that for a simple local level model, both the …


Optimal Experimental Planning Of Reliability Experiments Based On Coherent Systems, Yang Yu Jul 2023

Statistical Science Theses and Dissertations

In industrial engineering and manufacturing, assessing the reliability of a product or system is an important topic. Life-testing and reliability experiments are commonly used reliability assessment methods to gain sound knowledge about product or system lifetime distributions. Usually, a sample of items of interest is subjected to stresses and environmental conditions that characterize the normal operating conditions. During the life-test, successive times to failure are recorded and lifetime data are collected. Life-testing is useful in many industrial environments, including the automobile, materials, telecommunications, and electronics industries.

There are different kinds of life-testing experiments that can be applied for different purposes. …


Optimizing Tumor Xenograft Experiments Using Bayesian Linear And Nonlinear Mixed Modelling And Reinforcement Learning, Mary Lena Bleile May 2023

Statistical Science Theses and Dissertations

Tumor xenograft experiments are a popular tool of cancer biology research. In a typical such experiment, one implants a set of animals with an aliquot of the human tumor of interest, applies various treatments of interest, and observes the subsequent response. Efficient analysis of the data from these experiments is therefore of utmost importance. This dissertation proposes three methods for optimizing cancer treatment and data analysis in the tumor xenograft context. The first of these is applicable to tumor xenograft experiments in general, and the second two seek to optimize the combination of radiotherapy with immunotherapy in the tumor xenograft …


Development Of Bayesian Hierarchical Methods Involving Meta-Analysis, Jackson Barth May 2023

Statistical Science Theses and Dissertations

When conducting statistical analysis in the Bayesian paradigm, the most critical decision made by the researcher is the identification of a prior distribution for a parameter. Despite the mathematical soundness of the Bayesian approach, a wrongly specified prior can lead to biased and incorrect results. To avoid this, prior distributions should be based on real data, which are easily accessible in the "big data" era. This dissertation explores two applications of Bayesian hierarchical modelling that incorporate information obtained from a meta-analysis.

The first of these applications is in the normalization of genomics data, specifically for nanostring nCounter datasets. A meta-analysis …


Contributions To Causal Inference In Observational Studies, Jenny Park, Daniel F. Heitjan, Christy Boling Turer May 2023

Statistical Science Theses and Dissertations

The electronic health record (EHR) is a digital version of the patient chart. All clinically relevant patient information can be accessed from the EHR by professionals involved in the patient’s care. For researchers, the EHR is a rich, convenient source for data to address a vast range of medical research questions.

In observational studies with EHR data, it is common to define the treatment/exposure status as a binary indicator reflecting whether the patient was documented to receive a particular medication or procedure. The outcome can be any type of information on patient status documented in the EHR after the treatment has …


Empirical Likelihood Ratio Tests For Homogeneity Of Distributions Of Component Lifetimes From System Lifetime Data With Known System Structures, Jingjing Qu May 2023

Statistical Science Theses and Dissertations

In system reliability, practitioners may be interested in testing the homogeneity of the component lifetime distributions based on system lifetimes from multiple data sources for various reasons, such as identifying the component supplier that provides the most reliable components.

In the first part of the dissertation, we develop distribution-free hypothesis testing procedures for the homogeneity of the component lifetime distributions based on system lifetime data when the system structures are known. Several nonparametric testing statistics based on the empirical likelihood method are proposed for testing the homogeneity of two or more component lifetime distributions. The computational approaches to obtain the …


Special Cases In Estimating Multiple Missing Values In Linear Models, Aaron Christopher Marshall May 2023

Undergraduate Honors Thesis Collection

Missing observations, whether one or several, occur very commonly in observational and designed studies. The estimation of a single missing observation, and the analysis of such data, is covered in the literature (Montgomery, 2020). But estimation and analysis become more complicated when the study dataset becomes imbalanced due to multiple missing values. Though two missing values is the simplest case of multiple missing values, the analytic estimation and analysis will not be as straightforward as in the one-missing-value case, because two missing values can occur in various ways. This thesis will be exploring mainly the …


Interaction Effects And Selecting Regression Models Of Taylor Swift Song Popularity, Halle Schneidewind May 2023

Industrial Engineering Undergraduate Honors Theses

Understanding music popularity and what drives it is important not only for artists but also for others who are financially tied to music sales, including producers, writers, and record labels. Studies have been done to define how a song’s popularity can be measured, what attributes or features drive popularity, and to what extent a song’s popularity can even be predicted. This paper takes two linear regression approaches to predicting the popularity of a Taylor Swift song on Spotify based on auditory features the Spotify API estimates and historic popularity of songs on Spotify. One model takes into consideration …


Vietnam’s GDP: Re-Assessing Growth Rate And Identifying An Alternative Indicator, My Linh D. Nguyen Apr 2023

Honors Theses

Since the economic reform known as Doi Moi (Renovation) in 1986, Vietnam has changed from one of the world’s poorest to a middle-income country in one generation (USAID, 2022). The country has consistently registered high and stable economic growth since the reform, averaging 6.3% from 1985 to 2021 (World Bank, 2022). High growth rate of gross domestic product (GDP) is good news, but it has also raised questions that go both ways. On one side, there is much speculation that the government of Vietnam has manipulated economic statistics, compared to the case of China and India. As quoted in Kinh …


Using A Distributive Approach To Model Insurance Loss, Kayla Kippes Apr 2023

Student Research Submissions

Insurance loss is an unpredicted event that stands at the forefront of the insurance industry. Loss in insurance represents the costs or expenses incurred due to a claim. An insurance claim is a request for the insurance company to pay for damage caused to an individual’s property. Loss can be measured by how much money (the dollar amount) has been paid out by the insurance company to repair the damage or it can be measured by the number of claims (claim count) made to the insurance company. Insured events include property damage due to fire, theft, flood, a car accident, …


Length Bias Estimation Of Small Businesses Lifetime, Simeng Li Apr 2023

Honors Theses

Small businesses, particularly restaurants, play a crucial role in the economy by generating employment opportunities, boosting tourism, and contributing to the local economy. However, accurately estimating their lifetimes can be challenging due to the presence of length bias, which occurs when the likelihood of sampling any particular restaurant's closure is influenced by its duration in operation. To address the issue, this study conducts goodness-of-fit tests on exponential/gamma family distributions and employs the Kaplan-Meier method to more accurately estimate the average lifetime of restaurants in Carytown. By providing insights into the challenges of estimating the lifetimes of small businesses, this study …


The Commutant Of The Fourier–Plancherel Transform, Brianna Cantrall Apr 2023

Honors Theses

One can see that this matrix is unitary and has eigenvalues {1, −i, −1, i}, each of infinite multiplicity. Throughout the remainder of this thesis, we will convince the reader that the above linear transformation is actually the Fourier transform. We will compute the commutant, as well as its invariant subspaces. The key to doing this is the Hermite polynomials. Why do we recast the Fourier transform from its well-known and well-studied integral form to the matrix form shown above? As we will see, the matrix form allows us to efficiently discover the operator theory of the Fourier transform obfuscated …


Influence Diagnostics For Generalized Estimating Equations Applied To Correlated Categorical Data, Louis Vazquez Apr 2023

Statistical Science Theses and Dissertations

Influence diagnostics in regression analysis allow analysts to identify observations that have a strong influence on model fitted probabilities and parameter estimates. The most common influence diagnostics, such as Cook’s Distance for linear regression, are based on a deletion approach where the results of a model with and without observations of interest are compared. Here, deletion-based influence diagnostics are proposed for generalized estimating equations (GEE) for correlated, or clustered, nominal multinomial responses. The proposed influence diagnostics focus on GEEs with the baseline-category logit link function and a local odds ratio parameterization of the association structure. Formulas for both observation- and …


Bayesian Methods For Random-Effects Meta-Analysis Of Rare Binary Events In Biomedical Research, Ming Zhang Apr 2023

Statistical Science Theses and Dissertations

Rare binary events data arise frequently in medical research. Due to lack of statistical power in individual studies involving such data, meta-analysis has become an increasingly important tool for combining results from multiple independent studies. However, traditional meta-analysis methods often report severely biased estimates in such rare-event settings. Moreover, many rely on models assuming a pre-specified direction for variability between control and treatment groups for mathematical convenience, which may be violated in practice. In Chapter 1, based on a flexible random-effects model that removes the assumption about the direction, we propose new Bayesian procedures for estimating and testing the overall …


High-Dimensional Variable Selection Via Knockoffs Using Gradient Boosting, Amr Essam Mohamed Apr 2023

Dissertations

As data continue to grow rapidly in size and complexity, efficient and effective statistical methods are needed to detect the important variables/features. Variable selection is one of the most crucial problems in statistical applications. This problem arises when one wants to model the relationship between the response and the predictors. The goal is to reduce the number of variables to a minimal set of explanatory variables that are truly associated with the response of interest to improve the model accuracy. Effectively choosing the true influential variables and controlling the False Discovery Rate (FDR) without sacrificing power has been a challenge …


Modeling The Probability Of A Successful Stolen Base Attempt In Major League Baseball, Cade Stanley Apr 2023

Senior Theses

In Major League Baseball (MLB), the outcome of a stolen base attempt has important implications. Success moves the runner closer to scoring, while failure records an out and removes the runner from the basepaths altogether. Therefore, it is important that the decision by a coach or player to steal a base is well-informed. In this thesis, I explore a statistical approach to making this decision. I train logistic regression and random forest models, using data about the game situation and about the runner, pitcher, and catcher involved in the stolen base attempt, to estimate the probability that a stolen base …


Towards Scalable Mental Health: Leveraging Digital Tools In Combination With Computational Modeling To Aid In Treatment And Assessment Of Major Depressive Disorder, Matthew D. Nemesure Mar 2023

Dartmouth College Ph.D Dissertations

Major depressive disorder (MDD) is a debilitating disorder that impacts the lives of nearly 280 million individuals worldwide, representing 5% of the overall adult population. Unfortunately, these statistics have been trending upward and are likely an underestimate. This can be primarily attributed to lack of screening paired with a lack of providers. Worldwide, there are roughly 450 individuals living with MDD per mental health care provider. Adding to this burden, approximately half of affected individuals who do receive care of any kind will fail to remain in remission. The goal of this thesis work is to leverage statistical …


The Impact Of Using Relevant Context On Student Comprehension And Attitude In A Collegiate Introductory Statistics Unit On Probability, Ryan F. Rzeszutko, Jennifer L. Petrie, Dwayne T. James Feb 2023

Dissertations

The typical collegiate introductory statistics course poses significant challenges for students. Many do not fully comprehend key course skills, and it is common for students to exit the class with a neutral or negative attitude toward statistics. To measure the impact of using relevant contextual examples as an instructional strategy during a probability unit, in-class activities were designed to align with areas of interest for participants as identified by a student interest inventory. It was hypothesized that the use of relevant context would create a significant difference in the comprehension or attitude of students enrolled in an introductory statistics course …


Uncovering The Role Of Fat-Infiltrated Axillary Lymph Nodes In Obesity-Related Diseases With Statistical And Machine Learning Analyses, Qingyuan Song Jan 2023

Dartmouth College Ph.D Dissertations

The link between obesity and pathogenesis is a complex and multifaceted area of research that is yet to be fully understood. Ample evidence exists to demonstrate the direct relationship between excessive internal fat and various health conditions such as cancer, and metabolic and cardiovascular diseases. The infiltration of ectopic fat into axillary lymph nodes, observable on breast cancer screening images, has been shown to be correlated with body mass index (BMI) in women undergoing screening. This study aimed to explore the relationship between fat-infiltrated axillary lymph nodes (FIN) and obesity-related diseases, with the goal of evaluating the clinical value of …


Regression Modeling Of Complex Survival Data Based On Pseudo-Observations, Rong Rong Dec 2022

Statistical Science Theses and Dissertations

The restricted mean survival time (RMST) is a clinically meaningful summary measure in studies with survival outcomes. Statistical methods have been developed for regression analysis of RMST to investigate impacts of covariates on RMST, which is a useful alternative to the Cox regression analysis. However, existing methods for regression modeling of RMST are not applicable to left-truncated right-censored data that arise frequently in prevalent cohort studies, for which the sampling bias due to left truncation and informative censoring induced by the prevalent sampling scheme must be properly addressed. Meanwhile, statistical methods have been developed for regression modeling of the cumulative …


Study Of Stochastic Market Clearing Problems In Power Systems With High Renewable Integration, Saumya Sakitha Sashrika Ariyarathne Oct 2022

Operations Research and Engineering Management Theses and Dissertations

Integrating large-scale renewable energy resources into the power grid poses several operational and economic problems due to their inherently stochastic nature. The lack of predictability of renewable outputs deteriorates the power grid’s reliability. The power system operators have recognized this need to account for uncertainty in making operational decisions and forming electricity pricing. In this regard, this dissertation studies three aspects that aid large-scale renewable integration into power systems. 1. We develop a nonparametric change point-based statistical model to generate scenarios that accurately capture the renewable generation stochastic processes; 2. We design new pricing mechanisms derived from alternative stochastic programming …


Bayesian Estimation Of The Intensity Function Of A Non-Homogeneous Poisson Process, James Jensen Oct 2022

Theses

In this paper we explore Bayesian inference and its application to the problem of estimating the intensity function of a non-homogeneous Poisson process. These processes model the behavior of phenomena in which one or more events, known as arrivals, occur independently of one another over a certain period of time. We are concerned with the number of events occurring during particular time intervals across several realizations of the process. We show that given sufficient data, we are able to construct a piecewise-constant function which accurately estimates the mean rates on particular intervals. Further, we show that as we reduce these …
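
One common way to obtain a piecewise-constant estimate of this kind is the conjugate Gamma-Poisson update, sketched below. This is an assumption made for illustration, not necessarily the construction used in the thesis: each interval's rate gets a Gamma(shape, rate) prior, counts in that interval are treated as Poisson, and the posterior mean serves as the estimate. The function name, the prior values, and the sample counts are hypothetical.

```python
def posterior_mean_rates(counts, bin_width, prior_shape=1.0, prior_rate=1.0):
    """Posterior mean arrival rate for each time bin.

    counts[r][k] is the number of arrivals in bin k of realization r.
    With a Gamma(prior_shape, prior_rate) prior on a bin's rate and
    Poisson counts, the posterior for that bin is
    Gamma(prior_shape + total_count, prior_rate + n_realizations * bin_width).
    """
    n_realizations = len(counts)
    n_bins = len(counts[0])
    rates = []
    for k in range(n_bins):
        total = sum(row[k] for row in counts)
        post_shape = prior_shape + total
        post_rate = prior_rate + n_realizations * bin_width
        rates.append(post_shape / post_rate)  # mean of a Gamma(shape, rate)
    return rates


# Hypothetical data: two realizations observed over two unit-length bins.
example_counts = [[2, 5], [4, 7]]
example_rates = posterior_mean_rates(example_counts, bin_width=1.0)
```

As more realizations are observed, the prior terms matter less and each bin's estimate approaches the average count per unit time in that bin.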


The Microscopical Evidence Traces Analysis Of Household Dust And Its Statistical Significance As A Definitive Identification Technique, Stephanie Polifroni Sep 2022

Dissertations, Theses, and Capstone Projects

Evidence found at crime scenes is used to assist in creating a link between the suspect, the victim, and the scene. As stated by Locard’s Principle, every contact leaves a trace, and that evidence can be used to link together an investigation. Traces are collected in hopes that they can be identified and associated with an individual or individuals to help solve that particular crime. However, the strongest conclusion for evidence traces is an association to a source, and unless a physical match of some kind is found, an individualization cannot be established even when a known sample is available. However, having …


Dynamic Prediction For Alternating Recurrent Events Using A Semiparametric Joint Frailty Model, Jaehyeon Yun Aug 2022

Statistical Science Theses and Dissertations

Alternating recurrent events data arise commonly in health research; examples include hospital admissions and discharges of diabetes patients; exacerbations and remissions of chronic bronchitis; and quitting and restarting smoking. Recent work has involved formulating and estimating joint models for the recurrent event times considering non-negligible event durations. However, prediction models for transition between recurrent events are lacking. We consider the development and evaluation of methods for predicting future events within these models. Specifically, we propose a tool for dynamically predicting transition between alternating recurrent events in real time. Under a flexible joint frailty model, we derive the predictive probability of …


Efficient Approaches To Steady State Detection In Multivariate Systems, Honglun Xu Aug 2022

Open Access Theses & Dissertations

Steady state detection is critically important in many engineering fields such as fault detection and diagnosis, and process monitoring and control. However, most of the existing methods are designed for univariate signals. In this dissertation, we propose an efficient online steady state detection method for multivariate systems through a sequential Bayesian partitioning approach. The signal is modeled by a Bayesian piecewise constant mean and covariance model, and a recursive updating method is developed to calculate the posterior distributions analytically. The duration of the current segment is utilized to test the steady state. Insightful guidance is provided for hyperparameter selection. The effectiveness …


Characterizing Wildfire In The Frank Church Wilderness, Idaho, Between 1972-2012, Abigail Christine Axness Aug 2022

Boise State University Theses and Dissertations

I examined wildfire characteristics in the Frank Church Wilderness, central Idaho, between 1972 and 2012. Studying fire characteristics in the Frank Church Wilderness provides an opportunity to understand the history of wildfires in a federally designated wilderness area, largely devoid of management impacts with limited human access and activity. The ~958,000-hectare Frank Church Wilderness area encompasses the Middle Fork Salmon River. Vegetation cover ranges from high-elevation (~2500-3200 meters) mixed conifer forests in the headwaters to low-elevation (~600-1000 meters) sagebrush-steppe and ponderosa pine (Pinus ponderosa) forests. The Frank Church Wilderness is defined as unmanaged because effective fire suppression (e.g., vehicle …