Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Theses/Dissertations

Statistics

Articles 1 - 30 of 584

Full-Text Articles in Entire DC Network

Bayesian Variational Inference In Keyword Identification And Multiple Instance Classification, Yaofang Hu Aug 2024

Statistical Science Theses and Dissertations

This dissertation investigates (1) Variational Bayesian Semi-supervised Keyword Extraction and (2) Variational Bayesian Multimodal Multiple Instance Classification.

The expansion of textual data, stemming from various sources such as online product reviews and scholarly publications on scientific discoveries, has created a demand for the extraction of succinct yet comprehensive information. As a result, effort has been devoted in recent years to developing novel methodologies for keyword extraction. Although many methods have been proposed to automatically extract keywords in the contexts of both unsupervised and fully supervised learning, how to effectively use partially observed keywords, such as author-specified keywords, remains an under-explored …


A Multi-Metric Approach To Fay-Herriot Small Area Estimation Of Forests, Zachary Dorminey Aug 2024

Masters Theses

Forest managers are tasked with decisions regarding silvicultural practices that require detailed information about the environments they serve. Managing complex structures like forests demands consideration of many interrelated variables characterizing the overall condition of a forest. Success in these management initiatives includes not only maximum production from the timber assets, but also proof that these operations accord with modern sustainable practices. Small area estimates obtained from a National Forest Inventory (NFI) dataset lack necessary statistical certainty due to a relatively small sample of forest plots. These inventory datasets are spatially sparse, yet attribute-rich. Given these properties, research efforts in this …


Exploring Healthcare Chatbot Information Presentation: Applying Hierarchical Bayesian Regression And Inductive Thematic Analysis In A Mixed Methods Study, Samuel Nelson Koscelny Aug 2024

All Theses

High blood pressure, also known as hypertension, significantly increases the risk of heart disease and stroke, which are leading causes of death in the United States (U.S.). Besides contributing to over 691,000 U.S. deaths in 2021 alone, it imposes an immense economic burden on the healthcare system, costing approximately $131 billion annually. One way to address this issue is to increase self-care behaviors and medication adherence, both of which require sufficient health literacy. Despite the importance of health literacy, 90% of U.S. adults struggle with health-related subjects. Overcoming the issues associated with health literacy requires addressing the …


Unraveling The History Of Deforestation In The Amazon Rainforest With Statistical Modeling, Ryan Destefano Jun 2024

Master's Theses

The Amazon rainforest, a vital ecosystem of immense biodiversity and global climate significance, faces the ongoing threat of deforestation driven by agricultural expansion. This thesis employs remote sensing techniques, focusing on the Enhanced Vegetation Index (EVI) derived from Landsat satellite imagery, to track land cover dynamics within the Amazon. The study examines historical land cover changes in current plantations in Peru and Brazil, regions where the exact timing of deforestation is uncertain. By analyzing EVI measurements dating back to 1984, inflection points indicative of deforestation events preceding plantation establishment are identified. Statistical modeling techniques, including spline fitting to analyze time …


The Impact Of Video Assistant Referee (VAR) On The English Premier League, Jack Kenyon Brown Jun 2024

Master's Theses

The aim of this study is to examine how the introduction of the Video Assistant Referee (VAR) system influenced the English Premier League (EPL). Since its implementation in the English Premier League in 2019, VAR has been a constant source of debate and controversy. Many studies have been done on the immediate impact of VAR on other elite professional soccer leagues, but the scope of results is limited and due for an update. The data for the ensuing analysis consists of 3800 matches played in the English Premier League during the five seasons before (14/15, 15/16, 16/17, 17/18, and …


An Investigation Into Teaching Sports Analytics, Josh Havstad Jun 2024

Master's Theses

Sports analytics arrived in the mainstream media through the novel and film Moneyball. However, its origins date back to operations researchers following World War II. Often considered a subdiscipline of statistics, sports analytics draws from statistics but also includes concepts from data science, communication, and marketing. As a passionate fan of sports, I have pursued statistics in my undergraduate and graduate education with the dream of working in sports for my career. However, educational opportunities in sports analytics are limited nationwide, and more specifically, there is no educational opportunity at my university, California Polytechnic State University in San Luis …


Recursive Matrix Game Analysis: Optimal, Simplified, And Human Strategies In Brave Rats, William A. Medwid Jun 2024

Master's Theses

Brave Rats is a short game with simple rules, yet establishing a comprehensive strategy is very challenging without extensive computation. After explaining the rules, this paper begins by calculating the optimal strategy by recursively solving each turn’s Minimax strategy. It then provides summary statistics about the complex, branching Minimax solution. Next, we examine six other strategy models and evaluate their performance against each other. These models’ flaws highlight the key elements that contribute to the effectiveness of the Minimax strategy and offer insight into simpler strategies that human players could mimic. Finally, we analyze 123 games of human data collected …


Recidivism In Mississippi: Causes, Impacts, And Solutions, Grace E. Brian May 2024

Honors Theses

Because the United States is home to the largest prison population in the world, finding solutions to reduce the rate at which prisoners return to prison is paramount to helping reduce crime. This thesis assesses how Mississippi, the state with the highest incarceration rate, compares with the national, Southern, and non-Southern averages on access to prison education, barriers to employment, and youth incarceration, in order to direct recidivism reduction solutions. Results showed that Mississippi had a slightly higher recidivism rate than the national average, had fewer barriers to employment, and a lower youth incarceration rate than the national average …


Descriptions Of Interglacial Mastodons From Snowmass, Colorado, Connor White May 2024

Electronic Theses and Dissertations

The Ziegler Reservoir fossil site (ZRFS) in Colorado contains over 4000 mastodon bones that date from 140,000 to 100,000 years ago. At an elevation of ~2705 meters above sea level, ZRFS represents an alpine ecosystem dated to Marine Isotope Stage (MIS) 5. Formal descriptions of cheek teeth, mandibles, crania, and femora were completed. Statistical analyses of the upper and lower third molars, including a novel measurement of interloph(id) distances, indicate significant differences between ZRFS mastodons and Mammut pacificus, while falling within the ranges for Mammut americanum. This study agrees with the taxonomic assignment of ZRFS mastodons to Mammut …


Assessing Extant Methods For Generating G-Optimal Designs And A Novel Methodology To Compute The G-Score Of A Candidate Design, Hyrum John Hansen May 2024

All Graduate Theses and Dissertations, Fall 2023 to Present

Experimental designs are used by scientists to allocate treatments such that statistical inference is appropriate. Most traditional experimental designs have mathematical properties that make them desirable under certain conditions. Optimal experimental designs are those where the researcher can exercise total control over the treatment levels to maximize a chosen mathematical property. As is common in the literature, the experimental design is represented as a matrix where each column represents a variable and each row represents a trial. We define a function that takes as input the design matrix and outputs its score. We then algorithmically adjust each entry until a design …


A Statistical Look Into How Common Soccer Metrics Influence Expected Goal Measures In The Professional Game, Tristan George Rumsey May 2024

Undergraduate Honors Thesis Collection

The advent of sports analytics has ignited a fervor across all sporting disciplines, particularly soccer, where clubs are sprinting to harness vast data reserves to elevate team performance, spearhead effective marketing endeavors, and bolster financial gains crucial for club expansion. Much like Billy Beane's transformative "Moneyball" approach, soccer clubs are in pursuit of innovative strategies to transcend financial limitations and achieve triumph. In soccer, where goals are scarce commodities, heightened offensive efficacy becomes imperative. Presently, one metric stands out as pivotal in gauging a team's goal-scoring success: expected goals (xG). This metric quantifies the likelihood of a given shot or …


A Survey Of The Murray State University CSIS Department Of Student And Instructor Attitudes In Relation To Earlier Introduction Of Version Control Systems, Gavin Johnson Apr 2024

Honors College Theses

Over the previous 20 years, the software development industry has seen the application of Version Control Systems (VCS) evolve from a Centralized Version Control System (CVCS) format to a Decentralized Version Control System (DVCS) format. Examples of the former include Perforce and Subversion, whilst examples of the latter include GitHub and Bitbucket. As DVCS models allow software contributors to maintain their respective local repositories of relevant code bases, developers are able to work offline and maintain their work with relative fault tolerance. This contrasts with CVCS models, which require software contributors to be connected online to a main server. …


Defensive Impact Wins: Developing A New Method To Rate Individual Defense In NBA Games, Dylan J. Stiles Jan 2024

Honors Theses and Capstones

With the analytics revolution in sports in the past 20 years, it seems that everything that can be quantified is. In basketball, though, trying to break the game down into a set of numbers comes with a unique problem. While we've come up with a good set of advanced numbers to measure offensive efficiency, defense is fundamentally harder to quantify. The game is played five on five, but it has often been popular or convenient to model defense as a set of five one-on-one games. As defenses became more complex into the 2010s, this methodology became increasingly inadequate. …


Enhancing Student Graduation Rates By Mitigating Failure, Dropout, And Withdrawal In Introduction To Statistical Courses Using Statistical And Machine Learning, Shahabeddin Abbaspour Tazehkand Jan 2024

Graduate Thesis and Dissertation 2023-2024

The elevated rates of failure, dropout, and withdrawal (FDW) in introductory statistics courses pose a significant barrier to students' timely graduation from college. Identifying actionable strategies to support instructors in facilitating student success by reducing FDW rates is paramount. This thesis undertakes a comprehensive approach, leveraging various machine learning algorithms to address this pressing issue. Drawing from three years of data from an introductory statistics course at one of the largest universities in the USA, this study examines the problem in depth. Numerous predictive classification models have been developed, showcasing the efficacy of machine learning techniques in this context. Actionable …


Maximum Entropy Species Distribution Modeling For The Spring Ephemeral Herb Bloodroot (Sanguinaria canadensis) In Eastern North America, Velan Manivannan Jan 2024

Williams Honors College, Honors Research Projects

The spring ephemeral plant Bloodroot (Sanguinaria canadensis) has a widespread native range in North America, spanning much of the eastern United States and Canada. While its current NatureServe conservation status is designated as ‘secure’ (NatureServe, 2023), its status as a spring ephemeral places it at a heightened risk for climate change-induced phenological mismatch with advancing forest canopy closure. Additionally, under continued anthropogenic climate change, Bloodroot may also experience range shifts or contractions as the edges of its present range warm past physiological thresholds. To determine the potential for range shifts and contractions under future warming, I generated a …


Ensemble Classification: An Analysis Of The Random Forest Model, Jarod Korn Jan 2024

Williams Honors College, Honors Research Projects

The random forest model proposed by Dr. Leo Breiman in 2001 is an ensemble machine learning method for classification prediction and regression. In the following paper, we will conduct an analysis on the random forest model with a focus on how the model works, how it is applied in software, and how it performs on a set of data. To fully understand the model, we will introduce the concept of decision trees, give a summary of the CART model, explain in detail how the random forest model operates, discuss how the model is implemented in software, demonstrate the model by …


Forecasting The Outcome Of NFL Playoff Games: A Regression Analysis, Jack Pierpont Morgan V Jan 2024

UVM Patrick Leahy Honors College Senior Theses

Professional sports are one of the most consumed forms of entertainment in the world today. Professional sporting events are some of the most watched broadcasts worldwide each year. The 2022 FIFA World Cup Final garnered about 1.5 billion views worldwide, almost 20% of our planet’s population (Jones, 2023). The National Football League is the most popular professional sports league in the United States. Recent polling data shows that a clear majority of the country, 72% of Americans, self-identify as football fans (“St. Bonaventure”, 2023). The NFL runs from September to February and regularly draws 15-20 million viewers weekly during …


Statistically Principled Deep Learning For SAR Image Segmentation, Cassandra Goldberg Jan 2024

Honors Projects

This project explores novel approaches for Synthetic Aperture Radar (SAR) image segmentation that integrate established statistical properties of SAR into deep learning models. First, Perlin Noise and Generalized Gamma distribution sampling methods were utilized to generate a synthetic dataset that effectively captures the statistical attributes of SAR data. Subsequently, deep learning segmentation architectures were developed that utilize average pooling and 1x1 convolutions to perform statistical moment computations. Finally, supervised and unsupervised disparity-based losses were incorporated into model training. The experimental outcomes yielded promising results: the synthetic dataset effectively trained deep learning models for real SAR data segmentation, the statistically-informed architectures …
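
As a rough illustration of how average pooling can yield statistical moments, the sketch below computes a local mean and variance from pooled values of x and x², using Var[x] = E[x²] − E[x]². It is a minimal pure-Python sketch under assumed conventions (non-overlapping k×k windows on a 2-D list); the function names are hypothetical and this is not the architecture described in the project.

```python
def avg_pool2d(img, k):
    """Non-overlapping k x k average pooling on a 2-D list of numbers."""
    rows, cols = len(img) // k, len(img[0]) // k
    return [
        [
            sum(img[r * k + i][c * k + j] for i in range(k) for j in range(k)) / (k * k)
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def local_mean_and_variance(img, k):
    """First moment and second central moment per k x k window (hypothetical helper)."""
    mean = avg_pool2d(img, k)                                   # E[x]
    mean_sq = avg_pool2d([[v * v for v in row] for row in img], k)  # E[x^2]
    var = [
        [m2 - m * m for m, m2 in zip(mean_row, sq_row)]         # E[x^2] - E[x]^2
        for mean_row, sq_row in zip(mean, mean_sq)
    ]
    return mean, var
```

In a deep learning framework the same idea would typically be expressed with the framework's own pooling layers; the list-based version here only shows the arithmetic.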


What Fishing Tackle Should I Bring Today?: Safety Harbor Resource Collection Tools As Adaptations To Aquatic Environments, Richard J. Davis III Jan 2024

Graduate Thesis and Dissertation 2023-2024

This thesis reports on the results of research conducted to determine whether technological adaptations to local environmental conditions can be observed through geospatial and artifact analysis of Safety Harbor collections from the Tampa Bay region of Florida. Past artifact and spatial analysis did not take advantage of modern technological advancements when studying how human-environmental interactions can influence certain adaptations to local conditions. In this project, GIS was used to reconstruct local aquatic environmental conditions of waterways adjacent to Safety Harbor sites. New statistical software programs have also proven themselves useful to archaeologists seeking to conduct hypothesis testing of artifact data. …


Radiofrequency Interference Detection Using LSTM And Statistical Analysis Discriminator, Luke Smith Jan 2024

Masters Theses

Wireless devices are becoming increasingly pervasive across all aspects of society. Examples of such devices include radios, routers, mobile phones, tablets, and more. As the number of radio frequency (RF) devices continues to rise, so does the amount of interference and noise. This is why an efficient approach to interference detection is explored. Most research within this area has been done strictly within the frequency domain, as viewing a signal within this domain provides many insights into what makes up the signal. This has, however, led to the time domain being underutilized for this area of research.

To explore the …


Interpretable Word-Level Sentiment Analysis With Attention-Based Multiple Instance Classification Models, Chenyu Yang Dec 2023

Statistical Science Theses and Dissertations

In this study, our main objective is to tackle the black-box nature of popular machine learning models in sentiment analysis and enhance model interpretability. We aim to gain more insight into the decision-making process of sentiment analysis models, which is often obscure in those complex models. To achieve this goal, we introduce two word-level sentiment analysis models.

The first model is called the attention-based multiple instance classification (AMIC) model. It combines the transparent model structure of multiple instance classification and the self-attention mechanism in deep learning to incorporate the contextual information from documents. As demonstrated by a wine review dataset …


Translation Speed Influence On Tropical Cyclone Storm Tide And Surge Generation Along The Gulf Of Mexico Coast, Samantha L. Camarda Nov 2023

LSU Master's Theses

This research examines tropical cyclone translation speed as a factor in storm tide and surge height upon landfall on the United States Gulf Coast. Understanding the effect of translation speed on peak storm tide/surge height is needed to better prepare for and predict future damage from tropical cyclone events. Tropical cyclone data are taken from hourly interpolated best-track HURDAT2 data from 1970–2021. This study uses the hourly interpolated HURDAT2 data points from 24 hours pre-landfall to landfall. Translation speed is calculated based on the distance traversed between hourly points. Peak storm tide and storm surge data are taken from SURGEDAT from …
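
The translation speed calculation mentioned above can be sketched as follows. This is a minimal illustration, assuming great-circle (haversine) distance between consecutive hourly positions and a mean Earth radius of 6371 km; the thesis does not state which distance formula it uses, and the function names here are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def translation_speeds_kmh(track, hours_between_points=1.0):
    """Speed over each leg of a track given as [(lat, lon), ...] at a fixed time step."""
    return [
        haversine_km(lat1, lon1, lat2, lon2) / hours_between_points
        for (lat1, lon1), (lat2, lon2) in zip(track, track[1:])
    ]
```

A track of n hourly positions yields n − 1 speeds, one per leg; for example, `translation_speeds_kmh([(25.0, -90.0), (25.2, -90.1), (25.4, -90.2)])` returns two values in km/h.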


Bayesian Statistical Modeling Of Spatially Resolved Transcriptomics Data, Xi Jiang Oct 2023

Statistical Science Theses and Dissertations

Spatially resolved transcriptomics (SRT) quantifies expression levels at different spatial locations, providing a new and powerful tool for gaining novel biological insights. As experimental technologies improve in both capacity and efficiency, there is a growing demand for the development of analytical methodologies.

One question in SRT data analysis is how to identify genes whose expressions exhibit spatially correlated patterns, called spatially variable (SV) genes. Most current methods to identify SV genes are built upon the geostatistical model with a Gaussian process, which could limit the models' ability to identify complex spatial patterns. In order to overcome this challenge and capture more types …


Towards Interpretable Machine Reading Comprehension With Mixed Effects Regression And Exploratory Prompt Analysis, Luca Del Signore Sep 2023

Dissertations, Theses, and Capstone Projects

We investigate the properties of natural language prompts that determine their difficulty in machine reading comprehension tasks. While much work has been done benchmarking language model performance at the task level, there is considerably less literature focused on how individual task items can contribute to interpretable evaluations of natural language understanding. Such work is essential to deepening our understanding of language models and ensuring their responsible use as a key tool in human-machine communication. We perform an in-depth mixed effects analysis on the behavior of three major generative language models, comparing their performance on a large reading comprehension …


A Comparison Of Confidence Intervals In State Space Models, Jinyu Du Jul 2023

Statistical Science Theses and Dissertations

This thesis develops general procedures for constructing confidence intervals (CIs) of the error disturbance parameters (standard deviations) and transformations of the error disturbance parameters in time-invariant state space models (SSMs). With only a set of observations, estimating individual error disturbance parameters accurately in the presence of other unknown parameters in SSMs is a very challenging problem. We attempted to construct four different types of confidence intervals (Wald, likelihood ratio, score, and higher-order asymptotic intervals) for both the simple local level model and general time-invariant SSMs. We show that for a simple local level model, both the …


Optimal Experimental Planning Of Reliability Experiments Based On Coherent Systems, Yang Yu Jul 2023

Statistical Science Theses and Dissertations

In industrial engineering and manufacturing, assessing the reliability of a product or system is an important topic. Life-testing and reliability experiments are commonly used reliability assessment methods to gain sound knowledge about product or system lifetime distributions. Usually, a sample of items of interest is subjected to stresses and environmental conditions that characterize the normal operating conditions. During the life-test, successive times to failure are recorded and lifetime data are collected. Life-testing is useful in many industrial environments, including the automobile, materials, telecommunications, and electronics industries.

There are different kinds of life-testing experiments that can be applied for different purposes. …


Empirical Likelihood Ratio Tests For Homogeneity Of Distributions Of Component Lifetimes From System Lifetime Data With Known System Structures, Jingjing Qu May 2023

Statistical Science Theses and Dissertations

In system reliability, practitioners may be interested in testing the homogeneity of the component lifetime distributions based on system lifetimes from multiple data sources for various reasons, such as identifying the component supplier that provides the most reliable components.

In the first part of the dissertation, we develop distribution-free hypothesis testing procedures for the homogeneity of the component lifetime distributions based on system lifetime data when the system structures are known. Several nonparametric test statistics based on the empirical likelihood method are proposed for testing the homogeneity of two or more component lifetime distributions. The computational approaches to obtain the …


Optimizing Tumor Xenograft Experiments Using Bayesian Linear And Nonlinear Mixed Modelling And Reinforcement Learning, Mary Lena Bleile May 2023

Statistical Science Theses and Dissertations

Tumor xenograft experiments are a popular tool of cancer biology research. In a typical such experiment, one implants a set of animals with an aliquot of the human tumor of interest, applies various treatments of interest, and observes the subsequent response. Efficient analysis of the data from these experiments is therefore of utmost importance. This dissertation proposes three methods for optimizing cancer treatment and data analysis in the tumor xenograft context. The first of these is applicable to tumor xenograft experiments in general, and the second two seek to optimize the combination of radiotherapy with immunotherapy in the tumor xenograft …


Development Of Bayesian Hierarchical Methods Involving Meta-Analysis, Jackson Barth May 2023

Statistical Science Theses and Dissertations

When conducting statistical analysis in the Bayesian paradigm, the most critical decision made by the researcher is the identification of a prior distribution for a parameter. Despite the mathematical soundness of the Bayesian approach, a wrongly specified prior can lead to biased and incorrect results. To avoid this, prior distributions should be based on real data, which are easily accessible in the "big data" era. This dissertation explores two applications of Bayesian hierarchical modelling that incorporate information obtained from a meta-analysis.

The first of these applications is in the normalization of genomics data, specifically for nanostring nCounter datasets. A meta-analysis …


Contributions To Causal Inference In Observational Studies, Jenny Park, Daniel F. Heitjan, Christy Boling Turer May 2023

Statistical Science Theses and Dissertations

The electronic health record (EHR) is a digital version of the patient chart. All clinically relevant patient information can be accessed from the EHR by professionals involved in the patient’s care. For researchers, the EHR is a rich, convenient source for data to address a vast range of medical research questions.

In observational studies with EHR data, it is common to define the treatment/exposure status as a binary indicator reflecting whether the patient was documented to receive a particular medication or procedure. The outcome can be any type of information on patient status documented in the EHR after the treatment has …