Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Statistics

Articles 1 - 30 of 670

Full-Text Articles in Physical Sciences and Mathematics

Assessing Extant Methods For Generating G-Optimal Designs And A Novel Methodology To Compute The G-Score Of A Candidate Design, Hyrum John Hansen May 2024

All Graduate Theses and Dissertations, Fall 2023 to Present

Experimental designs are used by scientists to allocate treatments such that statistical inference is appropriate. Most traditional experimental designs have mathematical properties that make them desirable under certain conditions. Optimal experimental designs are those where the researcher can exercise total control over the treatment levels to maximize a chosen mathematical property. As is common in the literature, the experimental design is represented as a matrix where each column represents a variable, and each row represents a trial. We define a function that takes as input the design matrix and outputs its score. We then algorithmically adjust each entry until a design …


"Who Wrote The Epistle, God Only Knows": A Statistical Authorial Analysis Of Hebrews In Comparison With Pauline And Lukan Literature, Benjamin J. Erickson Apr 2024

Senior Honors Theses

The authorship of Hebrews has been a point of contention for scholars for the past two millennia. While the epistle is traditionally attributed to Paul, many scholars assert that it carries thematic, structural, and stylistic differences from the remainder of his extant epistles; therefore, many other possible authors have been proposed. Of these, only Luke has other New Testament writings. Therefore, this project conducts a statistical comparison of Hebrews to the Pauline and Lukan corpora using stylometric authorial analysis methods. This analysis demonstrates that Hebrews is stylistically closer to Lukan literature than Pauline (but not to a significant degree), and …


A Survey Of The Murray State University CSIS Department Of Student And Instructor Attitudes In Relation To Earlier Introduction Of Version Control Systems, Gavin Johnson Apr 2024

Honors College Theses

Over the previous 20 years, the software development industry has overseen an evolution in the application of Version Control Systems (VCS) from a Centralized Version Control System (CVCS) format to a Decentralized Version Control System (DVCS) format. Examples of the former include Perforce and Subversion, whilst examples of the latter include GitHub and Bitbucket. As DVCS models allow software contributors to maintain their respective local repositories of relevant code bases, developers are able to work offline and maintain their work with relative fault tolerance. This contrasts with CVCS models, which require software contributors to be connected online to a main server. …


Identifying Rural Health Clinics Within The Transformed Medicaid Statistical Information System (T-MSIS) Analytic Files, Katherine Ahrens MPH, PhD, Zachariah Croll, Yvonne Jonk PhD, John Gale MS, Heidi O'Connor MS Mar 2024

Rural Health Clinics

Researchers at the Maine Rural Health Research Center describe a methodology for identifying Rural Health Clinic encounters within the Medicaid claims data using Transformed Medicaid Statistical Information System (T-MSIS) Analytic Files.

Background: There is limited information on the extent to which Rural Health Clinics (RHC) provide pediatric and pregnancy-related services to individuals enrolled in state Medicaid/CHIP programs. In part this is because methods to identify RHC encounters within Medicaid claims data are outdated.

Methods: We used a 100% sample of the 2018 Medicaid Demographic and Eligibility and Other Services Transformed Medicaid Statistical Information System (T-MSIS) Analytic Files for 20 states …


Hockey Card Statistics Are Stagnant And Stale, Egan J. Chernoff Jan 2024

Journal of Humanistic Mathematics

The purchase of a coffee at a Canadian institution, Tim Hortons, turned into an informal investigation into hockey card statistics. Turns out, hockey card statistics are stagnant and stale. This was disappointing to see because the game of hockey has changed, the statistics used to keep track of the game have changed. Even the cards have changed. Well, not the back of the cards, which do not well enough paint a statistical picture of the hockey player photographed on the front of the card.


The Limits Of Data Science, David E. Drew Jan 2024

Journal of Humanistic Mathematics

Data science can contribute valuable predictions in diverse fields. But I write to express some concerns and red flags. I suggest that data science is being oversold. This article contains three questions that I believe data science must address as this new discipline matures. Is data science significantly different from statistics? This is a question that has haunted the field since the term first was introduced. By creating algorithms based on current societal decision rules that may be biased, even bigoted, does data science lock in and exacerbate inequality? Scholars have identified a continuum from data to information to knowledge …


Statistically Principled Deep Learning For SAR Image Segmentation, Cassandra Goldberg Jan 2024

Honors Projects

This project explores novel approaches for Synthetic Aperture Radar (SAR) image segmentation that integrate established statistical properties of SAR into deep learning models. First, Perlin Noise and Generalized Gamma distribution sampling methods were utilized to generate a synthetic dataset that effectively captures the statistical attributes of SAR data. Subsequently, deep learning segmentation architectures were developed that utilize average pooling and 1x1 convolutions to perform statistical moment computations. Finally, supervised and unsupervised disparity-based losses were incorporated into model training. The experimental outcomes yielded promising results: the synthetic dataset effectively trained deep learning models for real SAR data segmentation, the statistically-informed architectures …


Defensive Impact Wins: Developing A New Method To Rate Individual Defense In NBA Games, Dylan J. Stiles Jan 2024

Honors Theses and Capstones

With the analytics revolution in sports in the past 20 years, it seems that everything that can be quantified is. In basketball though, trying to break the game down into a set of numbers comes with a unique problem. While we've come up with a good set of advanced numbers to measure offensive efficiency, defense is fundamentally harder to quantify. The game is played five on five, but it has often been popular or convenient to model defense as a set of five one on one games. As defenses became more complex into the 2010s, this methodology became more insignificant. …


Interpretable Word-Level Sentiment Analysis With Attention-Based Multiple Instance Classification Models, Chenyu Yang Dec 2023

Statistical Science Theses and Dissertations

In this study, our main objective is to tackle the black-box nature of popular machine learning models in sentiment analysis and enhance model interpretability. We aim to gain more insight into the decision-making process of sentiment analysis models, which is often obscure in those complex models. To achieve this goal, we introduce two word-level sentiment analysis models.

The first model is called the attention-based multiple instance classification (AMIC) model. It combines the transparent model structure of multiple instance classification and the self-attention mechanism in deep learning to incorporate the contextual information from documents. As demonstrated by a wine review dataset …


Differentiation Of Human, Dog, And Cat Hair Fibers Using DART TOFMS And Machine Learning, Laura Ahumada, Erin R. McClure-Price, Chad Kwong, Edgard O. Espinoza, John Santerre Dec 2023

SMU Data Science Review

Hair is found in over 90% of crime scenes and has long been analyzed as trace evidence. However, recent reviews of traditional hair fiber analysis techniques, primarily morphological examination, have cast doubt on its reliability. To address these concerns, this study employed machine learning algorithms, specifically Linear Discriminant Analysis (LDA) and Random Forest, on Direct Analysis in Real Time time-of-flight mass spectra collected from human, cat, and dog hair samples. The objective was to develop a chemistry- and statistics-based classification method for unbiased taxonomic identification of hair. The results of the study showed that LDA and Random Forest were highly …


A Prompt Engineering Approach To Creating Automated Commentary For Microsoft Self-Help Documentation Metric Reports Using ChatGPT, Ryan Herrin, Luke Stodgel, Brian Raffety Dec 2023

SMU Data Science Review

Microsoft collects an immense amount of data from the users of its product self-help documentation. Employees use this data to identify these self-help articles' performance trends and measure their impact on business Key Performance Indicators (KPIs). Microsoft uses various tools like Power BI and Python to analyze this data. The problem is that their analysis and findings are summarized manually. Therefore, this research will improve upon their current analysis methods by applying the latest prompt engineering practices and the power of ChatGPT's large language models (LLMs). Using VBA code, Microsoft Excel, and the ChatGPT API as an Excel add-in, this research …


Gastropod Evolutionary Phylogeny, Priscilla Doran, Neal A. Doran Dec 2023

Proceedings of the International Conference on Creationism

This research seeks to investigate a correlation between the first appearance date (FAD) and predicted evolutionary phylogeny of gastropods. Using a Spearman correlation, 17 data sets of gastropods were analyzed, with no significant correlation found between the first appearance date and predicted evolutionary date for the fossils.
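As a rough illustration of the statistic named in this abstract, the following is a minimal sketch of a Spearman rank correlation using only the Python standard library. The function names and the tie-handling rule (average ranks) are assumptions made for illustration and are not taken from the paper; the abstract does not describe the authors' software or data layout.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

A value near 0 for paired lists of first appearance order and predicted phylogenetic order would be consistent with the "no significant correlation" finding, though a significance test would still be needed and is not shown here.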


Translation Speed Influence On Tropical Cyclone Storm Tide And Surge Generation Along The Gulf Of Mexico Coast, Samantha L. Camarda Nov 2023

LSU Master's Theses

This research examines tropical cyclone translation speed as a factor in storm tide and surge height upon landfall on the United States Gulf Coast. Understanding the effect of translation speed on peak storm tide/surge height is needed to better prepare for and predict future damage from tropical cyclone events. Tropical cyclone data are taken from hourly interpolated best-track HURDAT2 data from 1970–2021. This study uses the HURDAT2 hourly interpolated observation data points (24-hours) pre-landfall to landfall. Translation speed is calculated based on the distance traversed between hourly points. Peak storm tide and storm surge data are taken from SURGEDAT from …
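The translation-speed step mentioned in this abstract (distance traversed between hourly points) might be sketched as follows. This is only an illustration: the great-circle (haversine) distance, the Earth radius constant, and the `(lat, lon)` tuple layout are assumptions, since the abstract does not say how distances were computed.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def translation_speeds_kmh(track, hours_between=1.0):
    """Speed over each leg of a track of (lat, lon) points spaced hours_between apart."""
    return [
        haversine_km(a[0], a[1], b[0], b[1]) / hours_between
        for a, b in zip(track, track[1:])
    ]
```

With hourly interpolated positions, a track of 25 points (24 hours pre-landfall through landfall) would yield 24 leg speeds, which could then be averaged or otherwise summarized.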


Bayesian Statistical Modeling Of Spatially Resolved Transcriptomics Data, Xi Jiang Oct 2023

Statistical Science Theses and Dissertations

Spatially resolved transcriptomics (SRT) quantifies expression levels at different spatial locations, providing a new and powerful tool to investigate novel biological insights. As experimental technologies improve in both capacity and efficiency, there is a growing demand for the development of analytical methodologies.

One question in SRT data analysis is to identify genes whose expressions exhibit spatially correlated patterns, called spatially variable (SV) genes. Most current methods to identify SV genes are built upon the geostatistical model with Gaussian process, which could limit the models' ability to identify complex spatial patterns. In order to overcome this challenge and capture more types …


Reu-Deim Classification Of Hispanic Voters In Hispanic Groups Using Name And Zip Code Data In Palm Beach, Florida, Kamila Soto-Ortiz Sep 2023

Beyond: Undergraduate Research Journal

When it comes to registering to vote, Hispanic voters can only register as “Hispanic” in the “Race/Ethnicity” category, causing difficulties when analyzing voting trends amongst the Hispanic community. Building on the recent idea that not all Hispanic Groups vote the same, the goal is to create a model that can identify a voter’s Hispanic Group with the information provided on the public Florida voter file. This is accomplished using name and zip code data for all voters in Palm Beach, Florida. This paper will explore the model implemented, its findings, and its limitations. Palm Beach, Florida, is met with low confidence …


Traditional Vs Machine Learning Approaches: A Comparison Of Time Series Modeling Methods, Miguel E. Bonilla Jr., Jason Mcdonald, Tamas Toth, Bivin Sadler Aug 2023

SMU Data Science Review

In recent years, various new Machine Learning and Deep Learning algorithms have been introduced, claiming to offer better performance than traditional statistical approaches when forecasting time series. Studies seeking evidence to support the usage of ML/DL over statistical approaches have been limited to comparing the forecasting performance of univariate, linear time series data. This research compares the performance of traditional statistical-based and ML/DL methods for forecasting multivariate and nonlinear time series.


Statistical Precision Of A Replicated Farm Grazing Trial Versus Replicated Paddock Trials, K. P. Vogel, L. E. Moser, D. E. Bauer Aug 2023

IGC Proceedings (1997-2023)

The experimental unit for animal average daily gain (ADG) and gain/ha in grazing trials is the paddock. Grazing trials on research stations often are conducted using small paddocks because animal and land costs restrict the number of treatments, replicates, and animals per paddock. Land and animal restrictions can be reduced by conducting trials on farms using animals provided by cooperating farmers. Farmers typically want only a single replicate on their farms and, as a result, virtually all on-farm trials in the USA and elsewhere have been un-replicated demonstration trials from which estimates of experimental error cannot be obtained. Farms can be …


Sentiment Analysis Before And During The Covid-19 Pandemic, Emily Musgrove Jul 2023

Mathematics Summer Fellows

This study examines the change in connotative language use before and during the Covid-19 pandemic. By analyzing news articles from several major US newspapers, we found that there is a statistically significant correlation between the sentiment of the text and the publication period. Specifically, we document a large, systematic, and statistically significant decline in the overall sentiment of articles published in major news outlets. While our results do not directly gauge the sentiment of the population, our findings have important implications regarding the social responsibility of journalists and media outlets especially in times of crisis.


A Comparison Of Confidence Intervals In State Space Models, Jinyu Du Jul 2023

Statistical Science Theses and Dissertations

This thesis develops general procedures for constructing confidence intervals (CIs) of the error disturbance parameters (standard deviations) and transformations of the error disturbance parameters in time-invariant state space models (ssm). With only a set of observations, estimating individual error disturbance parameters accurately in the presence of other unknown parameters in ssm is a very challenging problem. We attempted to construct four different types of confidence intervals, Wald, likelihood ratio, score, and higher-order asymptotic intervals for both the simple local level model and the general time-invariant state space models (ssm). We show that for a simple local level model, both the …


Optimal Experimental Planning Of Reliability Experiments Based On Coherent Systems, Yang Yu Jul 2023

Statistical Science Theses and Dissertations

In industrial engineering and manufacturing, assessing the reliability of a product or system is an important topic. Life-testing and reliability experiments are commonly used reliability assessment methods to gain sound knowledge about product or system lifetime distributions. Usually, a sample of items of interest is subjected to stresses and environmental conditions that characterize the normal operating conditions. During the life-test, successive times to failure are recorded and lifetime data are collected. Life-testing is useful in many industrial environments, including the automobile, materials, telecommunications, and electronics industries.

There are different kinds of life-testing experiments that can be applied for different purposes. …


Optimizing Tumor Xenograft Experiments Using Bayesian Linear And Nonlinear Mixed Modelling And Reinforcement Learning, Mary Lena Bleile May 2023

Statistical Science Theses and Dissertations

Tumor xenograft experiments are a popular tool of cancer biology research. In a typical such experiment, one implants a set of animals with an aliquot of the human tumor of interest, applies various treatments of interest, and observes the subsequent response. Efficient analysis of the data from these experiments is therefore of utmost importance. This dissertation proposes three methods for optimizing cancer treatment and data analysis in the tumor xenograft context. The first of these is applicable to tumor xenograft experiments in general, and the second two seek to optimize the combination of radiotherapy with immunotherapy in the tumor xenograft …


Development Of Bayesian Hierarchical Methods Involving Meta-Analysis, Jackson Barth May 2023

Statistical Science Theses and Dissertations

When conducting statistical analysis in the Bayesian paradigm, the most critical decision made by the researcher is the identification of a prior distribution for a parameter. Despite the mathematical soundness of the Bayesian approach, a wrongly specified prior can lead to biased and incorrect results. To avoid this, prior distributions should be based on real data, which are easily accessible in the "big data" era. This dissertation explores two applications of Bayesian hierarchical modelling that incorporate information obtained from a meta-analysis.

The first of these applications is in the normalization of genomics data, specifically for nanostring nCounter datasets. A meta-analysis …


Empirical Likelihood Ratio Tests For Homogeneity Of Distributions Of Component Lifetimes From System Lifetime Data With Known System Structures, Jingjing Qu May 2023

Statistical Science Theses and Dissertations

In system reliability, practitioners may be interested in testing the homogeneity of the component lifetime distributions based on system lifetimes from multiple data sources for various reasons, such as identifying the component supplier that provides the most reliable components.

In the first part of the dissertation, we develop distribution-free hypothesis testing procedures for the homogeneity of the component lifetime distributions based on system lifetime data when the system structures are known. Several nonparametric testing statistics based on the empirical likelihood method are proposed for testing the homogeneity of two or more component lifetime distributions. The computational approaches to obtain the …


Gentrification And Crime In The Twin Cities: Insights And Challenges Through A Statistical Lens, Erin G. Franke May 2023

Mathematics, Statistics, and Computer Science Honors Projects

Gentrification is a complex process of urban redevelopment that typically involves an in-migration of educated people to neighborhoods experiencing a period of disinvestment. While gentrification is widely regarded for its potential to displace long-time businesses and residents of the neighborhood, its impact on crime is highly controversial. There is no consensus on the relationship between gentrification and crime across criminological theory, and past statistical studies have also shown contradictory results. Measuring gentrification on the tract level with census data, we seek to understand gentrification’s relationship with violent crime and theft in the Twin Cities. Using a Poisson model with …


Using A Distributive Approach To Model Insurance Loss, Kayla Kippes Apr 2023

Student Research Submissions

Insurance loss is an unpredicted event that stands at the forefront of the insurance industry. Loss in insurance represents the costs or expenses incurred due to a claim. An insurance claim is a request for the insurance company to pay for damage caused to an individual’s property. Loss can be measured by how much money (the dollar amount) has been paid out by the insurance company to repair the damage or it can be measured by the number of claims (claim count) made to the insurance company. Insured events include property damage due to fire, theft, flood, a car accident, …


The Commutant Of The Fourier–Plancherel Transform, Brianna Cantrall Apr 2023

Honors Theses

One can see that this matrix is unitary and has eigenvalues {1, −i, −1, i}, each of infinite multiplicity. Throughout the remainder of this thesis, we will convince the reader that the above linear transformation is actually the Fourier transform. We will compute the commutant, as well as its invariant subspaces. The key to do this relies on the Hermite polynomials. Why do we recast the Fourier transform from its well-known and well studied integral form to the matrix form shown above? As we will see, the matrix form allows us to efficiently discover the operator theory of the Fourier transform obfuscated …


Length Bias Estimation Of Small Businesses Lifetime, Simeng Li Apr 2023

Honors Theses

Small businesses, particularly restaurants, play a crucial role in the economy by generating employment opportunities, boosting tourism, and contributing to the local economy. However, accurately estimating their lifetimes can be challenging due to the presence of length bias, which occurs when the likelihood of sampling any particular restaurant's closure is influenced by its duration in operation. To address the issue, this study conducts goodness-of-fit tests on exponential/gamma family distributions and employs the Kaplan-Meier method to more accurately estimate the average lifetime of restaurants in Carytown. By providing insights into the challenges of estimating the lifetimes of small businesses, this study …
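For readers unfamiliar with the Kaplan-Meier method named in this abstract, a minimal product-limit sketch is given below. It is an illustration only: the function name, argument layout, and toy inputs are assumptions, and the sketch implements the standard estimator without any correction for length bias, which the thesis treats separately.

```python
def kaplan_meier(durations, observed):
    """Product-limit survival estimate.

    durations: time in operation for each business.
    observed:  True if closure was observed, False if the record is censored
               (still open when observation ended).
    Returns a list of (time, survival probability) at each distinct closure time.
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Units still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Closures observed exactly at time t.
        closures = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1 - closures / at_risk
        curve.append((t, survival))
    return curve
```

Censored records contribute to the at-risk count up to their censoring time but never trigger a drop in the curve, which is why the estimator is preferred over a simple average of observed lifetimes.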


Influence Diagnostics For Generalized Estimating Equations Applied To Correlated Categorical Data, Louis Vazquez Apr 2023

Statistical Science Theses and Dissertations

Influence diagnostics in regression analysis allow analysts to identify observations that have a strong influence on model fitted probabilities and parameter estimates. The most common influence diagnostics, such as Cook’s Distance for linear regression, are based on a deletion approach where the results of a model with and without observations of interest are compared. Here, deletion-based influence diagnostics are proposed for generalized estimating equations (GEE) for correlated, or clustered, nominal multinomial responses. The proposed influence diagnostics focus on GEEs with the baseline-category logit link function and a local odds ratio parameterization of the association structure. Formulas for both observation- and …


K-8 Preservice Teachers’ Statistical Thinking When Determining Best Measure Of Center, Ha Nguyen, Eryn M. Stehr Maher, Gregory Chamblee, Sharon Taylor Apr 2023

Department of Mathematical Sciences Faculty Publications

The purpose of this study was to determine K-8 preservice teacher (PST) candidates’ statistical thinking when selecting the best center representation for the given data. Forty-four PSTs enrolled in a Statistics and Probability for K-8 Teachers course in a university located in the southeastern region of the United States were asked to complete a 2007 National Assessment of Educational Progress test item. All 44 PSTs’ data were qualitatively analyzed for correctness and statistical thinking strategies used. Findings were that most PSTs either incorrectly selected the mean, rather than median, as the best measure of center for the given data or …


High-Dimensional Variable Selection Via Knockoffs Using Gradient Boosting, Amr Essam Mohamed Apr 2023

Dissertations

As data continue to grow rapidly in size and complexity, efficient and effective statistical methods are needed to detect the important variables/features. Variable selection is one of the most crucial problems in statistical applications. This problem arises when one wants to model the relationship between the response and the predictors. The goal is to reduce the number of variables to a minimal set of explanatory variables that are truly associated with the response of interest to improve the model accuracy. Effectively choosing the true influential variables and controlling the False Discovery Rate (FDR) without sacrificing power has been a challenge …