Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Statistics

Articles 1 - 30 of 660

Full-Text Articles in Physical Sciences and Mathematics

Identifying Rural Health Clinics Within The Transformed Medicaid Statistical Information System (T-MSIS) Analytic Files, Katherine Ahrens MPH, PhD, Zachariah Croll, Yvonne Jonk PhD, John Gale MS, Heidi O'Connor MS Mar 2024

Rural Health Clinics

Researchers at the Maine Rural Health Research Center describe a methodology for identifying Rural Health Clinic encounters within the Medicaid claims data using Transformed Medicaid Statistical Information System (T-MSIS) Analytic Files.

Background: There is limited information on the extent to which Rural Health Clinics (RHC) provide pediatric and pregnancy-related services to individuals enrolled in state Medicaid/CHIP programs. In part this is because methods to identify RHC encounters within Medicaid claims data are outdated.

Methods: We used a 100% sample of the 2018 Medicaid Demographic and Eligibility and Other Services Transformed Medicaid Statistical Information System (T-MSIS) Analytic Files for 20 states …


Hockey Card Statistics Are Stagnant And Stale, Egan J. Chernoff Jan 2024

Journal of Humanistic Mathematics

The purchase of a coffee at a Canadian institution, Tim Hortons, turned into an informal investigation into hockey card statistics. It turns out that hockey card statistics are stagnant and stale. This was disappointing to see because the game of hockey has changed, and the statistics used to keep track of the game have changed. Even the cards have changed. Well, not the backs of the cards, which do not paint a good enough statistical picture of the hockey player photographed on the front.


The Limits Of Data Science, David E. Drew Jan 2024

Journal of Humanistic Mathematics

Data science can contribute valuable predictions in diverse fields. But I write to express some concerns and red flags. I suggest that data science is being oversold. This article contains three questions that I believe data science must address as this new discipline matures. Is data science significantly different from statistics? This is a question that has haunted the field since the term first was introduced. By creating algorithms based on current societal decision rules that may be biased, even bigoted, does data science lock in and exacerbate inequality? Scholars have identified a continuum from data to information to knowledge …


Interpretable Word-Level Sentiment Analysis With Attention-Based Multiple Instance Classification Models, Chenyu Yang Dec 2023

Statistical Science Theses and Dissertations

In this study, our main objective is to tackle the black-box nature of popular machine learning models in sentiment analysis and enhance model interpretability. We aim to gain more insight into the decision-making process of sentiment analysis models, which is often obscure in those complex models. To achieve this goal, we introduce two word-level sentiment analysis models.

The first model is called the attention-based multiple instance classification (AMIC) model. It combines the transparent model structure of multiple instance classification and the self-attention mechanism in deep learning to incorporate the contextual information from documents. As demonstrated by a wine review dataset …


Differentiation Of Human, Dog, And Cat Hair Fibers Using DART-TOFMS And Machine Learning, Laura Ahumada, Erin R. McClure-Price, Chad Kwong, Edgard O. Espinoza, John Santerre Dec 2023

SMU Data Science Review

Hair is found in over 90% of crime scenes and has long been analyzed as trace evidence. However, recent reviews of traditional hair fiber analysis techniques, primarily morphological examination, have cast doubt on its reliability. To address these concerns, this study employed machine learning algorithms, specifically Linear Discriminant Analysis (LDA) and Random Forest, on Direct Analysis in Real Time time-of-flight mass spectra collected from human, cat, and dog hair samples. The objective was to develop a chemistry- and statistics-based classification method for unbiased taxonomic identification of hair. The results of the study showed that LDA and Random Forest were highly …


A Prompt Engineering Approach To Creating Automated Commentary For Microsoft Self-Help Documentation Metric Reports Using ChatGPT, Ryan Herrin, Luke Stodgel, Brian Raffety Dec 2023

SMU Data Science Review

Microsoft collects an immense amount of data from the users of its product self-help documentation. Employees use this data to identify the performance trends of these self-help articles and measure their impact on business Key Performance Indicators (KPIs). Microsoft uses various tools like Power BI and Python to analyze this data. The problem is that the analysis and findings are summarized manually. Therefore, this research will improve upon the current analysis methods by applying the latest prompt engineering practices and the power of ChatGPT's large language models (LLMs). Using VBA code, Microsoft Excel, and the ChatGPT API as an Excel add-in, this research …


Gastropod Evolutionary Phylogeny, Priscilla Doran, Neal A. Doran Dec 2023

Proceedings of the International Conference on Creationism

This research seeks to investigate a correlation between the first appearance order date (FAD) and predicted evolutionary phylogeny of gastropods. Using a Spearman correlation, 17 data sets of gastropods were analyzed, with no significant correlation found between the first appearance date and predicted evolutionary date for the fossils.
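
As a rough illustration of the statistic named in this abstract, the sketch below computes a Spearman rank correlation as the Pearson correlation of average ranks. The function names and the sample numbers are hypothetical and are not taken from the paper.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical data: observed first-appearance order vs. predicted order.
observed_order = [1, 2, 3, 4, 5]
predicted_order = [2, 1, 4, 3, 5]
rho = spearman_rho(observed_order, predicted_order)
```

A value of `rho` near zero would indicate little monotone association between the two orderings; a significance test would be a separate step.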


Translation Speed Influence On Tropical Cyclone Storm Tide And Surge Generation Along The Gulf Of Mexico Coast, Samantha L. Camarda Nov 2023

LSU Master's Theses

This research examines tropical cyclone translation speed as a factor in storm tide and surge height upon landfall on the United States Gulf Coast. Understanding the effect of translation speed on peak storm tide/surge height is needed to better prepare for and predict future damage from tropical cyclone events. Tropical cyclone data are taken from hourly interpolated best-track HURDAT2 data from 1970–2021. This study uses the HURDAT2 hourly interpolated observation data points (24-hours) pre-landfall to landfall. Translation speed is calculated based on the distance traversed between hourly points. Peak storm tide and storm surge data are taken from SURGEDAT from …
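
The translation speed calculation described in this abstract (distance traversed between hourly points) can be sketched as follows. This is a minimal illustration that assumes great-circle (haversine) distance and a mean Earth radius of 6371 km; the abstract does not state which distance formula the thesis uses, and the track points below are made up rather than HURDAT2 values.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def translation_speeds_kmh(track, hours_between_points=1.0):
    """Speed in km/h between consecutive (lat, lon) points of a track."""
    return [
        haversine_km(*p, *q) / hours_between_points
        for p, q in zip(track, track[1:])
    ]


# Hypothetical hourly track positions (lat, lon) approaching the Gulf Coast.
track = [(25.0, -90.0), (25.2, -90.1), (25.4, -90.2)]
speeds = translation_speeds_kmh(track)
```

With hourly points, each distance in kilometers is numerically equal to the speed in km/h for that hour.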


Bayesian Statistical Modeling Of Spatially Resolved Transcriptomics Data, Xi Jiang Oct 2023

Statistical Science Theses and Dissertations

Spatially resolved transcriptomics (SRT) quantifies expression levels at different spatial locations, providing a new and powerful tool to investigate novel biological insights. As experimental technologies improve in both capacity and efficiency, there is a growing demand for the development of analytical methodologies.

One question in SRT data analysis is to identify genes whose expressions exhibit spatially correlated patterns, called spatially variable (SV) genes. Most current methods to identify SV genes are built upon the geostatistical model with Gaussian process, which could limit the models' ability to identify complex spatial patterns. In order to overcome this challenge and capture more types …


Reu-Deim Classification Of Hispanic Voters In Hispanic Groups Using Name And Zip Code Data In Palm Beach, Florida, Kamila Soto-Ortiz Sep 2023

Beyond: Undergraduate Research Journal

When it comes to registering to vote, Hispanic voters can only register as “Hispanic” in the “Race/Ethnicity” category, causing difficulties when analyzing voting trends amongst the Hispanic community. Given the recent idea that not all Hispanic groups vote the same way, the goal is to create a model that may identify a voter’s Hispanic group from the information provided in the public Florida voter file. This is accomplished using name and zip code data for all voters in Palm Beach, Florida. This paper will explore the model implemented, its findings, and its limitations. Palm Beach, Florida, is met with low confidence …


Traditional Vs Machine Learning Approaches: A Comparison Of Time Series Modeling Methods, Miguel E. Bonilla Jr., Jason McDonald, Tamas Toth, Bivin Sadler Aug 2023

SMU Data Science Review

In recent years, various new Machine Learning and Deep Learning algorithms have been introduced, claiming to offer better performance than traditional statistical approaches when forecasting time series. Studies seeking evidence to support the usage of ML/DL over statistical approaches have been limited to comparing the forecasting performance of univariate, linear time series data. This research compares the performance of traditional statistical-based and ML/DL methods for forecasting multivariate and nonlinear time series.


Statistical Precision Of A Replicated Farm Grazing Trial Versus Replicated Paddock Trials, K. P. Vogel, L. E. Moser, D. E. Bauer Aug 2023

IGC Proceedings (1997-2023)

The experimental unit for animal average daily gain (ADG) and gain/ha in grazing trials is the paddock. Grazing trials on research stations often are conducted using small paddocks because animal and land costs restrict the number of treatments, replicates, and animals per paddock. Land and animal restrictions can be reduced by conducting trials on farms using animals provided by cooperating farmers. Farmers typically want only a single replicate on their farms and, as a result, virtually all on-farm trials in the USA and elsewhere have been unreplicated demonstration trials from which estimates of experimental error cannot be obtained. Farms can be …


Sentiment Analysis Before And During The Covid-19 Pandemic, Emily Musgrove Jul 2023

Mathematics Summer Fellows

This study examines the change in connotative language use before and during the Covid-19 pandemic. By analyzing news articles from several major US newspapers, we found that there is a statistically significant correlation between the sentiment of the text and the publication period. Specifically, we document a large, systematic, and statistically significant decline in the overall sentiment of articles published in major news outlets. While our results do not directly gauge the sentiment of the population, our findings have important implications regarding the social responsibility of journalists and media outlets especially in times of crisis.


A Comparison Of Confidence Intervals In State Space Models, Jinyu Du Jul 2023

Statistical Science Theses and Dissertations

This thesis develops general procedures for constructing confidence intervals (CIs) of the error disturbance parameters (standard deviations) and transformations of the error disturbance parameters in time-invariant state space models (SSMs). With only a set of observations, accurately estimating individual error disturbance parameters in the presence of other unknown parameters in an SSM is a very challenging problem. We attempted to construct four different types of confidence intervals (Wald, likelihood ratio, score, and higher-order asymptotic intervals) for both the simple local level model and general time-invariant SSMs. We show that for a simple local level model, both the …


Optimal Experimental Planning Of Reliability Experiments Based On Coherent Systems, Yang Yu Jul 2023

Statistical Science Theses and Dissertations

In industrial engineering and manufacturing, assessing the reliability of a product or system is an important topic. Life-testing and reliability experiments are commonly used reliability assessment methods to gain sound knowledge about product or system lifetime distributions. Usually, a sample of items of interest is subjected to stresses and environmental conditions that characterize the normal operating conditions. During the life-test, successive times to failure are recorded and lifetime data are collected. Life-testing is useful in many industrial environments, including the automobile, materials, telecommunications, and electronics industries.

There are different kinds of life-testing experiments that can be applied for different purposes. …


Optimizing Tumor Xenograft Experiments Using Bayesian Linear And Nonlinear Mixed Modelling And Reinforcement Learning, Mary Lena Bleile May 2023

Statistical Science Theses and Dissertations

Tumor xenograft experiments are a popular tool of cancer biology research. In a typical such experiment, one implants a set of animals with an aliquot of the human tumor of interest, applies various treatments of interest, and observes the subsequent response. Efficient analysis of the data from these experiments is therefore of utmost importance. This dissertation proposes three methods for optimizing cancer treatment and data analysis in the tumor xenograft context. The first of these is applicable to tumor xenograft experiments in general, and the second two seek to optimize the combination of radiotherapy with immunotherapy in the tumor xenograft …


Development Of Bayesian Hierarchical Methods Involving Meta-Analysis, Jackson Barth May 2023

Statistical Science Theses and Dissertations

When conducting statistical analysis in the Bayesian paradigm, the most critical decision made by the researcher is the identification of a prior distribution for a parameter. Despite the mathematical soundness of the Bayesian approach, a wrongly specified prior can lead to biased and incorrect results. To avoid this, prior distributions should be based on real data, which are easily accessible in the "big data" era. This dissertation explores two applications of Bayesian hierarchical modelling that incorporate information obtained from a meta-analysis.

The first of these applications is in the normalization of genomics data, specifically for NanoString nCounter datasets. A meta-analysis …


Empirical Likelihood Ratio Tests For Homogeneity Of Distributions Of Component Lifetimes From System Lifetime Data With Known System Structures, Jingjing Qu May 2023

Statistical Science Theses and Dissertations

In system reliability, practitioners may be interested in testing the homogeneity of the component lifetime distributions based on system lifetimes from multiple data sources for various reasons, such as identifying the component supplier that provides the most reliable components.

In the first part of the dissertation, we develop distribution-free hypothesis testing procedures for the homogeneity of the component lifetime distributions based on system lifetime data when the system structures are known. Several nonparametric testing statistics based on the empirical likelihood method are proposed for testing the homogeneity of two or more component lifetime distributions. The computational approaches to obtain the …


Gentrification And Crime In The Twin Cities: Insights And Challenges Through A Statistical Lens, Erin G. Franke May 2023

Mathematics, Statistics, and Computer Science Honors Projects

Gentrification is a complex process of urban redevelopment that typically involves an in-migration of educated people to neighborhoods experiencing a period of disinvestment. While gentrification is widely recognized for its potential to displace long-time businesses and residents of the neighborhood, its impact on crime is highly controversial. There is no consensus on the relationship between gentrification and crime across criminological theory, and past statistical studies have also shown contradictory results. Measuring gentrification on the tract level with census data, we seek to understand gentrification’s relationship with violent crime and theft in the Twin Cities. Using a Poisson model with …


Using A Distributive Approach To Model Insurance Loss, Kayla Kippes Apr 2023

Student Research Submissions

Insurance loss is an unpredicted event that stands at the forefront of the insurance industry. Loss in insurance represents the costs or expenses incurred due to a claim. An insurance claim is a request for the insurance company to pay for damage caused to an individual’s property. Loss can be measured by how much money (the dollar amount) has been paid out by the insurance company to repair the damage or it can be measured by the number of claims (claim count) made to the insurance company. Insured events include property damage due to fire, theft, flood, a car accident, …


Length Bias Estimation Of Small Businesses Lifetime, Simeng Li Apr 2023

Honors Theses

Small businesses, particularly restaurants, play a crucial role in the economy by generating employment opportunities, boosting tourism, and contributing to the local economy. However, accurately estimating their lifetimes can be challenging due to the presence of length bias, which occurs when the likelihood of sampling any particular restaurant's closure is influenced by its duration in operation. To address the issue, this study conducts goodness-of-fit tests on exponential/gamma family distributions and employs the Kaplan-Meier method to more accurately estimate the average lifetime of restaurants in Carytown. By providing insights into the challenges of estimating the lifetimes of small businesses, this study …


The Commutant Of The Fourier–Plancherel Transform, Brianna Cantrall Apr 2023

Honors Theses

One can see that this matrix is unitary and has eigenvalues {1, −i, −1, i}, each of infinite multiplicity. Throughout the remainder of this thesis, we will convince the reader that the above linear transformation is actually the Fourier transform. We will compute the commutant, as well as its invariant subspaces. The key to doing this is the Hermite polynomials. Why do we recast the Fourier transform from its well-known and well-studied integral form to the matrix form shown above? As we will see, the matrix form allows us to efficiently discover the operator theory of the Fourier transform obfuscated …
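
For orientation, the standard fact behind these eigenvalues can be written out as below. This is a sketch under an assumed convention (the unitary Fourier transform with kernel $e^{-ix\xi}/\sqrt{2\pi}$ and the normalized Hermite functions $h_n$); the thesis itself may use a different normalization or ordering of the basis.

```latex
\[
  (\mathcal{F}f)(\xi) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x)\, e^{-ix\xi}\, dx,
  \qquad
  h_n(x) = \frac{1}{\sqrt{2^n\, n!\, \sqrt{\pi}}}\, H_n(x)\, e^{-x^2/2},
\]
\[
  \mathcal{F} h_n = (-i)^n h_n, \qquad n = 0, 1, 2, \dots
\]
\[
  [\mathcal{F}]_{\{h_n\}} = \operatorname{diag}(1,\, -i,\, -1,\, i,\, 1,\, -i,\, -1,\, i,\, \dots)
\]
```

Because $(-i)^n$ cycles through four values as $n$ runs over the nonnegative integers, each of the eigenvalues $1, -i, -1, i$ occurs infinitely often, which matches the infinite multiplicity stated in the abstract.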


Influence Diagnostics For Generalized Estimating Equations Applied To Correlated Categorical Data, Louis Vazquez Apr 2023

Statistical Science Theses and Dissertations

Influence diagnostics in regression analysis allow analysts to identify observations that have a strong influence on model fitted probabilities and parameter estimates. The most common influence diagnostics, such as Cook’s Distance for linear regression, are based on a deletion approach where the results of a model with and without observations of interest are compared. Here, deletion-based influence diagnostics are proposed for generalized estimating equations (GEE) for correlated, or clustered, nominal multinomial responses. The proposed influence diagnostics focus on GEEs with the baseline-category logit link function and a local odds ratio parameterization of the association structure. Formulas for both observation- and …


K-8 Preservice Teachers’ Statistical Thinking When Determining Best Measure Of Center, Ha Nguyen, Eryn M. Stehr Maher, Gregory Chamblee, Sharon Taylor Apr 2023

Department of Mathematical Sciences Faculty Publications

The purpose of this study was to determine K-8 preservice teacher (PST) candidates’ statistical thinking when selecting the best center representation for the given data. Forty-four PSTs enrolled in a Statistics and Probability for K-8 Teachers course in a university located in the southeastern region of the United States were asked to complete a 2007 National Assessment of Educational Progress test item. All 44 PSTs’ data were qualitatively analyzed for correctness and statistical thinking strategies used. Findings were that most PSTs either incorrectly selected the mean, rather than median, as the best measure of center for the given data or …


High-Dimensional Variable Selection Via Knockoffs Using Gradient Boosting, Amr Essam Mohamed Apr 2023

Dissertations

As data continue to grow rapidly in size and complexity, efficient and effective statistical methods are needed to detect the important variables/features. Variable selection is one of the most crucial problems in statistical applications. This problem arises when one wants to model the relationship between the response and the predictors. The goal is to reduce the number of variables to a minimal set of explanatory variables that are truly associated with the response of interest to improve the model accuracy. Effectively choosing the true influential variables and controlling the False Discovery Rate (FDR) without sacrificing power has been a challenge …


Modeling The Probability Of A Successful Stolen Base Attempt In Major League Baseball, Cade Stanley Apr 2023

Senior Theses

In Major League Baseball (MLB), the outcome of a stolen base attempt has important implications. Success moves the runner closer to scoring, while failure records an out and removes the runner from the basepaths altogether. Therefore, it is important that the decision by a coach or player to steal a base is well-informed. In this thesis, I explore a statistical approach to making this decision. I train logistic regression and random forest models, using data about the game situation and about the runner, pitcher, and catcher involved in the stolen base attempt, to estimate the probability that a stolen base …


Wright State University Fact Sheet, 2022-2023, Office Of Institutional Research & Effectiveness, Wright State University Jan 2023

Wright State University Fact Sheets

The Wright State University Fact Sheet showcases numbers and statistics for Wright State University, including demographics, funding, programs, and employment, for the 2022-2023 academic year.


Regression Modeling Of Complex Survival Data Based On Pseudo-Observations, Rong Rong Dec 2022

Statistical Science Theses and Dissertations

The restricted mean survival time (RMST) is a clinically meaningful summary measure in studies with survival outcomes. Statistical methods have been developed for regression analysis of RMST to investigate impacts of covariates on RMST, which is a useful alternative to the Cox regression analysis. However, existing methods for regression modeling of RMST are not applicable to left-truncated right-censored data that arise frequently in prevalent cohort studies, for which the sampling bias due to left truncation and informative censoring induced by the prevalent sampling scheme must be properly addressed. Meanwhile, statistical methods have been developed for regression modeling of the cumulative …


AutoMS: Automatic Model Selection For Novelty Detection With Error Rate Control, Yifan Zhang, Haiyan Jiang, Haojie Ren, Changliang Zou, Dejing Dou Dec 2022

Machine Learning Faculty Publications

Given an unsupervised novelty detection task on a new dataset, how can we automatically select a “best” detection model while simultaneously controlling the error rate of the best model? For novelty detection analysis, numerous detectors have been proposed to detect outliers on a new unseen dataset based on a score function trained on available clean data. However, due to the absence of labeled anomalous data for model evaluation and comparison, there is a lack of systematic approaches that are able to select the “best” model/detector (i.e., the algorithm as well as its hyperparameters) and achieve certain error rate control simultaneously. …


Bayesian Estimation Of The Intensity Function Of A Non-Homogeneous Poisson Process, James Jensen Oct 2022

Theses

In this paper we explore Bayesian inference and its application to the problem of estimating the intensity function of a non-homogeneous Poisson process. These processes model the behavior of phenomena in which one or more events, known as arrivals, occur independently of one another over a certain period of time. We are concerned with the number of events occurring during particular time intervals across several realizations of the process. We show that given sufficient data, we are able to construct a piecewise-constant function which accurately estimates the mean rates on particular intervals. Further, we show that as we reduce these …
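
One simple way to picture a piecewise-constant estimate of this kind is a conjugate Gamma-Poisson update on each interval. The sketch below assumes equal-width intervals and an independent Gamma(shape, rate) prior on each interval's rate; both choices, along with the function name and the counts, are illustrative assumptions and not necessarily the approach taken in the thesis.

```python
def posterior_mean_rates(counts, width, prior_shape=1.0, prior_rate=1.0):
    """Posterior mean arrival rate on each interval.

    counts[r][k] is the number of arrivals in interval k of realization r.
    With a Gamma(prior_shape, prior_rate) prior and Poisson counts, the
    posterior for interval k is Gamma(prior_shape + total arrivals in k,
    prior_rate + number of realizations * width).
    """
    n_realizations = len(counts)
    n_intervals = len(counts[0])
    rates = []
    for k in range(n_intervals):
        total = sum(row[k] for row in counts)
        shape = prior_shape + total
        rate = prior_rate + n_realizations * width
        rates.append(shape / rate)
    return rates


# Hypothetical arrival counts: 3 realizations, 3 unit-width intervals.
counts = [[2, 5, 1], [3, 4, 0], [1, 6, 2]]
rates = posterior_mean_rates(counts, width=1.0)
```

The returned list defines a step function over the intervals; as more realizations are observed, the prior's contribution shrinks and each step approaches the empirical mean rate on its interval.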