Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Statistics and Probability

Articles 1 - 30 of 123

Full-Text Articles in Physical Sciences and Mathematics

"Who Wrote The Epistle, God Only Knows": A Statistical Authorial Analysis Of Hebrews In Comparison With Pauline And Lukan Literature, Benjamin J. Erickson Apr 2024

Senior Honors Theses

The authorship of Hebrews has been a point of contention for scholars for the past two millennia. While the epistle is traditionally attributed to Paul, many scholars assert that it carries thematic, structural, and stylistic differences from the remainder of his extant epistles; therefore, many other possible authors have been proposed. Of these, only Luke has other New Testament writings. Therefore, this project conducts a statistical comparison of Hebrews to the Pauline and Lukan corpora using stylometric authorial analysis methods. This analysis demonstrates that Hebrews is stylistically closer to Lukan literature than Pauline (but not to a significant degree), and …


Identifying Rural Health Clinics Within The Transformed Medicaid Statistical Information System (T-MSIS) Analytic Files, Katherine Ahrens MPH, PhD, Zachariah Croll, Yvonne Jonk PhD, John Gale MS, Heidi O'Connor MS Mar 2024

Rural Health Clinics

Researchers at the Maine Rural Health Research Center describe a methodology for identifying Rural Health Clinic encounters within the Medicaid claims data using Transformed Medicaid Statistical Information System (T-MSIS) Analytic Files.

Background: There is limited information on the extent to which Rural Health Clinics (RHC) provide pediatric and pregnancy-related services to individuals enrolled in state Medicaid/CHIP programs. In part this is because methods to identify RHC encounters within Medicaid claims data are outdated.

Methods: We used a 100% sample of the 2018 Medicaid Demographic and Eligibility and Other Services Transformed Medicaid Statistical Information System (T-MSIS) Analytic Files for 20 states …


Sentiment Analysis Before And During The Covid-19 Pandemic, Emily Musgrove Jul 2023

Mathematics Summer Fellows

This study examines the change in connotative language use before and during the Covid-19 pandemic. By analyzing news articles from several major US newspapers, we found a statistically significant correlation between the sentiment of the text and the publication period. Specifically, we document a large, systematic, and statistically significant decline in the overall sentiment of articles published in major news outlets. While our results do not directly gauge the sentiment of the population, our findings have important implications for the social responsibility of journalists and media outlets, especially in times of crisis.


Gentrification And Crime In The Twin Cities: Insights And Challenges Through A Statistical Lens, Erin G. Franke May 2023

Mathematics, Statistics, and Computer Science Honors Projects

Gentrification is a complex process of urban redevelopment that typically involves an in-migration of educated people to neighborhoods experiencing a period of disinvestment. While gentrification is widely recognized for its potential to displace long-time businesses and residents of the neighborhood, its impact on crime is highly controversial. Criminological theory offers no consensus on the relationship between gentrification and crime, and past statistical studies have shown contradictory results. Measuring gentrification at the tract level with census data, we seek to understand gentrification’s relationship with violent crime and theft in the Twin Cities. Using a Poisson model with …


Wright State University Fact Sheet, 2022-2023, Office Of Institutional Research & Effectiveness, Wright State University Jan 2023

Wright State University Fact Sheets

The Wright State University Fact Sheet presents numbers and statistics for Wright State University, including demographics, funding, programs, and employment, for the 2022-2023 academic year.


On Misuses Of The Kolmogorov–Smirnov Test For One-Sample Goodness-Of-Fit, Anthony Zeimbekakis Apr 2022

Honors Scholar Theses

The Kolmogorov–Smirnov (KS) test is one of the most popular goodness-of-fit tests for comparing a sample with a hypothesized parametric distribution. Nevertheless, it is often misused. The standard one-sample KS test applies to independent, continuous data with a completely specified hypothesized distribution. It is not uncommon, however, to find it applied in the literature to dependent, discrete, or rounded data, or to hypothesized distributions containing estimated parameters. For example, it has been "discovered" multiple times that the test is too conservative when the parameters are estimated. We demonstrate misuses of the one-sample KS test in three …
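To illustrate the estimated-parameter misuse, here is a minimal Python sketch (standard library only; not the thesis's code) that simulates normal samples and applies the one-sample KS test at roughly the 5% level, once with the null N(0, 1) fixed in advance and once with the mean and standard deviation estimated from each sample. The sample size, replication count, seed, and the asymptotic critical value 1.358/√n are illustrative assumptions. With estimated parameters the rejection rate should fall far below 5%, which is the "too conservative" behavior described above; in that setting the appropriate reference distribution is the Lilliefors one, not the standard KS table.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def ks_statistic(sample, cdf):
    """One-sample KS statistic D_n = sup_x |F_n(x) - F(x)| for a continuous cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical cdf jumps from (i-1)/n to i/n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def rejection_rate(n=50, reps=2000, estimate_params=False, seed=1):
    """Fraction of N(0, 1) samples rejected at roughly the 5% level."""
    rng = random.Random(seed)
    crit = 1.358 / math.sqrt(n)  # asymptotic 5% critical value of the standard KS test
    rejected = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if estimate_params:
            # Misuse: the hypothesized normal is fitted to the very sample being tested.
            dist = NormalDist(fmean(sample), stdev(sample))
        else:
            # Correct use: the null distribution is fully specified in advance.
            dist = NormalDist(0.0, 1.0)
        if ks_statistic(sample, dist.cdf) > crit:
            rejected += 1
    return rejected / reps


print("fully specified null:", rejection_rate())
print("estimated parameters:", rejection_rate(estimate_params=True))
```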


A Monte Carlo Analysis Of Seven Dichotomous Variable Confidence Interval Equations, Morgan Juanita Dubose Apr 2022

Masters Theses & Specialist Projects

Department of Psychological Sciences, Western Kentucky University. There are two options to estimate a range of likely values for the population mean of a continuous variable: one for when the population standard deviation is known and another for when it is unknown. There are seven proposed equations to calculate the confidence interval for the population mean of a dichotomous variable: normal approximation interval, Wilson interval, Jeffreys interval, Clopper-Pearson, Agresti-Coull, arcsine transformation, and logit transformation. In this study, I compared the percent effectiveness of each equation using a Monte Carlo analysis and the interval range over a range …
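A short Monte Carlo sketch in the spirit of this comparison follows. It covers only three of the seven intervals (normal approximation/Wald, Wilson, Agresti-Coull), which need nothing beyond the standard library; Jeffreys and Clopper-Pearson require beta-distribution quantiles and are omitted. The true proportion, sample size, replication count, and seed are hypothetical, and coverage (the share of intervals that contain the true proportion) is used here as one possible reading of "percent effectiveness".

```python
import math
import random
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # two-sided 95% normal quantile, about 1.96


def wald(x, n, z=Z):
    """Normal-approximation (Wald) interval."""
    p = x / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half


def wilson(x, n, z=Z):
    """Wilson score interval."""
    center = (x + z * z / 2) / (n + z * z)
    half = z / (n + z * z) * math.sqrt(x * (n - x) / n + z * z / 4)
    return center - half, center + half


def agresti_coull(x, n, z=Z):
    """Agresti-Coull interval: a Wald interval around an adjusted proportion."""
    n_adj = n + z * z
    p_adj = (x + z * z / 2) / n_adj
    half = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half), min(1.0, p_adj + half)


def coverage(p, n, reps=5000, seed=7):
    """Monte Carlo estimate of each interval's coverage of the true proportion p."""
    rng = random.Random(seed)
    methods = {"wald": wald, "wilson": wilson, "agresti_coull": agresti_coull}
    hits = dict.fromkeys(methods, 0)
    for _ in range(reps):
        x = sum(rng.random() < p for _ in range(n))  # one Binomial(n, p) draw
        for name, interval in methods.items():
            lo, hi = interval(x, n)
            hits[name] += lo <= p <= hi
    return {name: h / reps for name, h in hits.items()}


print(coverage(p=0.05, n=30))
```

All three methods are scored on the same simulated draws, so their coverage differences are less affected by simulation noise. With a small true proportion, the Wald interval collapses to [0, 0] whenever no successes are observed, which is one reason it tends to undercover.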


Split Classification Model For Complex Clustered Data, Katherine Gerot Mar 2022

Honors Theses

Classification in high-dimensional data has generated tremendous interest in a multitude of fields. Data in higher dimensions often tend to reside in non-Euclidean metric space. This prevents Euclidean-based classification methodologies, such as regression, from reliably modeling the data. Many proposed models rely on computationally complex embeddings to convert the data to a more usable format. Others, notably the Support Vector Machine, rely on kernel manipulation to implicitly describe the "feature space" and arrive at a non-linear decision boundary. The proposed methodology in this paper seeks to classify complex data in a relatively computationally simple and explainable manner.


Trade Bait: Season 3, Ben Bagley Oct 2021

WWU Honors College Senior Projects

A 5-episode podcast series dissecting the use of statistics in the NFL and NFL Media


Lab Exercises For Statistics Using Excel, Julia Nebia, Steven Cosares, Milena Cuellar Jul 2021

Open Educational Resources

This document contains the text associated with a series of computer-based lab exercises to help students apply the concepts usually included in a first course in Statistics. A compressed file is included that contains a separate folder for each lab. Each folder holds an Excel spreadsheet file and an editable Word document with the instructions for students to complete the exercise. The exercises are not numbered in the folders, so you can select any subset of them to assign to your students. You are free to modify the instructions in any way you see fit, e.g., to …


A Review Of Logistic Regression And Its Application, Sultana Mubarika Rahman Chowdhury Jun 2021

FIU Electronic Theses and Dissertations

The purpose of this thesis is to provide an in-depth review of logistic regression and its application. Additionally, four methods of coefficient standardization were compared using the Heart Disease dataset, based on testing accuracy, training accuracy, area under the curve, sensitivity, and specificity. Furthermore, logistic regression analysis was applied to the National Longitudinal Study of Adolescent Health (Add Health) dataset to examine the relationship between anxiety or panic disorder and a history of childhood maltreatment, medical conditions such as ADHD and PTSD, some socio-economic conditions, and addiction. Results indicated that a history of abuse has a significant effect …


Compare And Contrast Maximum Likelihood Method And Inverse Probability Weighting Method In Missing Data Analysis, Scott Sun May 2021

Mathematical Sciences Technical Reports (MSTR)

Data can be lost for different reasons, but sometimes missingness is part of the data collection process itself. Unbiased and efficient estimation of the parameters governing the response mean model requires the missing data to be addressed appropriately. This paper compares and contrasts the Maximum Likelihood and Inverse Probability Weighting estimators in an Outcome-Dependent Sampling design that deliberately generates incomplete observations. We demonstrate the comparison through numerical simulations under varied conditions: different coefficients of determination, and whether or not the mean model is misspecified.
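The inverse-probability-weighting idea can be sketched in a few lines of Python. This is a simplified, hypothetical setup, not the report's design: the response mean model is reduced to a single population mean, outcomes are drawn from N(0, 1), and units with positive outcomes are observed with known probability 0.9 versus 0.3 otherwise. The complete-case mean is pulled upward by the outcome-dependent selection, while weighting each observed unit by 1/π recovers a mean close to 0. The maximum-likelihood competitor is not shown.

```python
import random
from statistics import fmean


def simulate_ods(n=20000, seed=3):
    """Hypothetical outcome-dependent sampling: large outcomes are observed more often."""
    rng = random.Random(seed)
    observed = []  # (y, selection probability) pairs for units whose outcome is observed
    for _ in range(n):
        y = rng.gauss(0.0, 1.0)  # true population mean is 0
        pi = 0.9 if y > 0 else 0.3  # known, design-based selection probability
        if rng.random() < pi:
            observed.append((y, pi))
    return observed


def naive_mean(observed):
    """Complete-case mean, which ignores how the observations were selected."""
    return fmean([y for y, _ in observed])


def ipw_mean(observed):
    """Normalized (Hajek) inverse-probability-weighted mean: each unit gets weight 1/pi."""
    weights = [1.0 / pi for _, pi in observed]
    return sum(w * y for w, (y, _) in zip(weights, observed)) / sum(weights)


data = simulate_ods()
print("naive:", naive_mean(data), "IPW:", ipw_mean(data))
```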


We’re Here To Get You There: A Statistical Analysis Of Bridgewater State University’s Transit System, Abigail Adams May 2021

Honors Program Theses and Projects

Bridgewater State University first established its on-campus transportation service in January 1984. Although it began as an on-campus daytime service for students, the service expanded to offer an off-campus connection to the neighboring city of Brockton and absorbed the night service system from the campus safety team. As BSU Transit continues to grow, the organization is seeking ways to improve its overall service and better prepare its fleet and driver pool to accommodate this growth. The purpose of this research is to analyze trends among the data collected by BSU Transit and assist …


A Study On Differing Generational Values And Expectations In Corporate America, Abigail Grella May 2021

Honors Program Theses and Projects

This paper examines the most common factors that lead to voluntary employee turnover and the implications employee turnover has for an organization. Additionally, this paper considers the varying values and workplace expectations of different demographic groups such as Millennials, Generation X, Generation Y, and Baby Boomers, and how such factors could influence voluntary turnover. The study uses survey results gathered from currently employed members of a wide span of generations. Using statistical analysis employing t-tests and Mood’s median test, the results show that different generations weigh specific organizational offerings differently. The results show …


Guidelines For Regression Analysis In SAS And R: A Case Study, Sarah Milligan May 2021

Honors Program Theses and Projects

When a player is a free agent (an individual who is able to sign with any team), one wonders what their best option is. Will signing with Team A or Team B provide them with the largest salary? Which factors affect their salary the most? Do last year’s statistics have a strong impact on next year’s salary? These questions can be answered by performing a regression analysis on previous years’ data. The primary focus of this project is to determine the most important variables related to an NBA salary. Likewise, the statistical programs SAS and R will be compared …


Power And Statistical Significance In Securities Fraud Litigation, Jill E. Fisch, Jonah B. Gelbach Jan 2021

All Faculty Scholarship

Event studies, a half-century-old approach to measuring the effect of events on stock prices, are now ubiquitous in securities fraud litigation. In determining whether the event study demonstrates a price effect, expert witnesses typically base their conclusion on whether the results are statistically significant at the 95% confidence level, a threshold that is drawn from the academic literature. As a positive matter, this represents a disconnect with legal standards of proof. As a normative matter, it may reduce enforcement of fraud claims because litigation event studies typically involve quite low statistical power even for large-scale frauds.

This paper, written for …
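The low-power point can be made concrete with a textbook calculation. The sketch below assumes (hypothetically, not following the paper) a single-day event window, normally distributed abnormal returns with a known daily standard deviation σ, and a two-sided test at the 5% level. Under those assumptions the power is Φ(-z - δ/σ) + 1 - Φ(z - δ/σ), where δ is the true price effect and z ≈ 1.96.

```python
from statistics import NormalDist


def event_study_power(effect, sigma, alpha=0.05):
    """Power of a two-sided z-test that an event-day abnormal return differs from zero.

    effect: true abnormal return caused by the event (e.g. 0.02 for a 2% move).
    sigma:  standard deviation of the stock's daily abnormal returns.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    shift = effect / sigma
    # Reject when |Z| > z_crit, where Z ~ N(shift, 1) under the alternative.
    return std_normal.cdf(-z_crit - shift) + (1 - std_normal.cdf(z_crit - shift))


for effect in (0.01, 0.02, 0.04):
    print(f"abnormal return {effect:.0%}: power {event_study_power(effect, sigma=0.02):.2f}")
```

With a 2% daily volatility, for instance, a true 2% price move gives δ/σ = 1, and the test detects it only about 17% of the time.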


Can Statcast Variables Explain The Variation In Weighted Runs Created Plus?, Ryan Kupiec Dec 2020

Student Research

The release of Statcast data in 2015 was revolutionary for data analysis in baseball. Many analysts now use this data regularly, but none have used it exclusively. Older, less reliable statistics (such as on-base percentage) are often still used instead of newer statistics (such as weighted runs created plus). In this paper, we attempt to explain the variation in weighted runs created plus (wRC+) using Statcast variables such as exit velocity and launch angle. We find that exit velocity, along with other Statcast variables, can explain as much as 70% of the variation in wRC+. Launch angle can …


“Playing The Whole Game”: A Data Collection And Analysis Exercise With Google Calendar, Albert Y. Kim, Johanna Hardin Aug 2020

Statistical and Data Sciences: Faculty Publications

We provide a computational exercise suitable for early introduction in an undergraduate statistics or data science course that allows students to “play the whole game” of data science: performing both data collection and data analysis. While many teaching resources exist for data analysis, such resources are not as abundant for data collection given the inherent difficulty of the task. Our proposed exercise centers around student use of Google Calendar to collect data with the goal of answering the question “How do I spend my time?” On the one hand, the exercise involves answering a question with near universal appeal, but …


Analyzing Competitive Balance In Professional Sport, Kevin Alwell May 2020

Honors Scholar Theses

In this paper we review several measures for statistically analyzing competitive balance and report which leagues have a wider variance of performance among their competitors. Each league seeks to maintain high levels of parity, making individual matches and the overall season more unpredictable and appealing to a general audience. Here we quantify competitive balance across major sports leagues using several statistical methods, to help leagues optimize their revenue.
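One widely used measure of competitive balance (an assumption here; the thesis's own set of measures is not listed in this abstract) is the Noll-Scully ratio: the standard deviation of teams' winning percentages divided by the standard deviation an idealized league of evenly matched teams would produce, 0.5/√G for G games per team. A minimal sketch with hypothetical standings:

```python
import math
from statistics import pstdev


def noll_scully(win_pcts, games_per_team):
    """Noll-Scully ratio: observed SD of win percentages over the SD expected
    if every game were a coin flip, 0.5 / sqrt(games). A value of 1.0 means
    coin-flip parity; larger values mean less competitive balance."""
    actual = pstdev(win_pcts)
    ideal = 0.5 / math.sqrt(games_per_team)
    return actual / ideal


# Hypothetical four-team league in which every team plays 10 games.
print(noll_scully([0.7, 0.6, 0.4, 0.3], games_per_team=10))
```

Dividing by the idealized standard deviation lets leagues with very different schedules (for example, 17 NFL games versus 162 MLB games) be compared on one scale.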


Using Stability To Select A Shrinkage Method, Dean Dustin May 2020

Department of Statistics: Dissertations, Theses, and Student Work

Shrinkage methods are estimation techniques based on optimizing expressions to find which variables to include in an analysis, typically a linear regression. The general form of these expressions is the sum of an empirical risk plus a complexity penalty based on the number of parameters. Many shrinkage methods are known to satisfy an ‘oracle’ property meaning that asymptotically they select the correct variables and estimate their coefficients efficiently. In Section 1.2, we show oracle properties in two general settings. The first uses a log likelihood in place of the empirical risk and allows a general class of penalties. The second …
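As one concrete instance of the "empirical risk plus complexity penalty" form, here is a sketch of the lasso fitted by cyclic coordinate descent. Note the swap: the lasso penalizes the sum of absolute coefficients (an L1 penalty) rather than counting parameters directly, and it is not claimed to be one of the methods studied in the thesis. Columns of X and y are assumed centered, and the data are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam, setting it exactly to zero if |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, sweeps=100):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of rows. Columns of X and y are assumed centered, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    residual = list(y)  # y - X b, with b = 0 initially
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of column j with the partial residual that excludes b[j].
            rho = sum(X[i][j] * (residual[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            # Keep the residuals in sync with the change in b[j].
            delta = new_bj - b[j]
            if delta:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
            b[j] = new_bj
    return b


# Hypothetical data: y follows the first predictor; the second contributes only slightly.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2.1, -1.9, 1.9, -2.1]
print(lasso_coordinate_descent(X, y, lam=0.5))
```

Each coordinate update is a soft-thresholding step, which is what sets weak coefficients exactly to zero and so performs variable selection.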


The Effects Of Zoledronate And Sleep Deprivation On The Distal Femur Trabecular Thickness Of Ovariectomized Rats: Application Of Different Statistical Methods, Erin Nolte May 2020

Student Scholar Symposium Abstracts and Posters

Osteoporosis is a disease that causes the degradation of bone, leading to an increased risk of fracture. One in three women over the age of 50 will be affected by osteoporosis. This study aims to understand how bone is affected by sleep deprivation in estrogen-deficient rats, and how zoledronate might counteract the harmful effects of sleep deprivation on bone. Because bone mineral density (BMD) is a crude evaluation of the architectural changes seen in osteoporosis, trabecular thickness may serve as a better single measure of bone health. Thirty-one female Wistar rats were ovariectomized and randomly assigned to four groups. The …


Dice Questions Answered, Warren Campbell, William P. Dolan Apr 2020

SEAS Faculty Publications

Superstitious discussion of fair and unfair dice has pervaded the tabletop gaming industry since its inception. Much of this discussion is not based on any quantitative data or studies, and misconceptions have consequently spread widely. One dice float test video on YouTube currently has 925,000 views (Fisher, 2015a). To combat these misconceptions, we investigated the following questions: 1) Are dice cursed? 2) Are D20s (20-sided dice) less fair than D6s (6-sided dice)? 3) Do float tests tell anything about the fairness of dice? 4) Are some dice systems inherently fairer than others? 5) Are density differences or dimensions more …
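One standard way to test a die for fairness (assumed here; the abstract does not name the authors' test) is Pearson's chi-square goodness-of-fit test against equal face probabilities. The sketch below computes the statistic and a Monte Carlo p-value by re-rolling a simulated fair die, which avoids needing chi-square tables. The tallies are hypothetical.

```python
import random


def chi_square_stat(counts):
    """Pearson chi-square statistic against equal expected counts on every face."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def monte_carlo_p_value(counts, sims=2000, seed=11):
    """Share of simulated fair-die experiments at least as extreme as the observed counts."""
    rng = random.Random(seed)
    faces, rolls = len(counts), sum(counts)
    observed = chi_square_stat(counts)
    extreme = 0
    for _ in range(sims):
        sim = [0] * faces
        for _ in range(rolls):
            sim[rng.randrange(faces)] += 1
        if chi_square_stat(sim) >= observed:
            extreme += 1
    return (extreme + 1) / (sims + 1)  # add-one correction keeps the p-value above zero


# Hypothetical tallies from 60 rolls of a six-sided die.
counts = [20, 0, 10, 10, 10, 10]
print(chi_square_stat(counts), monte_carlo_p_value(counts))
```

The same functions work for a D20 by passing 20 counts; detecting a small bias, however, requires many more rolls than the 60 shown here.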


Investigating Major League Baseball Pitchers And Quality Of Contact Through Cluster Analysis, Charlie Marcou Apr 2020

Honors Projects

This paper investigates the quality of contact that a pitcher allows. Little is currently known about quality of contact, but if the factors that determine it could be identified, teams could use them to identify and develop pitching talent. Investigating how much control pitchers have over the contact they allow raises many problems, but one question worth examining is whether quality of contact is a repeatable skill. If it is, then it is important to investigate what benefit controlling contact brings a pitcher. Along with this, groundball and flyball tendencies, and …


Playfair's Introduction Of Bar And Pie Charts To Represent Data, Diana White, River Bond, Joshua Eastes, Negar Janani Jan 2020

Statistics and Probability

No abstract provided.


Representing And Interpreting Data From Playfair, Diana White, River Bond, Joshua Eastes, Negar Janani Jan 2020

Statistics and Probability

No abstract provided.


The Role Of Topography, Soil, And Remotely Sensed Vegetation Condition Towards Predicting Crop Yield, Trenton E. Franz, Sayli Pokal, Justin P. Gibson, Yuzhen Zhou, Hamed Gholizadeh, Fatima Amor Tenorio, Daran Rudnick, Derek M. Heeren, Matthew F. McCabe, Matteo Ziliani, Zhenong Jin, Kaiyu Guan, Ming Pan, John Gates, Brian Wardlow Jan 2020

School of Natural Resources: Faculty Publications

Foreknowledge of the spatiotemporal drivers of crop yield would provide a valuable source of information to optimize on-farm inputs and maximize profitability. In recent years, an abundance of spatial data on soils, topography, and vegetation condition has become available from both proximal and remote sensing platforms. Given the wide range of data costs (USD 0–50/ha), it is important to understand where often-limited financial resources should be directed to optimize field production. Two key questions arise. First, will these data actually aid in better fine-resolution yield prediction to help optimize crop management and farm economics? Second, what …


9th Annual Postdoctoral Science Symposium, University Of Texas MD Anderson Cancer Center Postdoctoral Association Sep 2019

Annual Postdoctoral Science Symposium Abstracts

The mission of the Annual Postdoctoral Science Symposium (APSS) is to provide a platform for talented postdoctoral fellows throughout the Texas Medical Center to present their work to a wider audience. The MD Anderson Postdoctoral Association convened its inaugural Annual Postdoctoral Science Symposium (APSS) on August 4, 2011.

The APSS provides a professional venue for postdoctoral scientists to develop, clarify, and refine their research through formal reviews and critiques by faculty and other postdoctoral scientists. Additionally, attendees discuss current research on a broad range of subjects while promoting academic interactions and enrichment and developing new collaborations.


Who Can Act? Critical Assumptions At The Foundations Of Statistical Analysis, Peter J. Taylor Aug 2019

Working Papers on Science in a Changing World

Thinking about a simple teaching example on the t-test for comparing the average (mean) for some measurement in a group versus the average in another led me to articulate a sequence of thoughts and questions about the foundations of statistical analysis. In particular, my inquiry explores contrasts between: the statistical emphasis on averages or types around which there is variation or noise; variation as a mixture of types; the dynamics (or heterogeneous mix of dynamics) that generated the data analyzed; and participatory restructuring of these dynamics in the future. Two key issues are: Who is assumed to be able to …


The Evolution Of Data Science: A New Mode Of Knowledge Production, Jennifer Lewis Priestley, Robert J. McGrath Apr 2019

Faculty and Research Publications

Is data science a new field of study or simply an extension or specialization of a discipline that already exists, such as statistics, computer science, or mathematics? This article explores the evolution of data science as a potentially new academic discipline, which has evolved in response to new problem sets that established disciplines have been ill-prepared to address. The authors find that this newly evolved discipline can be viewed through the lens of a new mode of knowledge production, characterized by transdisciplinarity, collaboration with the private sector, and increased accountability. Lessons from this evolution can inform knowledge production …


Sensitivity Analyses For Tumor Growth Models, Ruchini Dilinika Mendis Apr 2019

Masters Theses & Specialist Projects

This study presents sensitivity analyses for two previously developed tumor growth models: the Gompertz model and the quotient model. Both models are considered in continuous and discrete time. In continuous time, model parameters are estimated using the least-squares method, while in discrete time, the partial-sum method is used. Moreover, frequentist and Bayesian methods are used to construct confidence intervals and credible intervals for the model parameters. We apply Markov Chain Monte Carlo (MCMC) techniques, namely the Random Walk Metropolis algorithm with a non-informative prior and the Delayed Rejection Adaptive Metropolis (DRAM) algorithm, to construct the parameters' posterior distributions and then …
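To show what a Random Walk Metropolis sampler looks like in this setting, here is a minimal sketch that samples the posterior of a Gompertz growth rate r from synthetic data. Everything here is an assumption for illustration: the parameterization N(t) = K·exp(ln(N0/K)·e^(-rt)) with K, N0, and the noise standard deviation treated as known, the flat prior on r > 0, the proposal step, and the burn-in length. DRAM and the quotient model are not shown.

```python
import math
import random
from statistics import fmean

K, N0 = 100.0, 5.0  # hypothetical carrying capacity and initial size, treated as known
SIGMA = 2.0  # hypothetical measurement-noise SD, treated as known


def gompertz(t, r):
    """Gompertz curve N(t) = K * exp(ln(N0 / K) * exp(-r t))."""
    return K * math.exp(math.log(N0 / K) * math.exp(-r * t))


def log_posterior(r, times, sizes):
    """Gaussian log-likelihood plus a flat (non-informative) prior on r > 0."""
    if r <= 0:
        return -math.inf
    sse = sum((n - gompertz(t, r)) ** 2 for t, n in zip(times, sizes))
    return -sse / (2 * SIGMA ** 2)


def random_walk_metropolis(logp, start, step, iters, seed=5):
    """Propose r' = r + N(0, step^2); accept with probability min(1, p(r') / p(r))."""
    rng = random.Random(seed)
    current, current_lp = start, logp(start)
    chain, accepted = [], 0
    for _ in range(iters):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = logp(proposal)
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        chain.append(current)
    return chain, accepted / iters


# Synthetic data from a hypothetical true growth rate r = 0.3.
rng = random.Random(0)
times = list(range(20))
sizes = [gompertz(t, 0.3) + rng.gauss(0.0, SIGMA) for t in times]

chain, acceptance = random_walk_metropolis(
    lambda r: log_posterior(r, times, sizes), start=0.5, step=0.005, iters=6000
)
posterior = chain[1000:]  # discard burn-in
print(f"posterior mean r = {fmean(posterior):.4f}, acceptance = {acceptance:.2f}")
```

The proposal step was chosen to be on the order of the posterior standard deviation so that a reasonable fraction of proposals is accepted; DRAM automates this tuning by adapting the proposal covariance from the chain's history and retrying rejected moves with a second, smaller proposal.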