Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Statistics

Articles 1 - 30 of 202

Full-Text Articles in Physical Sciences and Mathematics

"Who Wrote The Epistle, God Only Knows": A Statistical Authorial Analysis Of Hebrews In Comparison With Pauline And Lukan Literature, Benjamin J. Erickson Apr 2024

Senior Honors Theses

The authorship of Hebrews has been a point of contention for scholars for the past two millennia. While the epistle is traditionally attributed to Paul, many scholars assert that it carries thematic, structural, and stylistic differences from the remainder of his extant epistles; therefore, many other possible authors have been proposed. Of these, only Luke has other New Testament writings. Therefore, this project conducts a statistical comparison of Hebrews to the Pauline and Lukan corpora using stylometric authorial analysis methods. This analysis demonstrates that Hebrews is stylistically closer to Lukan literature than Pauline (but not to a significant degree), and …


Identifying Rural Health Clinics Within The Transformed Medicaid Statistical Information System (T-MSIS) Analytic Files, Katherine Ahrens MPH, PhD, Zachariah Croll, Yvonne Jonk PhD, John Gale MS, Heidi O'Connor MS Mar 2024

Rural Health Clinics

Researchers at the Maine Rural Health Research Center describe a methodology for identifying Rural Health Clinic encounters within the Medicaid claims data using Transformed Medicaid Statistical Information System (T-MSIS) Analytic Files.

Background: There is limited information on the extent to which Rural Health Clinics (RHC) provide pediatric and pregnancy-related services to individuals enrolled in state Medicaid/CHIP programs. In part this is because methods to identify RHC encounters within Medicaid claims data are outdated.

Methods: We used a 100% sample of the 2018 Medicaid Demographic and Eligibility and Other Services Transformed Medicaid Statistical Information System (T-MSIS) Analytic Files for 20 states …


Gentrification And Crime In The Twin Cities: Insights And Challenges Through A Statistical Lens, Erin G. Franke May 2023

Mathematics, Statistics, and Computer Science Honors Projects

Gentrification is a complex process of urban redevelopment that typically involves an in-migration of educated people to neighborhoods experiencing a period of disinvestment. While gentrification is widely recognized for its potential to displace long-time businesses and residents of the neighborhood, its impact on crime is highly controversial. Criminological theory offers no consensus on the relationship between gentrification and crime, and past statistical studies have shown contradictory results. Measuring gentrification on the tract level with census data, we seek to understand gentrification’s relationship with violent crime and theft in the Twin Cities. Using a Poisson model with …


K-8 Preservice Teachers’ Statistical Thinking When Determining Best Measure Of Center, Ha Nguyen, Eryn M. Stehr Maher, Gregory Chamblee, Sharon Taylor Apr 2023

Department of Mathematical Sciences Faculty Publications

The purpose of this study was to determine K-8 preservice teacher (PST) candidates’ statistical thinking when selecting the best center representation for the given data. Forty-four PSTs enrolled in a Statistics and Probability for K-8 Teachers course in a university located in the southeastern region of the United States were asked to complete a 2007 National Assessment of Educational Progress test item. All 44 PSTs’ data were qualitatively analyzed for correctness and statistical thinking strategies used. Findings were that most PSTs either incorrectly selected the mean, rather than median, as the best measure of center for the given data or …


Wright State University Fact Sheet, 2022-2023, Office Of Institutional Research & Effectiveness, Wright State University Jan 2023

Wright State University Fact Sheets

The Wright State University Fact Sheet showcases numbers and statistics for Wright State University, including demographics, funding, programs, and employment, for the 2022-2023 academic year.


AutoMS: Automatic Model Selection For Novelty Detection With Error Rate Control, Yifan Zhang, Haiyan Jiang, Haojie Ren, Changliang Zou, Dejing Dou Dec 2022

Machine Learning Faculty Publications

Given an unsupervised novelty detection task on a new dataset, how can we automatically select a “best” detection model while simultaneously controlling the error rate of the best model? For novelty detection analysis, numerous detectors have been proposed to detect outliers on a new unseen dataset based on a score function trained on available clean data. However, due to the absence of labeled anomalous data for model evaluation and comparison, there is a lack of systematic approaches that are able to select the “best” model/detector (i.e., the algorithm as well as its hyperparameters) and achieve certain error rate control simultaneously. …


Distance Based Image Classification: A Solution To Generative Classification’s Conundrum?, Wen-Yan Lin, Siying Liu, Bing Tian Dai, Hongdong Li Sep 2022

Research Collection School Of Computing and Information Systems

Most classifiers rely on discriminative boundaries that separate instances of each class from everything else. We argue that discriminative boundaries are counter-intuitive as they define semantics by what-they-are-not; and should be replaced by generative classifiers which define semantics by what-they-are. Unfortunately, generative classifiers are significantly less accurate. This may be caused by the tendency of generative models to focus on easy to model semantic generative factors and ignore non-semantic factors that are important but difficult to model. We propose a new generative model in which semantic factors are accommodated by shell theory’s [25] hierarchical generative process and non-semantic factors by …


On Misuses Of The Kolmogorov–Smirnov Test For One-Sample Goodness-Of-Fit, Anthony Zeimbekakis Apr 2022

Honors Scholar Theses

The Kolmogorov–Smirnov (KS) test is one of the most popular goodness-of-fit tests for comparing a sample with a hypothesized parametric distribution. Nevertheless, it has often been misused. The standard one-sample KS test applies to independent, continuous data with a hypothesized distribution that is completely specified. It is not uncommon, however, to see in the literature that it was applied to dependent, discrete, or rounded data, with hypothesized distributions containing estimated parameters. For example, it has been "discovered" multiple times that the test is too conservative when the parameters are estimated. We demonstrate misuses of the one-sample KS test in three …


Einstein-Roscoe Regression For The Slag Viscosity Prediction Problem In Steelmaking, Hiroto Saigo, Dukka KC, Noritaka Saito Apr 2022

Michigan Tech Publications

In classical machine learning, regressors are trained without attempting to gain insight into the mechanism connecting inputs and outputs. Natural sciences, however, are interested in finding a robust interpretable function for the target phenomenon that can return predictions even outside of the training domains. This paper focuses on the viscosity prediction problem in steelmaking, and proposes Einstein-Roscoe regression (ERR), which learns the coefficients of the Einstein-Roscoe equation, and is able to extrapolate to unseen domains. Besides, it is often the case in the natural sciences that some measurements are unavailable or more expensive than others due to physical constraints. To this …


A Monte Carlo Analysis Of Seven Dichotomous Variable Confidence Interval Equations, Morgan Juanita Dubose Apr 2022

Masters Theses & Specialist Projects

Department of Psychological Sciences, Western Kentucky University. There are two options to estimate a range of likely values for the population mean of a continuous variable: one for when the population standard deviation is known and another for when the population standard deviation is unknown. There are seven proposed equations to calculate the confidence interval for the population mean of a dichotomous variable: normal approximation interval, Wilson interval, Jeffreys interval, Clopper-Pearson, Agresti-Coull, arcsine transformation, and logit transformation. In this study, I compared the percent effectiveness of each equation using a Monte Carlo analysis and the interval range over a range …
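
As a rough illustration of how two of the intervals named in this abstract differ, the following sketch computes the normal approximation interval and the Wilson interval from their standard textbook formulas. It is not taken from the thesis; the function names and the default z = 1.96 (a nominal 95% level) are assumptions made for this example.

```python
import math


def normal_approx_interval(successes, n, z=1.96):
    """Normal approximation (Wald) interval for a proportion.

    Assumed helper name; z=1.96 corresponds to a nominal 95% level.
    """
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (standard formula)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # With 0 successes in 10 trials the normal approximation collapses to a
    # zero-width interval, while the Wilson interval keeps a positive width.
    print(normal_approx_interval(0, 10))
    print(wilson_interval(0, 10))
```

The zero-success case is one reason simulation studies of interval coverage tend to look at small samples and extreme proportions.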


Split Classification Model For Complex Clustered Data, Katherine Gerot Mar 2022

Honors Theses

Classification in high-dimensional data has generated tremendous interest in a multitude of fields. Data in higher dimensions often tend to reside in non-Euclidean metric space. This prevents Euclidean-based classification methodologies, such as regression, from reliably modeling the data. Many proposed models rely on computationally-complex embedding to convert the data to a more usable format. Others, namely the Support Vector Machine, rely on kernel manipulation to implicitly describe the "feature space" to arrive at a non-linear decision boundary. The proposed methodology in this paper seeks to classify complex data in a relatively computationally-simple and explainable manner.


Trade Bait: Season 3, Ben Bagley Oct 2021

WWU Honors College Senior Projects

A 5-episode podcast series dissecting the use of statistics in the NFL and NFL Media.


Lab Exercises For Statistics Using Excel, Julia Nebia, Steven Cosares, Milena Cuellar Jul 2021

Open Educational Resources

This document contains the text associated with a series of computer-based lab exercises to help students apply the concepts usually included in a first course in Statistics. A compressed file has been included that contains a separate folder for each lab. In each folder is an Excel spreadsheet file and an editable Word document providing the instructions for students to complete the exercise. The exercises are not numbered in the folders, so you can select any subset of these exercises to assign to your students. You are free to modify the instructions in any way you see fit, e.g., to …


A Review Of Logistic Regression And Its Application, Sultana Mubarika Rahman Chowdhury Jun 2021

FIU Electronic Theses and Dissertations

The purpose of this thesis is to do an in-depth review of logistic regression and its application. Additionally, a comparison of four different methods of coefficient standardization was done using the Heart Disease Dataset. These methods were compared based on testing accuracy, training accuracy, area under the curve, sensitivity, and specificity. Furthermore, logistic regression analysis was applied to the National Longitudinal Study of Adolescence Health Survey (Add Health) dataset to examine the relationship between anxiety or panic disorder and history of childhood maltreatment, medical conditions such as ADHD and PTSD, some socio-economic conditions, and addiction. Results indicated that history of abuse has a significant effect …


Compare And Contrast Maximum Likelihood Method And Inverse Probability Weighting Method In Missing Data Analysis, Scott Sun May 2021

Mathematical Sciences Technical Reports (MSTR)

Data can be lost for different reasons, but sometimes the missingness is a part of the data collection process. Unbiased and efficient estimation of the parameters governing the response mean model requires the missing data to be appropriately addressed. This paper compares and contrasts the Maximum Likelihood and Inverse Probability Weighting estimators in an Outcome-Dependent Sampling design that deliberately generates incomplete observations. We demonstrate the comparison through numerical simulations under varied conditions: different coefficients of determination, and whether or not the mean model is misspecified.
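
To make the idea of inverse probability weighting concrete, here is a minimal sketch that estimates a population mean when each observation is retained with a known, outcome-dependent probability. It is an illustration only, not the estimator studied in the report: the function name, the toy data, and the assumption that selection probabilities are known exactly are all hypothetical.

```python
def ipw_mean(values, observed, probs):
    """Weighted mean of the observed values, each weighted by 1 / P(observed).

    values   -- outcome for every unit (only used where observed is True)
    observed -- True if the unit's outcome was actually recorded
    probs    -- assumed-known probability that each unit is observed
    """
    numerator = sum(y / p for y, r, p in zip(values, observed, probs) if r)
    denominator = sum(1 / p for r, p in zip(observed, probs) if r)
    return numerator / denominator


if __name__ == "__main__":
    # Toy design: low outcomes are kept with probability 0.5, high outcomes
    # always. One low outcome happens to be missing.
    values = [10, 10, 20, 20]
    observed = [True, False, True, True]
    probs = [0.5, 0.5, 1.0, 1.0]

    complete_case = sum(y for y, r in zip(values, observed) if r) / sum(observed)
    print(complete_case)                      # biased toward high outcomes
    print(ipw_mean(values, observed, probs))  # reweighted estimate
```

In this toy case the reweighted estimate matches the full-data mean of 15, whereas the complete-case average overstates it because low outcomes are under-represented among the observed units.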


We’re Here To Get You There: A Statistical Analysis Of Bridgewater State University’s Transit System, Abigail Adams May 2021

Honors Program Theses and Projects

Bridgewater State University first established its on-campus transportation service in January of 1984. While it began only running as an on-campus service for students throughout the day, the service grew to expand by offering an off-campus connection to the neighboring city of Brockton and absorbed the night service system from the campus safety team. As BSU Transit continues to grow, the organization is seeking ways to improve their overall service and better prepare their fleet and driver pool to accommodate this growth. The purpose of this research is to analyze trends among the data collected by BSU Transit and assist …


Guidelines For Regression Analysis In SAS And R: A Case Study, Sarah Milligan May 2021

Honors Program Theses and Projects

When a player is a free agent, an individual who is able to sign to any team, one wonders what their best option is. Will signing with Team A or Team B provide them with the largest salary? What factors will affect their salary the most? Do last year’s statistics have a strong impact on next year’s salary? These questions can be answered by performing a regression analysis on previous years’ data. The primary focus of this project is to determine the most important variables related to an NBA salary. Likewise, the statistical programs SAS and R will be compared …


A Study On Differing Generational Values And Expectations In Corporate America, Abigail Grella May 2021

Honors Program Theses and Projects

This paper examines the most common factors that lead to voluntary employee turnover, and the implications employee turnover has on an organization. Additionally, this paper will consider the varying values and workplace expectations of different demographic groups such as Millennials, Generation X, Generation Y, and Baby Boomers and how such factors could influence voluntary turnover. A study is conducted from survey results gathered across a large span of generations that are currently employed. Using statistical analysis employing t-tests and a Mood’s Median test, the results show that different generations weight specific organizational offerings differently. The results show …


Adventures In The "Islands" - Enhancing Student Engagement In Teaching Statistics, Leszek Gawarecki Feb 2021

Mathematics Presentations And Conference Materials

Frequently identified factors for enhancing student engagement are active and problem-based learning as well as real-life experience relevant to students' interests. The importance of using real data in teaching statistics has been repeatedly emphasized and is growing. However, data collection, as part of a student project, faces serious practical problems. It is time-consuming, may require access to equipment, or may raise ethical issues.


Power And Statistical Significance In Securities Fraud Litigation, Jill E. Fisch, Jonah B. Gelbach Jan 2021

All Faculty Scholarship

Event studies, a half-century-old approach to measuring the effect of events on stock prices, are now ubiquitous in securities fraud litigation. In determining whether the event study demonstrates a price effect, expert witnesses typically base their conclusion on whether the results are statistically significant at the 95% confidence level, a threshold that is drawn from the academic literature. As a positive matter, this represents a disconnect with legal standards of proof. As a normative matter, it may reduce enforcement of fraud claims because litigation event studies typically involve quite low statistical power even for large-scale frauds.

This paper, written for …


Can Statcast Variables Explain The Variation In Weighted Runs Created Plus?, Ryan Kupiec Dec 2020

Student Research

The release of Statcast data in 2015 was revolutionary for data analysis in the game of baseball. Many analysts have begun using this data regularly, but none have used it exclusively. Often older, less reliable statistics (on-base percentage) are still used in favor of the newer statistics (weighted runs created plus). In this paper, we attempt to explain the variation in weighted runs created plus (wRC+) using Statcast variables such as exit velocity and launch angle. We find that exit velocity, along with other Statcast variables, can explain as much as 70% of the variation in wRC+. Launch angle can …


“Playing The Whole Game”: A Data Collection And Analysis Exercise With Google Calendar, Albert Y. Kim, Johanna Hardin Aug 2020

Statistical and Data Sciences: Faculty Publications

We provide a computational exercise suitable for early introduction in an undergraduate statistics or data science course that allows students to “play the whole game” of data science: performing both data collection and data analysis. While many teaching resources exist for data analysis, such resources are not as abundant for data collection given the inherent difficulty of the task. Our proposed exercise centers around student use of Google Calendar to collect data with the goal of answering the question “How do I spend my time?” On the one hand, the exercise involves answering a question with near universal appeal, but …


Data, Stats, Go: Navigating The Intersections Of Cataloging, E-Resource, And Web Analytics Reporting, Rachel S. Evans, Wendy Moore, Jessica Pasquale, Andre Davison Jul 2020

Presentations

Do you trudge through gathering statistics at fiscal or calendar year-end? Do you wonder why you track certain things, thinking many seem outdated or irrelevant? Many places seem to keep counting certain statistics because "that's what they've always done." For e-resources, how do you integrate those with physical counts and reconcile the variations (updated e-resources versus re-cataloged physical items)? What about repository downloads and other web traffic? The quantity of stats that libraries track is staggering and keeps growing. This program will encourage attendees to stop and evaluate what and why they're gathering data and help identify possible alternatives to …


The Effects Of Zoledronate And Sleep Deprivation On The Distal Femur Trabecular Thickness Of Ovariectomized Rats: Application Of Different Statistical Methods, Erin Nolte May 2020

Student Scholar Symposium Abstracts and Posters

Osteoporosis is a disease that causes the degradation of bone, leading to an increased risk of fracture. One in three women over the age of 50 will be affected by osteoporosis. This study aims to understand how bone is affected by sleep deprivation in estrogen-deficient rats, and how Zoledronate might negate the inimical effects of sleep deprivation on bone. As bone mineral density (BMD) is a crude evaluation of the architectural changes seen in osteoporosis, trabecular thickness may serve as a better single evaluation of bone health. Thirty-one female Wistar rats were ovariectomized and separated into 4 random groups. The …


Analyzing Competitive Balance In Professional Sport, Kevin Alwell May 2020

Honors Scholar Theses

In this paper we review several measures to statistically analyze competitive balance and report which leagues have a wider variance of performance amongst their competitors. Each league seeks to maintain high levels of parity, making matches and the overall season more unpredictable and appealing to the general audience. Here we quantify competitive advantage across major sports leagues using several statistical methods in order for leagues to optimize their revenue.


Using Stability To Select A Shrinkage Method, Dean Dustin May 2020

Department of Statistics: Dissertations, Theses, and Student Work

Shrinkage methods are estimation techniques based on optimizing expressions to find which variables to include in an analysis, typically a linear regression. The general form of these expressions is the sum of an empirical risk plus a complexity penalty based on the number of parameters. Many shrinkage methods are known to satisfy an ‘oracle’ property meaning that asymptotically they select the correct variables and estimate their coefficients efficiently. In Section 1.2, we show oracle properties in two general settings. The first uses a log likelihood in place of the empirical risk and allows a general class of penalties. The second …


Dice Questions Answered, Warren Campbell, William P. Dolan Apr 2020

SEAS Faculty Publications

Superstitious discussion of fair and unfair dice has pervaded the tabletop gaming industry since its inception. Much of this discussion is not based on any quantitative data or studies. Consequently, misconceptions have been spread widely. One dice float test video on YouTube currently has 925,000 views (Fisher, 2015a). To combat the flood of misconceptions we investigated the following questions: 1) Are dice cursed? 2) Are D20s (20-sided dice) less fair than D6s (6-sided dice)? 3) Do float tests tell anything about the fairness of dice? 4) Are some dice systems inherently fairer than others? 5) Are density differences or dimensions more …


Investigating Major League Baseball Pitchers And Quality Of Contact Through Cluster Analysis, Charlie Marcou Apr 2020

Honors Projects

This paper investigates the quality of contact that a pitcher allows. Not much is currently known about quality of contact, but if the factors determining quality of contact could be identified, they could assist teams in identifying and developing pitching talent. There are many problems that come with investigating the control pitchers have over contact allowed, but one area to investigate is whether quality of contact is a repeatable skill. Furthermore, if it is a repeatable skill, then it is important to investigate what kind of benefit controlling contact allowed brings a pitcher. Along with this, groundball and flyball tendencies, and …


Deal: Differentially Private Auction For Blockchain Based Microgrids Energy Trading, Muneeb Ul Hassan, Mubashir Husain Rehmani, Jinjun Chen Mar 2020

Publications

Modern smart homes are being equipped with certain renewable energy resources that can produce their own electric energy. From time to time, these smart homes or microgrids are also capable of supplying energy to other houses, buildings, or the energy grid when self-produced renewable energy is available. Therefore, research has been carried out to develop optimal trading strategies, and many recent technologies are also being used in combination with microgrids. One such technology is blockchain, which works over a decentralized distributed ledger. In this paper, we develop a blockchain based approach for microgrid energy auction. To make this auction more …


Playfair's Introduction Of Bar And Pie Charts To Represent Data, Diana White, River Bond, Joshua Eastes, Negar Janani Jan 2020

Statistics and Probability

No abstract provided.