#### How Risk-Related Statistics, As Reported In News And Social Media, Are Linked To The Use Of The Public Transit System, Prashiddhi Pokhrel

##### Thinking Matters Symposium

Due to the pandemic, people have started relying more on television, social media, and other news outlets for guidance. Moreover, as the volume of news, data, and information grows, so does the amount of misleading statistics. People’s opinions and decisions depend significantly on the data, statistics, and information they are exposed to, as well as on their sources. For this project, we want to look at how information and its sources affect the decisions the general public makes about using the Portland Transit System. It is very important to know ...

Jan 2021

#### Review Of Social Workers Count: Numbers And Social Issues By Michael Anthony Lewis, Michael T. Catalano

##### Numeracy

Lewis, Michael Anthony. 2017. Social Workers Count: Numbers and Social Issues. 2019. New York: Oxford University Press. 223 pp. ISBN 978-019046713-5

The numeracy movement, although largely birthed within the mathematics community, is an outside-the-box endeavor which has always sought to break down or at least transgress traditional disciplinary boundaries. Michael Anthony Lewis’s book is a testament that this effort is succeeding. Lewis is a social worker and sociologist with an impressive resume, author of Economics for Social Workers, co-editor of The Ethics and Economics of the Basic Income Guarantee, and member of the faculty at the Silberman School of ...

Jan 2021

#### Indispensable Statistics For The Behavioral Sciences ~With Spss 26, Howard Reid Ph.D.

##### Open Educational Resources (OER)

While there are many fine introductory statistics books, undergraduate students often continue to view statistics courses negatively. And many fear they will be unable to master the basic level of understanding that is essential to progress in their majors. The present text is an attempt to rethink what students majoring in the behavioral sciences absolutely must learn in an introductory statistics course and how best to organize the presentation of this material so they can succeed in their chosen field of study.

Every book is written from some perspective. The perspective of this book is that a first course in ...

#### Bayesian Semi-Supervised Keyphrase Extraction And Jackknife Empirical Likelihood For Assessing Heterogeneity In Meta-Analysis, Guanshen Wang

##### Statistical Science Theses and Dissertations

This dissertation investigates: (1) A Bayesian Semi-supervised Approach to Keyphrase Extraction with Only Positive and Unlabeled Data, (2) Jackknife Empirical Likelihood Confidence Intervals for Assessing Heterogeneity in Meta-analysis of Rare Binary Events.

In the big data era, people are blessed with a huge amount of information. However, the availability of information may also pose great challenges. One big challenge is how to extract useful yet succinct information in an automated fashion. As one of the first few efforts, keyphrase extraction methods summarize an article by identifying a list of keyphrases. Many existing keyphrase extraction methods focus on the unsupervised setting ...

Dec 2020

#### Examining Multiple Imputation For Measurement Error Correction In Count Data With Excess Zeros, Shalima Zalsha

##### Statistical Science Theses and Dissertations

Measurement error and missing data are two common problems in wildlife population surveys. These data are collected from the environment and may be missing or measured with error when the observer’s ability to see the animal is obscured. Methods such as video transects for estimating red snapper abundance and aerial surveys for estimating moose population sizes are highly affected by these problems since total abundance will be underestimated if missing/mismeasured counts are ignored. We shall refer to this problem as visibility bias; it occurs when the true counts are observed when visibility is high, partially observed when visibility ...

Dec 2020

#### Improved Statistical Methods For Time-Series And Lifetime Data, Xiaojie Zhu

##### Statistical Science Theses and Dissertations

In this dissertation, improved statistical methods for time-series and lifetime data are developed. First, an improved trend test for time series data is presented. Then, robust parametric estimation methods based on system lifetime data with known system signatures are developed.

In the first part of this dissertation, we consider a test for the monotonic trend in time series data proposed by Brillinger (1989). It has been shown that when there are highly correlated residuals or short record lengths, Brillinger’s test procedure tends to have a significance level much higher than the nominal level. This could be related to the discrepancy ...

Dec 2020

#### Can Statcast Variables Explain The Variation In Weighted Runs Created Plus?, Ryan Kupiec

##### Student Research

The release of Statcast data in 2015 was revolutionary for data analysis in the game of baseball. Many analysts have begun using this data regularly, but none have used it exclusively. Often older, less reliable statistics (on-base percentage) are still used in favor of the newer statistics (weighted runs created plus). In this paper, we attempt to explain the variation in weighted runs created plus (wRC+) using Statcast variables such as exit velocity and launch angle. We find that exit velocity, along with other Statcast variables, can explain as much as 70% of the variation in wRC+. Launch angle can ...

Nov 2020

#### Applying The Data: Predictive Analytics In Sport, Anthony Teeter, Margo Bergman

##### Access*: Interdisciplinary Journal of Student Research and Scholarship

The history of wagering predictions and their impact on wide-reaching disciplines such as statistics and economics dates to at least the 1700s, if not before. Predicting the outcomes of sports is a multibillion-dollar business that capitalizes on these tools but is in constant development with the addition of big data analytics methods. Sportsline.com, a popular website for fantasy sports leagues, provides odds predictions in multiple sports, produces proprietary computer models of both winning and losing teams, and provides specific point estimates. To test likely candidates for inclusion in these prediction algorithms, the authors developed a computer model ...

Oct 2020

#### Fitting Quadrics With A Bayesian Prior, Daniel Beale, Yong-Liang Yang, Neill Campbell, Darren Cosker, Peter Hall

##### Computational Visual Media

Quadrics are a compact mathematical formulation for a range of primitive surfaces. A problem arises when there are not enough data points to compute the model but knowledge of the shape is available. This paper presents a method for fitting a quadric with a Bayesian prior. We use a matrix normal prior in order to favour ellipsoids when fitting to ambiguous data. The results show the algorithm copes well when there are few points in the point cloud, competing with contemporary techniques in the area.

Aug 2020

#### “Playing The Whole Game”: A Data Collection And Analysis Exercise With Google Calendar, Albert Y. Kim, Johanna Hardin

##### Statistical and Data Sciences: Faculty Publications

We provide a computational exercise suitable for early introduction in an undergraduate statistics or data science course that allows students to “play the whole game” of data science: performing both data collection and data analysis. While many teaching resources exist for data analysis, such resources are not as abundant for data collection given the inherent difficulty of the task. Our proposed exercise centers around student use of Google Calendar to collect data with the goal of answering the question “How do I spend my time?” On the one hand, the exercise involves answering a question with near universal appeal, but ...

Aug 2020

#### Biennial And Low-Frequency Components Of El Niño/Southern Oscillation, James Michael Ryan

##### Theses and Dissertations

El Niño/Southern Oscillation (ENSO) is a coupled oscillation of sea surface temperatures (SSTs), winds, and air pressure in the eastern and central tropical Pacific that repeats quasi-regularly every 2–7 years. Although the ENSO’s spectral peak is found at a 4–7-yr period, composite El Niño events, taken as the 84 months before and after the peak of each El Niño, show that the length of each event, and often the following La Niña if there is one, usually falls within a quasi-biennial (QB) range of around 18–42 months. We argue that the biennial range of ...

Jul 2020

#### Three Creativity-Fostering Projects Implemented In A Statistics Class, Margaret Adams

##### Journal of Humanistic Mathematics

Undergraduates in an introductory statistics class at a rural Southeastern college were assigned three creativity-fostering projects: statistics vocabulary crossword puzzle, word wall, and graffiti art poster. Given math anxiety, fear of failure, and lack of enthusiasm, it seemed imperative to spark interest and involvement. Rhodes 4P’s model (1961) served as the framework for this intrinsic case study involving 62 students. Independent thinking and research, peer collaboration, and use of art supplies within this model (person, press, process and product) generated remarkable learning outcomes. Grading rubrics focused on originality, quality and statistics content. Projects were classified into three qualitative categories ...

Jul 2020

#### Data, Stats, Go: Navigating The Intersections Of Cataloging, E-Resource, And Web Analytics Reporting, Rachel S. Evans, Wendy Moore, Jessica Pasquale, Andre Davison

##### Presentations

Do you trudge through gathering statistics at fiscal or calendar year-end? Do you wonder why you track certain things, thinking many seem outdated or irrelevant? Many places seem to keep counting certain statistics because "that's what they've always done." For e-resources, how do you integrate those with physical counts and reconcile the variations (updated e-resources versus re-cataloged physical items)? What about repository downloads and other web traffic? The quantity of stats that libraries track is staggering and keeps growing. This program will encourage attendees to stop and evaluate what and why they're gathering data and help identify ...

Jul 2020

#### Causal Inference And Prediction On Observational Data With Survival Outcomes, Xiaofei Chen

##### Statistical Science Theses and Dissertations

Infants with hypoplastic left heart syndrome require an initial Norwood operation, followed some months later by a stage 2 palliation (S2P). The timing of S2P is critical for the operation’s success and the infant’s survival, but the optimal timing, if one exists, is unknown. We attempt to estimate the optimal timing of S2P by analyzing data from the Single Ventricle Reconstruction Trial (SVRT), which randomized patients between two different types of Norwood procedure. In the SVRT, the timing of the S2P was chosen by the medical team; thus with respect to this exposure, the trial constitutes an observational ...

#### Bayesian Reliability Analysis For Optical Media Using Accelerated Degradation Test Data, Kun Bu

ISO (the International Organization for Standardization) 10995:2011 is the international standard providing guidelines for assessing the reliability and service life of optical media, which are designed to be highly reliable and to possess a long lifetime. A well-known challenge of reliability analysis for highly reliable devices is that it is hard to obtain sufficient failure data under their normal use conditions. Accelerated degradation tests (ADTs) are commonly used to quickly obtain physical degradation data under elevated stress conditions, which are then extrapolated to predict reliability under the normal use condition. This standard achieves the estimation of the lifetime of recordable ...

Jun 2020

#### Research In Short Term Actuarial Modeling, Elijah Howells

##### Electronic Theses, Projects, and Dissertations

This paper covers mathematical methods used to conduct actuarial analysis in the short term, such as policy deductible analysis, maximum covered loss analysis, and mixtures of distributions. Assessment of a loss variable's distribution under the effect of a policy deductible, as well as one with an implemented maximum covered loss, and under both a policy deductible and maximum covered loss will also be covered. The derivation, meaning, and use of cost per loss and cost per payment will be discussed, as will those of an aggregate sum distribution, stop loss policy, and maximum likelihood estimation. For each topic, special ...

May 2020

#### A Study Of Cusum Statistics On Bitcoin Transactions, Ivan Perez

##### School of Arts & Sciences Theses

In this thesis, our objective is to study the relationship between transaction price and volume in the BTC/USD Coinbase exchange. In the second chapter, we develop a consecutive CUSUM algorithm to detect instantaneous changes in the arrival rate of market orders. We begin by estimating a baseline rate using the assumption of a local time-homogeneous Poisson process. Our observations lead us to reject the plausibility of a time-homogeneous Poisson model on a more global scale by using a chi squared test. We thus proceed to use CUSUM-based alarms to detect consecutive upward and downward changes in the arrival rate ...
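As a rough illustration of the kind of alarm described above, the following is a minimal sketch of a textbook one-sided (upward) CUSUM for Poisson counts. It is not the thesis's consecutive CUSUM algorithm: the function name `poisson_cusum_alarm`, the rates, the threshold, and the toy counts are all hypothetical and chosen only to show the mechanics.

```python
import math


def poisson_cusum_alarm(counts, baseline_rate, shifted_rate, threshold):
    """Return the index of the first upward-shift alarm, or None.

    Textbook one-sided CUSUM for Poisson counts (an illustrative
    assumption, not the thesis's algorithm). The reference value k is
    the standard log-likelihood-ratio choice for detecting a shift
    from baseline_rate to shifted_rate.
    """
    k = (shifted_rate - baseline_rate) / math.log(shifted_rate / baseline_rate)
    s = 0.0
    for i, x in enumerate(counts):
        # Accumulate evidence above the reference value; never go below zero.
        s = max(0.0, s + x - k)
        if s > threshold:
            return i
    return None


# Hypothetical per-interval order counts: the rate jumps at index 20.
counts = [2] * 20 + [6] * 20
alarm_index = poisson_cusum_alarm(counts, baseline_rate=2.0,
                                  shifted_rate=4.0, threshold=10.0)
print("first alarm at interval:", alarm_index)
```

A downward-change detector would mirror this with a statistic that accumulates `k - x` instead; the threshold trades detection delay against false alarms.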

May 2020

#### Sensitivity Analysis For Incomplete Data And Causal Inference, Heng Chen

##### Statistical Science Theses and Dissertations

In this dissertation, we explore sensitivity analyses under three different types of incomplete data problems: missing outcomes, missing outcomes and missing predictors, and potential outcomes in the Rubin causal model (RCM). The first sensitivity analysis is conducted for the missing completely at random (MCAR) assumption in frequentist inference; the second is conducted for the missing at random (MAR) assumption in likelihood inference; the third is conducted for one novel assumption, the "sixth assumption" proposed for the robustness of the instrumental variable estimand in causal inference.

May 2020

#### Statistical Models And Analysis Of Univariate And Multivariate Degradation Data, Lochana Palayangoda

##### Statistical Science Theses and Dissertations

For degradation data in reliability analysis, estimation of the first-passage time (FPT) distribution to a threshold provides valuable information on reliability characteristics. Recently, Balakrishnan and Qin (2019; Applied Stochastic Models in Business and Industry, 35:571-590) studied a nonparametric method to approximate the FPT distribution of such degradation processes if the underlying process type is unknown. In this thesis, we propose improved techniques based on saddlepoint approximation, which enhance upon their suggested methods. Numerical examples and Monte Carlo simulation studies are used to illustrate the advantages of the proposed techniques. Limitations of the improved techniques are discussed and some possible ...

#### Evaluation Of The Utility Of Informative Priors In Bayesian Structural Equation Modeling With Small Samples, Hao Ma

##### Department of Education Policy and Leadership Theses and Dissertations

The estimation of parameters in structural equation modeling (SEM) has been primarily based on the maximum likelihood estimator (MLE) and relies on large sample asymptotic theory. Consequently, the results of the SEM analyses with small samples may not be as satisfactory as expected. In contrast, informative priors typically do not require a large sample, and they may be helpful for improving the quality of estimates in the SEM models with small samples. However, the role of informative priors in the Bayesian SEM has not been thoroughly studied to date. Given the limited body of evidence, specifying effective informative priors remains ...

May 2020

#### A Novel Approach To Updating Municipal Tax Parcel Impervious Surface Calculations, Patrick Muradaz

##### Senior Honors Projects, 2020-current

Accurate impervious surface calculations are important to many municipalities due to the high volumes of surface rainwater runoff caused by high impervious surface density. Municipalities must deal with this runoff through the establishment and maintenance of drainage facilities. To help offset the added cost of these facilities, many municipalities impose taxes and fees on privately owned impervious surfaces such as homes, driveways, and patios. Currently, in order for a city like Harrisonburg to calculate tax parcel impervious surface density, aerial images must be manually digitized or mapped using computer-based classification techniques using predictive models. These methods of impervious surface calculations ...

May 2020

#### Using Stability To Select A Shrinkage Method, Dean Dustin

##### Dissertations and Theses in Statistics

Shrinkage methods are estimation techniques based on optimizing expressions to find which variables to include in an analysis, typically a linear regression. The general form of these expressions is the sum of an empirical risk plus a complexity penalty based on the number of parameters. Many shrinkage methods are known to satisfy an ‘oracle’ property meaning that asymptotically they select the correct variables and estimate their coefficients efficiently. In Section 1.2, we show oracle properties in two general settings. The first uses a log likelihood in place of the empirical risk and allows a general class of penalties. The ...
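For orientation, the general form mentioned above (empirical risk plus a complexity penalty) can be written as follows for linear regression. The notation is an assumption made for illustration and is not taken from the dissertation: $y_i$ is the response, $x_i$ the predictor vector, $\beta \in \mathbb{R}^p$ the coefficients, $\lambda \ge 0$ a tuning parameter, and $\rho$ a penalty function.

$$
\hat{\beta}_{\lambda} \;=\; \arg\min_{\beta \in \mathbb{R}^{p}} \left\{ \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - x_i^{\top} \beta \bigr)^{2} \;+\; \lambda \sum_{j=1}^{p} \rho\bigl( \lvert \beta_j \rvert \bigr) \right\}
$$

Taking $\rho(t) = t$ gives the lasso and $\rho(t) = t^{2}$ gives ridge regression; replacing the squared-error term with a negative log likelihood gives the likelihood-based variant.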

May 2020

#### Analyzing Competitive Balance In Professional Sport, Kevin Alwell

##### Honors Scholar Theses

In this paper we review several measures for statistically analyzing competitive balance and report which leagues have a wider variance of performance among their competitors. Each league seeks to maintain high levels of parity, making matches and the overall season more unpredictable and appealing to the general audience. Here we quantify competitive advantage across major sports leagues using several statistical methods so that leagues can optimize their revenue.

#### The Effects Of Zoledronate And Sleep Deprivation On The Distal Femur Trabecular Thickness Of Ovariectomized Rats: Application Of Different Statistical Methods, Erin Nolte

##### Student Scholar Symposium Abstracts and Posters

Osteoporosis is a disease that causes the degradation of bone, leading to an increased risk of fracture. One in three women over the age of 50 will be affected by osteoporosis. This study aims to understand how bone is affected by sleep deprivation in estrogen-deficient rats, and how Zoledronate might negate the inimical effects of sleep deprivation on bone. As bone mineral density (BMD) is a crude evaluation of the architectural changes seen in osteoporosis, trabecular thickness may serve as a better single evaluation of bone health. Thirty-one female Wistar rats were ovariectomized and separated into 4 random groups. The ...

Apr 2020

#### Analysis Of Gas Mileage Of A Car, Joshua Ballard-Myer

##### Georgia College Student Research Events

The objective of this work is to analyze a data set, Auto, from the R package ISLR: Introduction to Statistical Learning in R. The data set includes information for 392 observations on 9 variables including gas mileage, horsepower, weight in pounds, and engine displacement in cubic inches. The data set was taken from the StatLib library maintained at Carnegie Mellon University. The primary response variable will be gas mileage in miles per gallon, with all other variables serving as predictors, but other relationships with other response variables such as acceleration will be explored. Results were similar to expected; traits desirable ...

Apr 2020

#### Dice Questions Answered, Warren Campbell, William P. Dolan

##### SEAS Faculty Publications

Superstitious discussion of fair and unfair dice has pervaded the tabletop gaming industry since its inception. Much of it is not based on any quantitative data or studies. Consequently, misconceptions have spread widely. One dice float test video on YouTube currently has 925,000 views (Fisher, 2015a). To combat the flood of misconceptions we investigated the following questions: 1) Are dice cursed? 2) Are D20s (20-sided dice) less fair than D6s (6-sided dice)? 3) Do float tests tell anything about the fairness of dice? 4) Are some dice systems inherently fairer than others? 5) Are density differences or dimensions ...

Apr 2020

#### Investigating Major League Baseball Pitchers And Quality Of Contact Through Cluster Analysis, Charlie Marcou

##### Honors Projects

This paper investigates the quality of contact that a pitcher allows. Not much is currently known about quality of contact, but if factors determining quality of contact could be determined it could assist teams in identifying and developing pitching talent. There are many problems that come with investigating the control pitchers have over contact allowed, but one area to investigate is whether quality of contact is a repeatable skill. Furthermore, if it is a repeatable skill, then it is important to investigate what kind of benefit controlling contact allowed brings a pitcher. Along with this, groundball and flyball tendencies, and ...

Apr 2020

#### Boom Or Bust: Examining The Relationship Between High School Recruiting Rankings And The Nfl Draft, Nicholas E. Tice

##### Senior Theses

The goal of this thesis is to model the probability that a high school football player is drafted, based on information taken from their recruiting profile. The response variable is binary and defined as drafted (1) or undrafted (0). The independent variables, collected by scraping data from the recruiting websites, include height, weight, position, hometown, recruiting grade, and other socioeconomic factors based on the player’s high school. 247Sports and ESPN were the two recruiting services used and compared in this study. Because of the binary nature of the dependent variable, logistic regression and decision trees ...

Mar 2020

#### Deal: Differentially Private Auction For Blockchain Based Microgrids Energy Trading, Muneeb Ul Hassan, Mubashir Husain Rehmani, Jinjun Chen

##### Publications

Modern smart homes are being equipped with renewable energy resources that can produce their own electric energy. From time to time, these smart homes or microgrids are also capable of supplying energy to other houses, buildings, or the energy grid when self-produced renewable energy is available. Therefore, research has been carried out to develop optimal trading strategies, and many recent technologies are also being used in combination with microgrids. One such technology is blockchain, which works over a decentralized distributed ledger. In this paper, we develop a blockchain based approach for microgrid energy auction. To make this auction more ...

Feb 2020

#### The Importance Of Type I Error Rates When Studying Bias In Monte Carlo Studies In Statistics, Michael Harwell

##### Journal of Modern Applied Statistical Methods

Two common outcomes of Monte Carlo studies in statistics are bias and Type I error rate. Several versions of bias statistics exist but all employ arbitrary cutoffs for deciding when bias is ignorable or non-ignorable. This article argues Type I error rates should be used when assessing bias.
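To make the Type I error rate outcome concrete, here is a minimal sketch of how such a rate is typically estimated in a Monte Carlo study. It is not the article's procedure: the two-sided z-test with known unit variance, the sample size, the replication count, and the function name `empirical_type1_error` are all assumptions chosen for illustration.

```python
import math
import random


def empirical_type1_error(n=25, reps=20000, z_crit=1.959964, seed=0):
    """Estimate the Type I error rate of a two-sided z-test by simulation.

    Data are drawn from N(0, 1), so the null hypothesis (mean = 0) is
    true in every replication and each rejection is a Type I error.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample_mean = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
        z = sample_mean * math.sqrt(n)  # known standard deviation of 1
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps


nominal = 0.05
reps = 20000
rate = empirical_type1_error(reps=reps)
# Monte Carlo standard error of the estimated rate under the nominal level.
mc_se = math.sqrt(nominal * (1 - nominal) / reps)
print(f"empirical rate = {rate:.4f}, nominal = {nominal}, MC s.e. = {mc_se:.4f}")
```

Comparing the empirical rate with the nominal level, relative to the Monte Carlo standard error, gives a non-arbitrary yardstick for whether a departure is larger than simulation noise.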