Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

2020

Statistics

Articles 1 - 30 of 44

Full-Text Articles in Physical Sciences and Mathematics

Bayesian Semi-Supervised Keyphrase Extraction And Jackknife Empirical Likelihood For Assessing Heterogeneity In Meta-Analysis, Guanshen Wang Dec 2020

Statistical Science Theses and Dissertations

This dissertation investigates: (1) A Bayesian Semi-supervised Approach to Keyphrase Extraction with Only Positive and Unlabeled Data, (2) Jackknife Empirical Likelihood Confidence Intervals for Assessing Heterogeneity in Meta-analysis of Rare Binary Events.

In the big data era, people are blessed with a huge amount of information. However, the availability of information may also pose great challenges. One big challenge is how to extract useful yet succinct information in an automated fashion. As one of the first few efforts, keyphrase extraction methods summarize an article by identifying a list of keyphrases. Many existing keyphrase extraction methods focus on the unsupervised setting, …


Examining Multiple Imputation For Measurement Error Correction In Count Data With Excess Zeros, Shalima Zalsha Dec 2020

Statistical Science Theses and Dissertations

Measurement error and missing data are two common problems in wildlife population surveys. These data are collected from the environment and may be missing or measured with error when the observer’s ability to see the animal is obscured. Methods such as video transects for estimating red snapper abundance and aerial surveys for estimating moose population sizes are highly affected by these problems since total abundance will be underestimated if missing/mismeasured counts are ignored. We shall refer to this problem as visibility bias; it occurs when the true counts are observed when visibility is high, partially observed when visibility is low …


Improved Statistical Methods For Time-Series And Lifetime Data, Xiaojie Zhu Dec 2020

Statistical Science Theses and Dissertations

In this dissertation, improved statistical methods for time-series and lifetime data are developed. First, an improved trend test for time series data is presented. Then, robust parametric estimation methods based on system lifetime data with known system signatures are developed.

In the first part of this dissertation, we consider a test for the monotonic trend in time series data proposed by Brillinger (1989). It has been shown that when there are highly correlated residuals or short record lengths, Brillinger’s test procedure tends to have a significance level much higher than the nominal level. This could be related to the discrepancy between …


Can Statcast Variables Explain The Variation In Weighted Runs Created Plus?, Ryan Kupiec Dec 2020

Student Research

The release of Statcast data in 2015 was revolutionary for data analysis in the game of baseball. Many analysts have begun using this data regularly, but none have used it exclusively. Often older, less reliable statistics (such as on-base percentage) are still used in favor of the newer statistics (such as weighted runs created plus). In this paper, we attempt to explain the variation in weighted runs created plus (wRC+) using Statcast variables such as exit velocity and launch angle. We find that exit velocity, along with other Statcast variables, can explain as much as 70% of the variation in wRC+. Launch angle can …


Applying The Data: Predictive Analytics In Sport, Anthony Teeter, Margo Bergman Nov 2020

Access*: Interdisciplinary Journal of Student Research and Scholarship

The history of wagering predictions and their impact on wide-reaching disciplines such as statistics and economics dates to at least the 1700s, if not before. Predicting the outcomes of sports is a multibillion-dollar business that capitalizes on these tools but is in constant development with the addition of big data analytics methods. Sportsline.com, a popular website for fantasy sports leagues, provides odds predictions in multiple sports, produces proprietary computer models of both winning and losing teams, and provides specific point estimates. To test likely candidates for inclusion in these prediction algorithms, the authors developed a computer model, and test …


“Playing The Whole Game”: A Data Collection And Analysis Exercise With Google Calendar, Albert Y. Kim, Johanna Hardin Aug 2020

Statistical and Data Sciences: Faculty Publications

We provide a computational exercise suitable for early introduction in an undergraduate statistics or data science course that allows students to “play the whole game” of data science: performing both data collection and data analysis. While many teaching resources exist for data analysis, such resources are not as abundant for data collection given the inherent difficulty of the task. Our proposed exercise centers around student use of Google Calendar to collect data with the goal of answering the question “How do I spend my time?” On the one hand, the exercise involves answering a question with near universal appeal, but …


Biennial And Low-Frequency Components Of El Niño/Southern Oscillation, James Michael Ryan Aug 2020

Theses and Dissertations

El Niño/Southern Oscillation (ENSO) is a coupled oscillation of sea surface temperatures (SSTs), winds, and air pressure in the eastern and central tropical Pacific that repeats quasi-regularly every 2–7 years. Although ENSO’s spectral peak is found at a 4–7-yr period, composite El Niño events, taken as the 84 months before and after the peak of each El Niño, show that the length of each event, and often the following La Niña if there is one, usually falls within a quasi-biennial (QB) range of around 18–42 months. We argue that the biennial range of ENSO events stems from the …


Bayesian Topological Machine Learning, Christopher A. Oballe Aug 2020

Doctoral Dissertations

Topological data analysis encompasses a broad set of ideas and techniques that address 1) how to rigorously define and summarize the shape of data, and 2) how to use these constructs for inference. This dissertation addresses the second problem by developing new inferential tools for topological data analysis and applying them to solve real-world data problems. First, a Bayesian framework to approximate probability distributions of persistence diagrams is established. The key insight underpinning this framework is that persistence diagrams may be viewed as Poisson point processes with prior intensities. With this assumption in hand, one may compute posterior intensities by adopting techniques …


Three Creativity-Fostering Projects Implemented In A Statistics Class, Margaret Adams Jul 2020

Journal of Humanistic Mathematics

Undergraduates in an introductory statistics class at a rural Southeastern college were assigned three creativity-fostering projects: statistics vocabulary crossword puzzle, word wall, and graffiti art poster. Given math anxiety, fear of failure, and lack of enthusiasm, it seemed imperative to spark interest and involvement. Rhodes 4P’s model (1961) served as the framework for this intrinsic case study involving 62 students. Independent thinking and research, peer collaboration, and use of art supplies within this model (person, press, process and product) generated remarkable learning outcomes. Grading rubrics focused on originality, quality and statistics content. Projects were classified into three qualitative categories ranging …


Data, Stats, Go: Navigating The Intersections Of Cataloging, E-Resource, And Web Analytics Reporting, Rachel S. Evans, Wendy Moore, Jessica Pasquale, Andre Davison Jul 2020

Presentations

Do you trudge through gathering statistics at fiscal or calendar year-end? Do you wonder why you track certain things, thinking many seem outdated or irrelevant? Many places seem to keep counting certain statistics because "that's what they've always done." For e-resources, how do you integrate those with physical counts and reconcile the variations (updated e-resources versus re-cataloged physical items)? What about repository downloads and other web traffic? The quantity of stats that libraries track is staggering and keeps growing. This program will encourage attendees to stop and evaluate what and why they're gathering data and help identify possible alternatives to …


Southwest Pacific Tropical Cyclone Frequency And Intensity Related To Observed And Modeled Geophysical And Aerosol Variables, Rupsa Bhowmick Jul 2020

LSU Doctoral Dissertations

The dissertation focuses on the tropical cyclone (TC) climatology of the western region of the Southwest Pacific Ocean (SWPO) basin (135E - 180, and 5S - 35S) using observed and modeled data. The classification-based machine learning approach identifies the synoptic geophysical and aerosol environment favorable or unfavorable for TC intensification and intensity change prior to landfall, incorporating observational and satellite data. A multiple Poisson regression model with varying temporal monthly lags was used to build a relationship between the number of monthly TC days and basin-wide average dust aerosol optical depth (AOD), sea surface temperature (SST), and upper ocean temperature (UOT). This idea …


Causal Inference And Prediction On Observational Data With Survival Outcomes, Xiaofei Chen Jul 2020

Statistical Science Theses and Dissertations

Infants with hypoplastic left heart syndrome require an initial Norwood operation, followed some months later by a stage 2 palliation (S2P). The timing of S2P is critical for the operation’s success and the infant’s survival, but the optimal timing, if one exists, is unknown. We attempt to estimate the optimal timing of S2P by analyzing data from the Single Ventricle Reconstruction Trial (SVRT), which randomized patients between two different types of Norwood procedure. In the SVRT, the timing of the S2P was chosen by the medical team; thus with respect to this exposure, the trial constitutes an observational study, and …


Bayesian Reliability Analysis For Optical Media Using Accelerated Degradation Test Data, Kun Bu Jun 2020

USF Tampa Graduate Theses and Dissertations

ISO (the International Organization for Standardization) 10995:2011 is the international standard providing guidelines for assessing the reliability and service life of optical media, which is designed to be highly reliable and possesses a long lifetime. A well-known challenge of reliability analysis for highly reliable devices is that it is hard to obtain sufficient failure data under their normal use conditions. Accelerated degradation tests (ADTs) are commonly used to quickly obtain physical degradation data under elevated stress conditions, which are then extrapolated to predict reliability under the normal use condition. This standard achieves the estimation of the lifetime of recordable media, …


Research In Short Term Actuarial Modeling, Elijah Howells Jun 2020

Electronic Theses, Projects, and Dissertations

This paper covers mathematical methods used to conduct actuarial analysis in the short term, such as policy deductible analysis, maximum covered loss analysis, and mixtures of distributions. Assessment of a loss variable's distribution under the effect of a policy deductible, as well as one with an implemented maximum covered loss, and under both a policy deductible and maximum covered loss will also be covered. The derivation, meaning, and use of cost per loss and cost per payment will be discussed, as will those of an aggregate sum distribution, stop loss policy, and maximum likelihood estimation. For each topic, special cases …
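As a rough illustration of the cost per loss and cost per payment quantities mentioned above, the following sketch assumes an exponentially distributed loss; it is not taken from the paper. The distribution choice, the function names, and the parameter values are all hypothetical. It uses the standard identities that the expected cost per loss equals E[min(X, u)] - E[min(X, d)] for deductible d and maximum covered loss u, and that the expected cost per payment divides this by the probability that a loss exceeds the deductible.

```python
import math


def limited_expected_value(mean_loss, limit):
    """E[min(X, limit)] for an exponential loss X with the given mean (assumed model)."""
    return mean_loss * (1.0 - math.exp(-limit / mean_loss))


def cost_per_loss(mean_loss, deductible, max_covered_loss):
    """Expected insurer payment per loss, counting losses below the deductible as zero."""
    return (limited_expected_value(mean_loss, max_covered_loss)
            - limited_expected_value(mean_loss, deductible))


def cost_per_payment(mean_loss, deductible, max_covered_loss):
    """Expected insurer payment given that the loss exceeds the deductible."""
    survival_at_deductible = math.exp(-deductible / mean_loss)  # P(X > deductible)
    return cost_per_loss(mean_loss, deductible, max_covered_loss) / survival_at_deductible


if __name__ == "__main__":
    # Hypothetical policy: mean loss 1000, deductible 100, maximum covered loss 5000.
    print(cost_per_loss(1000.0, 100.0, 5000.0))
    print(cost_per_payment(1000.0, 100.0, 5000.0))
```

With no maximum covered loss (passing `math.inf`), the cost per payment under this assumed exponential model reduces to the mean loss, which reflects the memoryless property.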


A Study Of Cusum Statistics On Bitcoin Transactions, Ivan Perez May 2020

Theses and Dissertations

In this thesis, our objective is to study the relationship between transaction price and volume in the BTC/USD Coinbase exchange. In the second chapter, we develop a consecutive CUSUM algorithm to detect instantaneous changes in the arrival rate of market orders. We begin by estimating a baseline rate using the assumption of a local time-homogeneous Poisson process. Our observations lead us to reject the plausibility of a time-homogeneous Poisson model on a more global scale by using a chi squared test. We thus proceed to use CUSUM-based alarms to detect consecutive upward and downward changes in the arrival rate of …
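For readers unfamiliar with CUSUM alarms on Poisson arrival counts, a minimal sketch follows. It is not the thesis's algorithm: it is the textbook log-likelihood-ratio CUSUM for a shift from a baseline rate to a hypothesized shifted rate, restarted after each alarm so that consecutive changes can be flagged. The function name, the rates, the threshold, and the sample counts are all assumptions made for illustration.

```python
import math


def poisson_cusum_alarms(counts, baseline_rate, shifted_rate, threshold):
    """Return the indices at which a one-sided Poisson CUSUM raises an alarm.

    counts: number of arrivals observed in each equal-length interval.
    baseline_rate / shifted_rate: expected arrivals per interval before and
    after the hypothesized change (both assumed known here).
    """
    log_ratio = math.log(shifted_rate / baseline_rate)
    rate_gap = shifted_rate - baseline_rate
    statistic = 0.0
    alarms = []
    for index, count in enumerate(counts):
        # Log-likelihood ratio of one Poisson count under shifted vs. baseline rate.
        increment = count * log_ratio - rate_gap
        statistic = max(0.0, statistic + increment)
        if statistic >= threshold:
            alarms.append(index)
            statistic = 0.0  # restart so that later changes can be detected too
    return alarms


if __name__ == "__main__":
    # Hypothetical data: ten quiet intervals followed by five busy ones.
    sample_counts = [2] * 10 + [6] * 5
    print(poisson_cusum_alarms(sample_counts, baseline_rate=2.0,
                               shifted_rate=4.0, threshold=3.0))
```

Setting `shifted_rate` below `baseline_rate` turns the same function into a detector for downward changes, since low counts then push the statistic up.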


Evaluation Of The Utility Of Informative Priors In Bayesian Structural Equation Modeling With Small Samples, Hao Ma May 2020

Education Policy and Leadership Theses and Dissertations

The estimation of parameters in structural equation modeling (SEM) has been primarily based on the maximum likelihood estimator (MLE) and relies on large sample asymptotic theory. Consequently, the results of the SEM analyses with small samples may not be as satisfactory as expected. In contrast, informative priors typically do not require a large sample, and they may be helpful for improving the quality of estimates in the SEM models with small samples. However, the role of informative priors in the Bayesian SEM has not been thoroughly studied to date. Given the limited body of evidence, specifying effective informative priors remains …


Statistical Models And Analysis Of Univariate And Multivariate Degradation Data, Lochana Palayangoda May 2020

Statistical Science Theses and Dissertations

For degradation data in reliability analysis, estimation of the first-passage time (FPT) distribution to a threshold provides valuable information on reliability characteristics. Recently, Balakrishnan and Qin (2019; Applied Stochastic Models in Business and Industry, 35:571-590) studied a nonparametric method to approximate the FPT distribution of such degradation processes if the underlying process type is unknown. In this thesis, we propose improved techniques based on saddlepoint approximation, which enhance upon their suggested methods. Numerical examples and Monte Carlo simulation studies are used to illustrate the advantages of the proposed techniques. Limitations of the improved techniques are discussed and some possible solutions …


Sensitivity Analysis For Incomplete Data And Causal Inference, Heng Chen May 2020

Statistical Science Theses and Dissertations

In this dissertation, we explore sensitivity analyses under three different types of incomplete data problems: missing outcomes, missing outcomes and missing predictors, and potential outcomes in the Rubin causal model (RCM). The first sensitivity analysis is conducted for the missing completely at random (MCAR) assumption in frequentist inference; the second is conducted for the missing at random (MAR) assumption in likelihood inference; the third is conducted for a novel assumption, the “sixth assumption”, proposed for the robustness of the instrumental variable estimand in causal inference.


A Novel Approach To Updating Municipal Tax Parcel Impervious Surface Calculations, Patrick D. Muradaz May 2020

Senior Honors Projects, 2020-current

Accurate impervious surface calculations are important to many municipalities due to the high volumes of surface rainwater runoff caused by high impervious surface density. Municipalities must deal with this runoff through the establishment and maintenance of drainage facilities. To help offset the added cost of these facilities, many municipalities impose taxes and fees on privately owned impervious surfaces such as homes, driveways, and patios. Currently, in order for a city like Harrisonburg to calculate tax parcel impervious surface density, aerial images must be manually digitized or mapped using computer-based classification techniques built on predictive models. These methods of impervious surface calculations …


Using Stability To Select A Shrinkage Method, Dean Dustin May 2020

Department of Statistics: Dissertations, Theses, and Student Work

Shrinkage methods are estimation techniques based on optimizing expressions to find which variables to include in an analysis, typically a linear regression. The general form of these expressions is the sum of an empirical risk plus a complexity penalty based on the number of parameters. Many shrinkage methods are known to satisfy an ‘oracle’ property meaning that asymptotically they select the correct variables and estimate their coefficients efficiently. In Section 1.2, we show oracle properties in two general settings. The first uses a log likelihood in place of the empirical risk and allows a general class of penalties. The second …


The Effects Of Zoledronate And Sleep Deprivation On The Distal Femur Trabecular Thickness Of Ovariectomized Rats: Application Of Different Statistical Methods, Erin Nolte May 2020

Student Scholar Symposium Abstracts and Posters

Osteoporosis is a disease that causes the degradation of bone, leading to an increased risk of fracture. One in three women over the age of 50 will be affected by osteoporosis. This study aims to understand how bone is affected by sleep deprivation in estrogen-deficient rats, and how Zoledronate might negate the inimical effects of sleep deprivation on bone. As bone mineral density (BMD) is a crude evaluation of the architectural changes seen in osteoporosis, trabecular thickness may serve as a better single evaluation of bone health. Thirty-one female Wistar rats were ovariectomized and separated into 4 random groups. The …


Analyzing Competitive Balance In Professional Sport, Kevin Alwell May 2020

Honors Scholar Theses

In this paper we review several measures for statistically analyzing competitive balance and report which leagues have a wider variance of performance among their competitors. Each league seeks to maintain high levels of parity, making matches and the overall season more unpredictable and appealing to the general audience. Here we quantify competitive advantage across major sports leagues using several statistical methods so that leagues can optimize their revenue.


Analysis Of Gas Mileage Of A Car, Joshua Ballard-Myer Apr 2020

Georgia College Student Research Events

The objective of this work is to analyze a data set, Auto, from the R package ISLR: Introduction to Statistical Learning in R. The data set includes information for 392 observations on 9 variables including gas mileage, horsepower, weight in pounds, and engine displacement in cubic inches. The data set was taken from the StatLib library maintained at Carnegie Mellon University. The primary response variable will be gas mileage in miles per gallon, with all other variables serving as predictors, but other relationships with other response variables such as acceleration will be explored. Results were similar to expected; traits desirable …


Dice Questions Answered, Warren Campbell, William P. Dolan Apr 2020

SEAS Faculty Publications

Superstitious discussion of fair and unfair dice has pervaded the tabletop gaming industry since its inception. Much of it is not based on any quantitative data or studies. Consequently, misconceptions have spread widely. One dice float test video on YouTube currently has 925,000 views (Fisher, 2015a). To combat the flood of misconceptions, we investigated the following questions: 1) Are dice cursed? 2) Are D20s (20-sided dice) less fair than D6s (6-sided dice)? 3) Do float tests tell anything about the fairness of dice? 4) Are some dice systems inherently fairer than others? 5) Are density differences or dimensions more …


Investigating Major League Baseball Pitchers And Quality Of Contact Through Cluster Analysis, Charlie Marcou Apr 2020

Honors Projects

This paper investigates the quality of contact that a pitcher allows. Not much is currently known about quality of contact, but if factors determining quality of contact could be determined it could assist teams in identifying and developing pitching talent. There are many problems that come with investigating the control pitchers have over contact allowed, but one area to investigate is whether quality of contact is a repeatable skill. Furthermore, if it is a repeatable skill, then it is important to investigate what kind of benefit controlling contact allowed brings a pitcher. Along with this, groundball and flyball tendencies, and …


Boom Or Bust: Examining The Relationship Between High School Recruiting Rankings And The Nfl Draft, Nicholas E. Tice Apr 2020

Senior Theses

The goal of this thesis is to model a high school football player’s probability of being drafted based on information taken from their recruiting profile. The response variable is binary and defined as drafted (1) or undrafted (0). The independent variables, including height, weight, position, hometown, recruiting grade and other socioeconomic factors based on the player’s high school, were collected by scraping data from the recruiting websites. 247Sports and ESPN were the two recruiting services used and compared in this study. Because of the binary nature of the dependent variable, logistic regression and decision trees were chosen …


An Actuarial Approach To Personal Injury Protection Severity, Jason Colgrove Mar 2020

Undergraduate Honors Theses

Insurance companies examine the risk of financial losses for their policyholders as a way to accurately price insurance policies. Within the automobile insurance sector, the frequency of crashes and the associated liabilities started to increase in late 2013 after having been on the decline for close to a decade. This research focuses on the possible correlated variables that could lead to a better understanding of this change. To embark on this task, we teamed up with the Society of Actuaries, Casualty Actuarial Society, and the American Property Casualty Insurance Association to obtain data regarding frequency, severity, …


Deal: Differentially Private Auction For Blockchain Based Microgrids Energy Trading, Muneeb Ul Hassan, Mubashir Husain Rehmani, Jinjun Chen Mar 2020

Publications

Modern smart homes are being equipped with certain renewable energy resources that can produce their own electric energy. From time to time, these smart homes or microgrids are also capable of supplying energy to other houses, buildings, or the energy grid when self-produced renewable energy is available. Therefore, research has been carried out to develop optimal trading strategies, and many recent technologies are also being used in combination with microgrids. One such technology is blockchain, which works over a decentralized distributed ledger. In this paper, we develop a blockchain based approach for microgrid energy auction. To make this auction more …


The Importance Of Type I Error Rates When Studying Bias In Monte Carlo Studies In Statistics, Michael Harwell Feb 2020

Journal of Modern Applied Statistical Methods

Two common outcomes of Monte Carlo studies in statistics are bias and Type I error rate. Several versions of bias statistics exist but all employ arbitrary cutoffs for deciding when bias is ignorable or non-ignorable. This article argues Type I error rates should be used when assessing bias.


Art, Artfulness, Or Artifice?: A Review Of The Art Of Statistics: How To Learn From Data, By David Spiegelhalter, Jason Makansi Jan 2020

Numeracy

David Spiegelhalter. 2019. The Art of Statistics: How to Learn From Data. (London: The Penguin Group). 444 pp. ISBN 978-1541618510

The author successfully eases the reader away from the rigor of statistical methods and calculations and into the realm of statistical thinking. Despite an engaging style and attention-grabbing examples, the reader of The Art of Statistics will need more than a casual grounding in statistics to get what Spiegelhalter, I believe, intends from his book. It should be viewed as a companion to a more rigorous textbook on statistical methods but not necessarily a book that makes statistics any …