Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons

Articles 1 - 10 of 10

Full-Text Articles in Statistical Models

UConn Baseball Batting Order Optimization, Gavin Rublewski May 2023

Honors Scholar Theses

Challenging conventional wisdom is at the very core of baseball analytics. Using data and statistical analysis, the rules by which coaches make decisions can be justified or possibly refuted. One such set of rules governs the construction of a batting order. Through data collection, data adjustment, the construction of a baseball simulator, and the use of Monte Carlo simulation, I have assessed thousands of possible batting orders to determine the roster-specific strategies that lead to optimal run production for the 2023 UConn baseball team. This paper details a repeatable process in which basic player statistics …
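The abstract describes scoring batting orders with a baseball simulator and Monte Carlo simulation. The Python sketch below illustrates that general idea only; it is not the thesis's simulator. The outcome categories, the simplified base-running rule, the player probabilities, and every function name (`plate_appearance`, `advance`, `simulate_game`, `expected_runs`, `best_order`) are hypothetical assumptions made for illustration.

```python
import random
from typing import Dict, List, Sequence

# Outcomes modeled per plate appearance (assumption: no errors, sacrifices,
# stolen bases, or double plays).
OUTCOMES = ("out", "walk", "single", "double", "triple", "hr")
Batter = Dict[str, float]  # outcome -> probability per plate appearance


def plate_appearance(batter: Batter, rng: random.Random) -> str:
    """Draw one outcome using the batter's outcome probabilities as weights."""
    weights = [batter[o] for o in OUTCOMES]
    return rng.choices(OUTCOMES, weights=weights, k=1)[0]


def advance(bases: List[bool], outcome: str) -> int:
    """Update `bases` in place (index 0 = first base) and return runs scored.

    Assumed base-running rule: on a hit, every runner moves forward by the
    same number of bases as the batter; on a walk, only forced runners move.
    """
    if outcome == "walk":
        runs = 0
        if bases[0]:
            if bases[1]:
                if bases[2]:
                    runs = 1  # bases loaded: the runner on third is forced home
                bases[2] = True
            bases[1] = True
        bases[0] = True
        return runs
    n = {"single": 1, "double": 2, "triple": 3, "hr": 4}[outcome]
    runs = 0
    for base in (2, 1, 0):  # move lead runners first so no base is double-booked
        if bases[base]:
            bases[base] = False
            if base + n >= 3:
                runs += 1
            else:
                bases[base + n] = True
    if n == 4:
        runs += 1  # the batter scores on a home run
    else:
        bases[n - 1] = True
    return runs


def simulate_game(lineup: Sequence[Batter], rng: random.Random, innings: int = 9) -> int:
    """Play one game with a fixed batting order and return total runs.

    Assumes at least one batter has a nonzero out probability.
    """
    runs, spot = 0, 0
    for _ in range(innings):
        outs, bases = 0, [False, False, False]
        while outs < 3:
            outcome = plate_appearance(lineup[spot], rng)
            spot = (spot + 1) % len(lineup)  # the order carries over between innings
            if outcome == "out":
                outs += 1
            else:
                runs += advance(bases, outcome)
    return runs


def expected_runs(lineup: Sequence[Batter], n_games: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of mean runs per game for one batting order."""
    rng = random.Random(seed)
    return sum(simulate_game(lineup, rng) for _ in range(n_games)) / n_games


def best_order(candidates: Sequence[Sequence[Batter]], n_games: int = 2000) -> Sequence[Batter]:
    """Return the candidate order with the highest estimated runs per game.

    Each candidate is scored with the same seed so the comparison is reproducible.
    """
    return max(candidates, key=lambda order: expected_runs(order, n_games))


if __name__ == "__main__":
    # Hypothetical player profiles; each set of probabilities sums to 1.
    slugger = {"out": 0.60, "walk": 0.10, "single": 0.15, "double": 0.06, "triple": 0.01, "hr": 0.08}
    contact = {"out": 0.62, "walk": 0.08, "single": 0.22, "double": 0.05, "triple": 0.01, "hr": 0.02}
    order_a = [contact, contact, slugger, slugger, slugger] + [contact] * 4
    order_b = [slugger] * 3 + [contact] * 6
    for name, order in (("A", order_a), ("B", order_b)):
        print(name, round(expected_runs(order), 2))
```

Because each order is scored on a finite number of simulated games, small differences between estimates may be noise; raising `n_games` reduces the Monte Carlo error. Scoring all 9! = 362,880 orders of a nine-player lineup this way would be slow, so a practical search would likely restrict the candidate set.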


A Course In Data Science: R And Prediction Modeling, Adam Kapelner May 2022

Open Educational Resources

This is a self-contained course in data science and machine learning using R. It covers the philosophy of modeling with data, prediction via linear models, machine learning (including support vector machines and random forests), probability estimation and asymmetric costs using logistic and probit regression, underfitting vs. overfitting, model validation, handling missingness, and much more. There is formal instruction in data manipulation using dplyr and data.table, in visualization using ggplot2, and in statistical computing.


Data Mining And Machine Learning To Improve Northern Florida’s Foster Care System, Daniel Oldham, Nathan Foster, Mihhail Berezovski Jun 2019

Beyond: Undergraduate Research Journal

The purpose of this research project is to use statistical analysis, data mining, and machine learning techniques to identify factors in child welfare service records that could lead to a child entering the foster care system multiple times. Identifying these factors would allow us to accurately predict a case’s outcome. We were provided with eight years of data, in the form of multiple spreadsheets, from Partnership for Strong Families (PSF), a child welfare services organization based in Gainesville, Florida, that is contracted by the Florida Department of Children and Families (DCF). This data contained a …


Analysis Of 2016-17 Major League Soccer Season Data Using Poisson Regression With R, Ian D. Campbell May 2018

Undergraduate Theses and Capstone Projects

To the outside observer, soccer is chaotic, with no given pattern or scheme to follow: a random conglomeration of passes and shots that goes on for 90 minutes. Yet what if there were a pattern to the chaos, or a way to describe the events of a game quantitatively? Statistics are a critical part of baseball and many of today’s other sports, but we see very little statistical analysis done on soccer. The research that does exist has looked at the effect of possession time on the outcome of a game, the difference …


Visualizing Lab And Phenotype Associations Using PheWAS And Electronic Health Records, Brenda Emerson, Miriam Goldman, Sahiti Kolli Jul 2017

Honors Projects

As the digitization of patient health records becomes more common, we have a great opportunity to analyze these records and, hopefully, make discoveries about diseases or medicines. Given large datasets of Electronic Health Records, two other students and I decided to look for novel associations between phenotypes and mean lab values, to check whether the presence of a lab was associated with a phenotype, and to create an interactive application to visualize the associations between labs and phenotypes.


Black Cloud Randomization Test, Nicholas S. Vanni Jan 2016

Williams Honors College, Honors Research Projects

The Black Cloud Randomization Test looks at a nontraditional question and attempts to answer it using unique statistics. The purpose of this paper is to take what has been learned throughout the years and apply that knowledge to a final project. The data for this project consist of an emergency room’s on-call schedule, along with the number of traumas that came in during each day shift. The project builds on what has already been learned and helps open up a different way of working with statistics. The project was coded in R. With different restrictions, there are …
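The abstract does not state which test statistic was used. Assuming the question is whether more traumas arrive during one particular physician's day shifts (the folk notion of a "black cloud"), a generic two-sample randomization (permutation) test could look like the Python sketch below. The data layout, the difference-in-means statistic, and the names `black_cloud_test`, `traumas`, and `on_call` are assumptions, and the original project was written in R rather than Python.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def black_cloud_test(
    traumas: Sequence[int],
    on_call: Sequence[bool],
    n_perm: int = 10_000,
    seed: int = 0,
) -> Tuple[float, float]:
    """One-sided permutation test: are traumas higher on one doctor's shifts?

    traumas[i] is the trauma count for day shift i; on_call[i] is True when
    the doctor of interest was on call. Returns (observed difference in
    mean traumas, permutation p-value).
    """
    if len(traumas) != len(on_call):
        raise ValueError("traumas and on_call must have the same length")
    k = sum(on_call)
    if k == 0 or k == len(traumas):
        raise ValueError("need shifts both with and without the doctor on call")

    def diff(labels: Sequence[bool]) -> float:
        on = [t for t, lab in zip(traumas, labels) if lab]
        off = [t for t, lab in zip(traumas, labels) if not lab]
        return mean(on) - mean(off)

    observed = diff(on_call)
    rng = random.Random(seed)
    labels = list(on_call)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # reassign the same number of on-call shifts at random
        if diff(labels) >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed labeling itself, so p is never 0.
    p_value = (at_least_as_extreme + 1) / (n_perm + 1)
    return observed, p_value
```

Shuffling keeps the number of on-call shifts fixed and only changes which shifts carry the label, which is what makes this a randomization test of the null hypothesis that the labels are exchangeable across shifts.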


Hidden Trends In NFL Data, Scott Santor Apr 2014

Statistics

This is an analysis of National Football League (NFL) data for the 2013-2014 regular season. The main goal is to find hidden trends in game data that can ultimately determine which factors are statistically significant in earning a team its ultimate objective, a win.

The main response variable to be examined is total wins throughout the regular season, and an alternative dependent variable is spread: the difference between a team’s points scored and points against. Spread is analyzed to provide a different quantitative response variable that can be both positive and negative.

Game data was gathered from ESPN.com box …


Interpretation And Prediction Of A Logistic Model, Joseph M. Hilbe Mar 2014

Joseph M Hilbe

A basic overview of how to model and interpret a logistic regression model, as well as how to obtain the predicted probability, or fit, of the model and calculate its confidence intervals. R code is used for all examples; some Stata code is provided for contrast.
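One common way to obtain a predicted probability and its confidence interval from a fitted logistic model is to compute the linear predictor and its standard error on the log-odds scale, then apply the inverse logit to the estimate and to both interval endpoints. The Python sketch below assumes the coefficient vector and its variance-covariance matrix are already available from a fit; the function names and example numbers are hypothetical, and the document itself works in R and Stata.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def inv_logit(eta: float) -> float:
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def predict_with_ci(
    x: Sequence[float],
    beta: Sequence[float],
    cov: Sequence[Sequence[float]],
    level: float = 0.95,
) -> Tuple[float, float, float]:
    """Predicted probability and Wald confidence interval for one covariate row.

    x includes a leading 1 for the intercept; beta and cov are the fitted
    coefficients and their variance-covariance matrix (assumed inputs).
    Returns (fitted probability, lower bound, upper bound).
    """
    p = len(x)
    eta = sum(x[i] * beta[i] for i in range(p))  # linear predictor x'beta
    var = sum(x[i] * cov[i][j] * x[j] for i in range(p) for j in range(p))  # x'Vx
    se = math.sqrt(var)  # standard error on the log-odds scale
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    # Build the interval on the link scale, then transform the endpoints,
    # so the bounds always stay inside (0, 1).
    return inv_logit(eta), inv_logit(eta - z * se), inv_logit(eta + z * se)
```

For example, with hypothetical coefficients `beta = [-1.0, 0.5]`, covariance `[[0.04, 0.0], [0.0, 0.01]]`, and `x = [1.0, 2.0]`, the linear predictor is 0, the fitted probability is 0.5, and the 95% interval is roughly (0.365, 0.635). Transforming the endpoints, rather than adding a standard error directly on the probability scale, keeps the interval inside the unit interval and makes it asymmetric when the fitted probability is near 0 or 1.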


Persistence And Anti-Persistence: Theory And Software, Justin Quinn Veenstra Feb 2013

Electronic Thesis and Dissertation Repository

Persistent and anti-persistent time series processes exhibit what is called hyperbolic decay. Such series play an important role in the study of many diverse areas, such as geophysics and financial economics, and are also of theoretical interest. Fractional Gaussian noise (FGN) and fractionally differenced white noise are two widely known examples of time series models with hyperbolic decay. New closed-form expressions are obtained for the spectral density functions of these models. Two lesser-known time series models exhibiting hyperbolic decay are introduced, and their basic properties are derived. A new algorithm for approximate likelihood estimation of the models using frequency …
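The two models named in the abstract have well-known closed-form autocorrelations. For FGN with Hurst exponent H, rho(k) = 0.5 * (|k+1|^(2H) - 2|k|^(2H) + |k-1|^(2H)); for fractionally differenced white noise with parameter d, rho(k) = Gamma(k+d) Gamma(1-d) / (Gamma(k-d+1) Gamma(d)). Both decay like a power of the lag (k^(2H-2) and k^(2d-1) respectively) rather than geometrically, which is the hyperbolic decay the abstract refers to; H > 1/2 or d > 0 gives persistence, and H < 1/2 or d < 0 gives anti-persistence. The Python sketch below evaluates these textbook formulas only; it does not reproduce the thesis's new spectral-density expressions or its estimation algorithm, and the function names are hypothetical.

```python
import math


def fgn_acf(k: int, hurst: float) -> float:
    """Autocorrelation at lag k of fractional Gaussian noise, 0 < H < 1.

    H = 0.5 gives white noise (zero correlation at every nonzero lag).
    """
    if not 0.0 < hurst < 1.0:
        raise ValueError("Hurst exponent must lie in (0, 1)")
    two_h = 2.0 * hurst
    k = abs(k)
    return 0.5 * ((k + 1) ** two_h - 2 * k ** two_h + abs(k - 1) ** two_h)


def fd_acf(k: int, d: float) -> float:
    """Autocorrelation at lag k of fractionally differenced white noise.

    Uses the recursion rho(k) = rho(k-1) * (k - 1 + d) / (k - d), which for
    d != 0 equals Gamma(k+d) Gamma(1-d) / (Gamma(k-d+1) Gamma(d)).
    """
    if not -0.5 < d < 0.5:
        raise ValueError("d must lie in (-0.5, 0.5) for a stationary process")
    rho = 1.0
    for j in range(1, abs(k) + 1):
        rho *= (j - 1 + d) / (j - d)
    return rho


if __name__ == "__main__":
    # Persistent examples: H = 0.8 (FGN) and d = 0.3 (FD). Both columns shrink
    # roughly like a power of the lag instead of geometrically.
    for lag in (1, 10, 100, 1000):
        print(lag, fgn_acf(lag, 0.8), fd_acf(lag, 0.3))
```

As a check on the hyperbolic decay, fgn_acf(k, H) * k^(2 - 2H) approaches H(2H - 1) as k grows (0.48 for H = 0.8), and fd_acf(k, d) * k^(1 - 2d) approaches Gamma(1-d) / Gamma(d).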


Basic R Matrix Operations, Joseph Hilbe Aug 2011

Joseph M Hilbe

No abstract provided.