Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons

Open Access. Powered by Scholars. Published by Universities.®

Regression

Discipline
Institution
Publication Year
Publication
Publication Type

Articles 1 - 24 of 24

Full-Text Articles in Statistical Models

The Use Of Regularization To Detect Racial Inequities In Pay Equity Studies: An Empirical Study And Reflections On Regulation Methods, Christopher M. Peña Nov 2023

The Use Of Regularization To Detect Racial Inequities In Pay Equity Studies: An Empirical Study And Reflections On Regulation Methods, Christopher M. Peña

Electronic Theses and Dissertations

Since the late 1970s, multiple linear regression has been the preferred method for identifying discrimination in pay. An empirical study on this topic was conducted using quantitative critical methods. A literature review first examined conflicting views on using multiple linear regression in pay equity studies. The review found that multiple linear regression is used so prevalently in pay equity studies because the courts and practitioners have widely accepted it and because of its simplicity and ability to parse multiple sources of variance simultaneously. Commentaries in the literature cautioned about errors in model specification, the use of tainted variables, and the …


Analyzing Relationships With Machine Learning, Oscar Ko Feb 2023

Analyzing Relationships With Machine Learning, Oscar Ko

Dissertations, Theses, and Capstone Projects

Procedurally, this project aims to take a dataset, analyze it, and offer insights to the audience in an easy-to-digest format. Conceptually, this project will seek to explore questions like: “Do couples that meet through online dating or dating apps have higher or lower quality relationships?”, “Can any features in this dataset help predict how a subject would rate their relationship quality?”, and “What other insights can I derive from using machine learning for exploratory analysis?” The intended audience for this project is anyone interested in romantic relationships or machine learning.

The dataset is from a Stanford University survey, “How Couples …


Predicting Insulin Pump Therapy Settings, Riccardo L. Ferraro, David Grijalva, Alex Trahan Sep 2022

Predicting Insulin Pump Therapy Settings, Riccardo L. Ferraro, David Grijalva, Alex Trahan

SMU Data Science Review

Millions of people live with diabetes worldwide [7]. To mitigate some of the many symptoms associated with diabetes, an estimated 350,000 people in the United States rely on insulin pumps [17]. For many of these people, how effectively their insulin pump performs is the difference between sleeping through the night and a life threatening emergency treatment at a hospital. Three programmed insulin pump therapy settings governing effective insulin pump function are: Basal Rate (BR), Insulin Sensitivity Factor (ISF), and Carbohydrate Ratio (ICR). For many people using insulin pumps, these therapy settings are often not correct, given their physiological needs. While …


The Short-Term Effects Of Fine Airborne Particulate Matter And Climate On Covid-19 Disease Dynamics, El Hussain Shamsa, Kezhong Zhang Jun 2022

The Short-Term Effects Of Fine Airborne Particulate Matter And Climate On Covid-19 Disease Dynamics, El Hussain Shamsa, Kezhong Zhang

Medical Student Research Symposium

Background: Despite more than 60% of the United States population being fully vaccinated, COVID-19 cases continue to spike in a temporal pattern. These patterns in COVID-19 incidence and mortality may be linked to short-term changes in environmental factors.

Methods: Nationwide, county-wise measurements for COVID-19 cases and deaths, fine-airborne particulate matter (PM2.5), and maximum temperature were obtained from March 20, 2020 to March 20, 2021. Multivariate Linear Regression was used to analyze the association between environmental factors and COVID-19 incidence and mortality rates in each season. Negative Binomial Regression was used to analyze daily fluctuations of COVID-19 cases …


Assessing The Influence Of Health Policy And Population Mobility On Covid-19 Spread In Arkansas, Tayden Barretto May 2022

Assessing The Influence Of Health Policy And Population Mobility On Covid-19 Spread In Arkansas, Tayden Barretto

Industrial Engineering Undergraduate Honors Theses

The outbreak of COVID-19 has created a major crisis across the world since its start in 2019, and its influence on every realm of society is undeniable. Globally, more than 500 million cases have been recorded since March 2020, with almost 6 million deaths. In the wake of this crisis, many governments and health organizations have taken steps and precautions to mitigate its spread. These steps involve public mandates of information, reducing frequency of personal contact, and use of masks to minimize the risk of transmission. Current access to mobility data released from Google detailing population movements has provided a …


Semi-Supervised Regression With Generative Adversarial Networks Using Minimal Labeled Data, Greg Olmschenk Sep 2019

Semi-Supervised Regression With Generative Adversarial Networks Using Minimal Labeled Data, Greg Olmschenk

Dissertations, Theses, and Capstone Projects

This work studies the generalization of semi-supervised generative adversarial networks (GANs) to regression tasks. A novel feature layer contrasting optimization function, in conjunction with a feature matching optimization, allows the adversarial network to learn from unannotated data and thereby reduce the number of labels required to train a predictive network. An analysis of simulated training conditions is performed to explore the capabilities and limitations of the method. In concert with the semi-supervised regression GANs, an improved label topology and upsampling technique for multi-target regression tasks are shown to reduce data requirements. Improvements are demonstrated on a wide variety of vision …


Successful Shot Locations And Shot Types Used In Ncaa Men’S Division I Basketball, Olivia D. Perrin Aug 2019

Successful Shot Locations And Shot Types Used In Ncaa Men’S Division I Basketball, Olivia D. Perrin

All NMU Master's Theses

The primary purpose of the current study was to investigate the effect of court location (distance and angle from basket) and shot types used on shot success in NCAA Men’s DI basketball during the 2017-18 season. A secondary purpose was to further expand the analysis based on two additional factors: player position (guard, forward, or center) and team ranking. All statistical analyses were completed in RStudio and three binomial logistic regression analyses were performed to evaluate factors that influence shot success; one for all two and three point shot attempts, one for only two point attempts, and one for only …


Visualization And Machine Learning Techniques For Nasa’S Em-1 Big Data Problem, Antonio P. Garza Iii, Jose Quinonez, Misael Santana, Nibhrat Lohia May 2019

Visualization And Machine Learning Techniques For Nasa’S Em-1 Big Data Problem, Antonio P. Garza Iii, Jose Quinonez, Misael Santana, Nibhrat Lohia

SMU Data Science Review

In this paper, we help NASA solve three Exploration Mission-1 (EM-1) challenges: data storage, computation time, and visualization of complex data. NASA is studying one year of trajectory data to determine available launch opportunities (about 90TBs of data). We improve data storage by introducing a cloud-based solution that provides elasticity and server upgrades. This migration will save $120k in infrastructure costs every four years, and potentially avoid schedule slips. Additionally, it increases computational efficiency by 125%. We further enhance computation via machine learning techniques that use the classic orbital elements to predict valid trajectories. Our machine learning model decreases trajectory …


Best Probable Subset: A New Method For Reducing Data Dimensionality In Linear Regression, Elieser Nodarse Apr 2019

Best Probable Subset: A New Method For Reducing Data Dimensionality In Linear Regression, Elieser Nodarse

FIU Electronic Theses and Dissertations

Regression is a statistical technique for modeling the relationship between a dependent variable Y and two or more predictor variables, also known as regressors. In the broad field of regression, there exists a special case in which the relationship between the dependent variable and the regressor(s) is linear. This is known as linear regression.

The purpose of this paper is to create a useful method that effectively selects a subset of regressors when dealing with high dimensional data and/or collinearity in linear regression. As the name depicts it, high dimensional data occurs when the number of predictor variables is far …


An Overview And Evaluation Of Synthetc: A Statistical Model For Extra-Tropical Cyclones, Rafael Uryayev Jan 2019

An Overview And Evaluation Of Synthetc: A Statistical Model For Extra-Tropical Cyclones, Rafael Uryayev

Dissertations and Theses

Extratropical cyclones (ETCs) are the most common weather phenomena affecting the United States, Canada, and Europe. They can pose serious hazards over large swaths of area. In this thesis, a statistical model of ETCs, called SynthETC, is discussed. The model accounts for the for genesis, track path, termination, and intensity of statistically generated ETCs. Genesis is modeled as a Poisson process, whose mean is determined by climate and historical information. Tracks are modeled as a regression-mean determined by climate and historical information plus a stochastic component. Lysis is modeled using logistic regression, with climate states as covariates. Intensity is modeled …


Yelp’S Review Filtering Algorithm, Yao Yao, Ivelin Angelov, Jack Rasmus-Vorrath, Mooyoung Lee, Daniel W. Engels Aug 2018

Yelp’S Review Filtering Algorithm, Yao Yao, Ivelin Angelov, Jack Rasmus-Vorrath, Mooyoung Lee, Daniel W. Engels

SMU Data Science Review

In this paper, we present an analysis of features influencing Yelp's proprietary review filtering algorithm. Classifying or misclassifying reviews as recommended or non-recommended affects average ratings, consumer decisions, and ultimately, business revenue. Our analysis involves systematically sampling and scraping Yelp restaurant reviews. Features are extracted from review metadata and engineered from metrics and scores generated using text classifiers and sentiment analysis. The coefficients of a multivariate logistic regression model were interpreted as quantifications of the relative importance of features in classifying reviews as recommended or non-recommended. The model classified review recommendations with an accuracy of 78%. We found that reviews …


Fuel Flow Reduction Impact Analysis Of Drag Reducing Film Applied To Aircraft Wings, Damon Resnick, Chris Donlan, Nimish Sakalle, Cody Pinkerman Jul 2018

Fuel Flow Reduction Impact Analysis Of Drag Reducing Film Applied To Aircraft Wings, Damon Resnick, Chris Donlan, Nimish Sakalle, Cody Pinkerman

SMU Data Science Review

In this paper, we present an analysis of flight data in order to determine whether the application of the Edge Aerodynamix Conformal Vortex Generator (CVG), applied to the wings of aircraft, reduces fuel flow during cruising conditions of flight. The CVG is a special treatment and film applied to the wings of an aircraft to protect the wings and reduce the non-laminar flow of air around the wings during flight. It is thought that by reducing the non-laminar flow or vortices around and directly behind the wings that an aircraft will move more smoothly through the air and provide a …


Geostatistical Analysis Of Potential Sinkhole Risk: Examining Spatial And Temporal Climate Relationships In Tennessee And Florida, Kimberly Blazzard May 2018

Geostatistical Analysis Of Potential Sinkhole Risk: Examining Spatial And Temporal Climate Relationships In Tennessee And Florida, Kimberly Blazzard

Electronic Theses and Dissertations

Sinkholes are a significant hazard for the southeastern United States. Although differences in climate are known to affect karst environments differently, quantitative analyses correlating sinkhole formation with climate variables is lacking. A temporal linear regression for Florida sinkholes and two modeled regressions for Tennessee sinkholes were produced: a general linearized logistic regression and a MaxEnt derived species distribution model. Temporal results showed highly significant correlations with precipitation, teleconnection patterns, temperature, and CO2, while spatial results showed highly significant correlations with precipitation, wind speed, solar radiation, and maximum temperature. Regression results indicated that some sinkhole formation variability could be …


Sabermetrics - Statistical Modeling Of Run Creation And Prevention In Baseball, Parker Chernoff Mar 2018

Sabermetrics - Statistical Modeling Of Run Creation And Prevention In Baseball, Parker Chernoff

FIU Electronic Theses and Dissertations

The focus of this thesis was to investigate which baseball metrics are most conducive to run creation and prevention. Stepwise regression and Liu estimation were used to formulate two models for the dependent variables and also used for cross validation. Finally, the predicted values were fed into the Pythagorean Expectation formula to predict a team’s most important goal: winning.

Each model fit strongly and collinearity amongst offensive predictors was considered using variance inflation factors. Hits, walks, and home runs allowed, infield putouts, errors, defense-independent earned run average ratio, defensive efficiency ratio, saves, runners left on base, shutouts, and walks per …


Examination And Comparison Of The Performance Of Common Non-Parametric And Robust Regression Models, Gregory F. Malek Aug 2017

Examination And Comparison Of The Performance Of Common Non-Parametric And Robust Regression Models, Gregory F. Malek

Electronic Theses and Dissertations

ABSTRACT

Examination and Comparison of the Performance of Common Non-Parametric and Robust Regression Models

By

Gregory Frank Malek

Stephen F. Austin State University, Masters in Statistics Program,

Nacogdoches, Texas, U.S.A.

g_m_2002@live.com

This work investigated common alternatives to the least-squares regression method in the presence of non-normally distributed errors. An initial literature review identified a variety of alternative methods, including Theil Regression, Wilcoxon Regression, Iteratively Re-Weighted Least Squares, Bounded-Influence Regression, and Bootstrapping methods. These methods were evaluated using a simple simulated example data set, as well as various real data sets, including math proficiency data, Belgian telephone call data, and faculty …


Examining Cost Functionality And Optimization: A Case Study On Testing The Reasonableness Of New Aircraft Using Historical Aircraft Data, Katherine Jozefiak May 2016

Examining Cost Functionality And Optimization: A Case Study On Testing The Reasonableness Of New Aircraft Using Historical Aircraft Data, Katherine Jozefiak

Arts & Sciences Electronic Theses and Dissertations

When pursuing business by competing for government contracts, proving the submitted price is reasonable is often required. This proof is called a test of reasonableness. This study analyzes data from historical aircraft programs in relation of a new aircraft program in order to demonstrate the estimated cost of the new program is reasonable. The purpose of this study is to investigate three questions. Is the new program cost reasonable using current industry and government parameters? Is it better to look at programs from a total cost perspective or break the total cost into subcategory levels? Finally, this study applies a …


Hidden Trends In Nfl Data, Scott Santor Apr 2014

Hidden Trends In Nfl Data, Scott Santor

Statistics

This is an analysis on National Football League (NFL) data for the 2013-2014 regular season. The main goal is to find hidden trends in game data that can ultimately determine which factors are statistically significant to award a team with their ultimate objective, a win.

The main response variable to be examined is total wins throughout the regular season, and an alternative dependent variable is spread; the difference between a team’s points scored, and points against. Spread is analyzed to provide a different quantitative response variable that can be both positive and negative.

Game data was gathered from ESPN.com box …


A New Diagnostic Test For Regression, Yun Shi Apr 2013

A New Diagnostic Test For Regression, Yun Shi

Electronic Thesis and Dissertation Repository

A new diagnostic test for regression and generalized linear models is discussed. The test is based on testing if the residuals are close together in the linear space of one of the covariates are correlated. This is a generalization of the famous problem of spurious correlation in time series regression. A full model building approach for the case of regression was developed in Mahdi (2011, Ph.D. Thesis, Western University, ”Diagnostic Checking, Time Series and Regression”) using an iterative generalized least squares algorithm. Simulation experiments were reported that demonstrate the validity and utility of this approach but no actual applications were …


An Economic Alternative To The C Chart, Ryan William Black Dec 2012

An Economic Alternative To The C Chart, Ryan William Black

Graduate Theses and Dissertations

Because the probability of Type I error is not evenly distributed beyond upper and lower three-sigma limits the c chart is theoretically inappropriate for a monitor of Poisson distributed phenomena. Furthermore, the normal approximation to the Poisson is of little use when c is small. These practical and theoretical concerns should motivate the computation of true error rates associated with individuals control assuming the Poisson distribution. An economic alternative to the c chart is described as a statistical model of upward shift from c0 to c1 and the two charts are compared in theory. For a range of c chart …


The Em Algorithm For Group Testing Regression Models Under Matrix Pooling, Christopher R. Bilder, Boan Zhang Oct 2009

The Em Algorithm For Group Testing Regression Models Under Matrix Pooling, Christopher R. Bilder, Boan Zhang

Department of Statistics: Faculty Publications

No abstract provided.


Using Regression Models To Analyze Randomized Trials: Asymptotically Valid Hypothesis Tests Despite Incorrectly Specified Models, Michael Rosenblum, Mark J. Van Der Laan Jan 2008

Using Regression Models To Analyze Randomized Trials: Asymptotically Valid Hypothesis Tests Despite Incorrectly Specified Models, Michael Rosenblum, Mark J. Van Der Laan

U.C. Berkeley Division of Biostatistics Working Paper Series

Regression models are often used to test for cause-effect relationships from data collected in randomized trials or experiments. This practice has deservedly come under heavy scrutiny, since commonly used models such as linear and logistic regression will often not capture the actual relationships between variables, and incorrectly specified models potentially lead to incorrect conclusions. In this paper, we focus on hypothesis test of whether the treatment given in a randomized trial has any effect on the mean of the primary outcome, within strata of baseline variables such as age, sex, and health status. Our primary concern is ensuring that such …


To Model Or Not To Model? Competing Modes Of Inference For Finite Population Sampling, Rod Little Nov 2003

To Model Or Not To Model? Competing Modes Of Inference For Finite Population Sampling, Rod Little

The University of Michigan Department of Biostatistics Working Paper Series

Finite population sampling is perhaps the only area of statistics where the primary mode of analysis is based on the randomization distribution, rather than on statistical models for the measured variables. This article reviews the debate between design and model-based inference. The basic features of the two approaches are illustrated using the case of inference about the mean from stratified random samples. Strengths and weakness of design-based and model-based inference for surveys are discussed. It is suggested that models that take into account the sample design and make weak parametric assumptions can produce reliable and efficient inferences in surveys settings. …


Partial Auc Estimation And Regression, Lori E. Dodd, Margaret S. Pepe Jan 2003

Partial Auc Estimation And Regression, Lori E. Dodd, Margaret S. Pepe

UW Biostatistics Working Paper Series

Accurate disease diagnosis is critical for health care. New diagnostic and screening tests must be evaluated for their abilities to discriminate disease from non-diseased states. The partial area under the ROC curve (partial AUC) is a measure of diagnostic test accuracy. We present an interpretation of the partial AUC that gives rise to a new non-parametric estimator. This estimator is more robust than existing estimators, which make parametric assumptions. We show that the robustness is gained with only a moderate loss in efficiency. We describe a regression modelling framework for making inference about covariate effects on the partial AUC. Such …


Locally Efficient Estimation Of Regression Parameters Using Current Status Data, Chris Andrews, Mark J. Van Der Laan, James M. Robins Sep 2002

Locally Efficient Estimation Of Regression Parameters Using Current Status Data, Chris Andrews, Mark J. Van Der Laan, James M. Robins

U.C. Berkeley Division of Biostatistics Working Paper Series

In biostatistics applications interest often focuses on the estimation of the distribution of a time-variable T. If one only observes whether or not T exceeds an observed monitoring time C, then the data structure is called current status data, also known as interval censored data, case I. We consider this data structure extended to allow the presence of both time-independent covariates and time-dependent covariate processes that are observed until the monitoring time. We assume that the monitoring process satisfies coarsening at random.

Our goal is to estimate the regression parameter beta of the regression model T = Z*beta+epsilon where the …