Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Articles 1 - 26 of 26
Full-Text Articles in Physical Sciences and Mathematics
Defensive Impact Wins: Developing A New Method To Rate Individual Defense In NBA Games, Dylan J. Stiles
Honors Theses and Capstones
With the analytics revolution in sports in the past 20 years, it seems that everything that can be quantified is. In basketball, though, trying to break the game down into a set of numbers comes with a unique problem. While we've come up with a good set of advanced numbers to measure offensive efficiency, defense is fundamentally harder to quantify. The game is played five on five, but it has often been popular or convenient to model defense as a set of five one-on-one games. As defenses became more complex into the 2010s, this methodology became less meaningful. …
The Use Of Regularization To Detect Racial Inequities In Pay Equity Studies: An Empirical Study And Reflections On Regulation Methods, Christopher M. Peña
Electronic Theses and Dissertations
Since the late 1970s, multiple linear regression has been the preferred method for identifying discrimination in pay. An empirical study on this topic was conducted using quantitative critical methods. A literature review first examined conflicting views on using multiple linear regression in pay equity studies. The review found that multiple linear regression is used so prevalently in pay equity studies because the courts and practitioners have widely accepted it and because of its simplicity and ability to parse multiple sources of variance simultaneously. Commentaries in the literature cautioned about errors in model specification, the use of tainted variables, and the …
Analyzing Relationships With Machine Learning, Oscar Ko
Dissertations, Theses, and Capstone Projects
Procedurally, this project aims to take a dataset, analyze it, and offer insights to the audience in an easy-to-digest format. Conceptually, this project will seek to explore questions like: “Do couples that meet through online dating or dating apps have higher or lower quality relationships?”, “Can any features in this dataset help predict how a subject would rate their relationship quality?”, and “What other insights can I derive from using machine learning for exploratory analysis?” The intended audience for this project is anyone interested in romantic relationships or machine learning.
The dataset is from a Stanford University survey, “How Couples …
Predicting Insulin Pump Therapy Settings, Riccardo L. Ferraro, David Grijalva, Alex Trahan
SMU Data Science Review
Millions of people live with diabetes worldwide [7]. To mitigate some of the many symptoms associated with diabetes, an estimated 350,000 people in the United States rely on insulin pumps [17]. For many of these people, how effectively their insulin pump performs is the difference between sleeping through the night and a life-threatening emergency treatment at a hospital. Three programmed insulin pump therapy settings governing effective insulin pump function are: Basal Rate (BR), Insulin Sensitivity Factor (ISF), and Carbohydrate Ratio (ICR). For many people using insulin pumps, these therapy settings are often not correct, given their physiological needs. While …
The Short-Term Effects Of Fine Airborne Particulate Matter And Climate On COVID-19 Disease Dynamics, El Hussain Shamsa, Kezhong Zhang
Medical Student Research Symposium
Background: Despite more than 60% of the United States population being fully vaccinated, COVID-19 cases continue to spike in a temporal pattern. These patterns in COVID-19 incidence and mortality may be linked to short-term changes in environmental factors.
Methods: Nationwide, county-wise measurements for COVID-19 cases and deaths, fine airborne particulate matter (PM2.5), and maximum temperature were obtained from March 20, 2020 to March 20, 2021. Multivariate Linear Regression was used to analyze the association between environmental factors and COVID-19 incidence and mortality rates in each season. Negative Binomial Regression was used to analyze daily fluctuations of COVID-19 cases …
Assessing The Influence Of Health Policy And Population Mobility On COVID-19 Spread In Arkansas, Tayden Barretto
Industrial Engineering Undergraduate Honors Theses
The outbreak of COVID-19 has created a major crisis across the world since its start in 2019, and its influence on every realm of society is undeniable. Globally, more than 500 million cases have been recorded since March 2020, with almost 6 million deaths. In the wake of this crisis, many governments and health organizations have taken steps and precautions to mitigate its spread. These steps involve public mandates of information, reducing frequency of personal contact, and use of masks to minimize the risk of transmission. Current access to mobility data released from Google detailing population movements has provided a …
Statistical Theory For Specialized Linear Regression Adjustment Methods Compared To Multiple Linear Regression In The Presence And Absence Of Interaction Effects, Leon Su
Theses and Dissertations--Statistics
When building models to investigate outcomes and variables of interest, researchers often want to adjust for other variables. There is a variety of ways that these adjustments are performed. In this work, we will consider four approaches to adjustment utilized by researchers in various fields. We will compare the efficacy of these methods to what we call the “true model method”, fitting a multiple linear regression model in which adjustment variables are model covariates. Our goal is to show that these adjustment methods have inferior performance to the true model method by comparing model parameter estimates, power, type I error, …
Semi-Supervised Regression With Generative Adversarial Networks Using Minimal Labeled Data, Greg Olmschenk
Dissertations, Theses, and Capstone Projects
This work studies the generalization of semi-supervised generative adversarial networks (GANs) to regression tasks. A novel feature layer contrasting optimization function, in conjunction with a feature matching optimization, allows the adversarial network to learn from unannotated data and thereby reduce the number of labels required to train a predictive network. An analysis of simulated training conditions is performed to explore the capabilities and limitations of the method. In concert with the semi-supervised regression GANs, an improved label topology and upsampling technique for multi-target regression tasks are shown to reduce data requirements. Improvements are demonstrated on a wide variety of vision …
Successful Shot Locations And Shot Types Used In NCAA Men’s Division I Basketball, Olivia D. Perrin
All NMU Master's Theses
The primary purpose of the current study was to investigate the effect of court location (distance and angle from basket) and shot types used on shot success in NCAA Men’s DI basketball during the 2017-18 season. A secondary purpose was to further expand the analysis based on two additional factors: player position (guard, forward, or center) and team ranking. All statistical analyses were completed in RStudio and three binomial logistic regression analyses were performed to evaluate factors that influence shot success: one for all two- and three-point shot attempts, one for only two-point attempts, and one for only …
Visualization And Machine Learning Techniques For NASA’s EM-1 Big Data Problem, Antonio P. Garza III, Jose Quinonez, Misael Santana, Nibhrat Lohia
SMU Data Science Review
In this paper, we help NASA solve three Exploration Mission-1 (EM-1) challenges: data storage, computation time, and visualization of complex data. NASA is studying one year of trajectory data to determine available launch opportunities (about 90TBs of data). We improve data storage by introducing a cloud-based solution that provides elasticity and server upgrades. This migration will save $120k in infrastructure costs every four years, and potentially avoid schedule slips. Additionally, it increases computational efficiency by 125%. We further enhance computation via machine learning techniques that use the classic orbital elements to predict valid trajectories. Our machine learning model decreases trajectory …
Best Probable Subset: A New Method For Reducing Data Dimensionality In Linear Regression, Elieser Nodarse
FIU Electronic Theses and Dissertations
Regression is a statistical technique for modeling the relationship between a dependent variable Y and two or more predictor variables, also known as regressors. In the broad field of regression, there exists a special case in which the relationship between the dependent variable and the regressor(s) is linear. This is known as linear regression.
The purpose of this paper is to create a useful method that effectively selects a subset of regressors when dealing with high-dimensional data and/or collinearity in linear regression. As the name suggests, high-dimensional data occurs when the number of predictor variables is far …
An Overview And Evaluation Of SynthETC: A Statistical Model For Extra-Tropical Cyclones, Rafael Uryayev
Dissertations and Theses
Extratropical cyclones (ETCs) are the most common weather phenomena affecting the United States, Canada, and Europe. They can pose serious hazards over large swaths of area. In this thesis, a statistical model of ETCs, called SynthETC, is discussed. The model accounts for the genesis, track path, termination, and intensity of statistically generated ETCs. Genesis is modeled as a Poisson process, whose mean is determined by climate and historical information. Tracks are modeled as a regression-mean determined by climate and historical information plus a stochastic component. Lysis is modeled using logistic regression, with climate states as covariates. Intensity is modeled …
Yelp’s Review Filtering Algorithm, Yao Yao, Ivelin Angelov, Jack Rasmus-Vorrath, Mooyoung Lee, Daniel W. Engels
SMU Data Science Review
In this paper, we present an analysis of features influencing Yelp's proprietary review filtering algorithm. Classifying or misclassifying reviews as recommended or non-recommended affects average ratings, consumer decisions, and ultimately, business revenue. Our analysis involves systematically sampling and scraping Yelp restaurant reviews. Features are extracted from review metadata and engineered from metrics and scores generated using text classifiers and sentiment analysis. The coefficients of a multivariate logistic regression model were interpreted as quantifications of the relative importance of features in classifying reviews as recommended or non-recommended. The model classified review recommendations with an accuracy of 78%. We found that reviews …
Fuel Flow Reduction Impact Analysis Of Drag Reducing Film Applied To Aircraft Wings, Damon Resnick, Chris Donlan, Nimish Sakalle, Cody Pinkerman
SMU Data Science Review
In this paper, we present an analysis of flight data in order to determine whether the application of the Edge Aerodynamix Conformal Vortex Generator (CVG), applied to the wings of aircraft, reduces fuel flow during cruising conditions of flight. The CVG is a special treatment and film applied to the wings of an aircraft to protect the wings and reduce the non-laminar flow of air around the wings during flight. It is thought that by reducing the non-laminar flow or vortices around and directly behind the wings, an aircraft will move more smoothly through the air and provide a …
Geostatistical Analysis Of Potential Sinkhole Risk: Examining Spatial And Temporal Climate Relationships In Tennessee And Florida, Kimberly Blazzard
Electronic Theses and Dissertations
Sinkholes are a significant hazard for the southeastern United States. Although differences in climate are known to affect karst environments differently, quantitative analyses correlating sinkhole formation with climate variables are lacking. A temporal linear regression for Florida sinkholes and two modeled regressions for Tennessee sinkholes were produced: a general linearized logistic regression and a MaxEnt derived species distribution model. Temporal results showed highly significant correlations with precipitation, teleconnection patterns, temperature, and CO2, while spatial results showed highly significant correlations with precipitation, wind speed, solar radiation, and maximum temperature. Regression results indicated that some sinkhole formation variability could be …
Sabermetrics - Statistical Modeling Of Run Creation And Prevention In Baseball, Parker Chernoff
FIU Electronic Theses and Dissertations
The focus of this thesis was to investigate which baseball metrics are most conducive to run creation and prevention. Stepwise regression and Liu estimation were used to formulate two models for the dependent variables and also used for cross validation. Finally, the predicted values were fed into the Pythagorean Expectation formula to predict a team’s most important goal: winning.
Each model fit strongly and collinearity amongst offensive predictors was considered using variance inflation factors. Hits, walks, and home runs allowed, infield putouts, errors, defense-independent earned run average ratio, defensive efficiency ratio, saves, runners left on base, shutouts, and walks per …
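The last step described above, feeding predicted runs into the Pythagorean Expectation formula, can be sketched as follows. This is a minimal illustration that assumes the classic form with exponent 2; the thesis may use a different exponent, and the function name and run totals are hypothetical.

```python
def pythagorean_expectation(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed.

    The exponent of 2 is the classic choice and is an assumption here.
    """
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


# Hypothetical season totals (not taken from the thesis).
expected_pct = pythagorean_expectation(800, 700)
expected_wins = 162 * expected_pct  # 162-game regular season
```

In a workflow like the one the abstract outlines, the two arguments would be the fitted values from the run-creation and run-prevention regression models rather than observed totals.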
Examination And Comparison Of The Performance Of Common Non-Parametric And Robust Regression Models, Gregory F. Malek
Electronic Theses and Dissertations
Gregory Frank Malek, Stephen F. Austin State University, Masters in Statistics Program, Nacogdoches, Texas, U.S.A.
This work investigated common alternatives to the least-squares regression method in the presence of non-normally distributed errors. An initial literature review identified a variety of alternative methods, including Theil Regression, Wilcoxon Regression, Iteratively Re-Weighted Least Squares, Bounded-Influence Regression, and Bootstrapping methods. These methods were evaluated using a simple simulated example data set, as well as various real data sets, including math proficiency data, Belgian telephone call data, and faculty …
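As an illustration of one of the alternatives named above, Theil regression can be sketched in a few lines. This is a minimal sketch assuming the common Theil-Sen form (median of pairwise slopes, with a median-based intercept); it is not necessarily the variant evaluated in the thesis, and the function name is illustrative.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Fit a line by taking the median of all pairwise slopes.

    The intercept is the median of y - slope * x, one common convention
    (an assumption here). Pairs with equal x values are skipped.
    """
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


# Four points on the line y = 2x + 1, plus one gross outlier.
slope, intercept = theil_sen([0, 1, 2, 3, 4], [1, 3, 5, 7, 100])
```

Because the estimate depends on medians rather than sums of squares, the single outlier in the example does not pull the fitted line away from the other four points, which is the kind of behavior under non-normal errors that the work compares across methods.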
Examining Cost Functionality And Optimization: A Case Study On Testing The Reasonableness Of New Aircraft Using Historical Aircraft Data, Katherine Jozefiak
Arts & Sciences Electronic Theses and Dissertations
When pursuing business by competing for government contracts, proving the submitted price is reasonable is often required. This proof is called a test of reasonableness. This study analyzes data from historical aircraft programs in relation to a new aircraft program in order to demonstrate the estimated cost of the new program is reasonable. The purpose of this study is to investigate three questions. Is the new program cost reasonable using current industry and government parameters? Is it better to look at programs from a total cost perspective or break the total cost into subcategory levels? Finally, this study applies a …
Hidden Trends In NFL Data, Scott Santor
Statistics
This is an analysis of National Football League (NFL) data for the 2013-2014 regular season. The main goal is to find hidden trends in game data that can ultimately determine which factors are statistically significant to award a team with their ultimate objective, a win.
The main response variable to be examined is total wins throughout the regular season, and an alternative dependent variable is spread: the difference between a team’s points scored and points against. Spread is analyzed to provide a different quantitative response variable that can be both positive and negative.
Game data was gathered from ESPN.com box …
A New Diagnostic Test For Regression, Yun Shi
Electronic Thesis and Dissertation Repository
A new diagnostic test for regression and generalized linear models is discussed. The test is based on testing if the residuals that are close together in the linear space of one of the covariates are correlated. This is a generalization of the famous problem of spurious correlation in time series regression. A full model building approach for the case of regression was developed in Mahdi (2011, Ph.D. Thesis, Western University, “Diagnostic Checking, Time Series and Regression”) using an iterative generalized least squares algorithm. Simulation experiments were reported that demonstrate the validity and utility of this approach but no actual applications were …
An Economic Alternative To The C Chart, Ryan William Black
Graduate Theses and Dissertations
Because the probability of Type I error is not evenly distributed beyond upper and lower three-sigma limits, the c chart is theoretically inappropriate for a monitor of Poisson-distributed phenomena. Furthermore, the normal approximation to the Poisson is of little use when c is small. These practical and theoretical concerns should motivate the computation of true error rates associated with individuals control assuming the Poisson distribution. An economic alternative to the c chart is described as a statistical model of upward shift from c0 to c1 and the two charts are compared in theory. For a range of c chart …
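The uneven split of Type I error can be checked numerically. The following is a minimal sketch, assuming the conventional c chart limits c0 ± 3·sqrt(c0) with the lower limit floored at zero and a signal only when a count falls strictly outside the limits; the function names are illustrative and do not come from the thesis.

```python
import math


def poisson_pmf(k, mean):
    """P(X = k) for X ~ Poisson(mean)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def c_chart_tail_errors(c0):
    """Return (lower, upper) false-alarm probabilities for in-control mean c0.

    Assumes limits c0 +/- 3*sqrt(c0), with the lower limit floored at 0.
    """
    sigma = math.sqrt(c0)
    lcl = max(0.0, c0 - 3 * sigma)
    ucl = c0 + 3 * sigma
    # Lower tail: counts strictly below the lower control limit.
    lower = sum(poisson_pmf(k, c0) for k in range(math.ceil(lcl)))
    # Upper tail: counts strictly above the upper control limit.
    upper = 1.0 - sum(poisson_pmf(k, c0) for k in range(math.floor(ucl) + 1))
    return lower, upper


lower_error, upper_error = c_chart_tail_errors(4.0)
```

For a small mean such as c0 = 4, the lower limit is floored at zero, so no count can fall below it and all of the false-alarm probability sits in the upper tail. This is the asymmetry that the normal approximation hides when c is small.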
The EM Algorithm For Group Testing Regression Models Under Matrix Pooling, Christopher R. Bilder, Boan Zhang
Department of Statistics: Faculty Publications
No abstract provided.
Using Regression Models To Analyze Randomized Trials: Asymptotically Valid Hypothesis Tests Despite Incorrectly Specified Models, Michael Rosenblum, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
Regression models are often used to test for cause-effect relationships from data collected in randomized trials or experiments. This practice has deservedly come under heavy scrutiny, since commonly used models such as linear and logistic regression will often not capture the actual relationships between variables, and incorrectly specified models potentially lead to incorrect conclusions. In this paper, we focus on hypothesis tests of whether the treatment given in a randomized trial has any effect on the mean of the primary outcome, within strata of baseline variables such as age, sex, and health status. Our primary concern is ensuring that such …
To Model Or Not To Model? Competing Modes Of Inference For Finite Population Sampling, Rod Little
The University of Michigan Department of Biostatistics Working Paper Series
Finite population sampling is perhaps the only area of statistics where the primary mode of analysis is based on the randomization distribution, rather than on statistical models for the measured variables. This article reviews the debate between design and model-based inference. The basic features of the two approaches are illustrated using the case of inference about the mean from stratified random samples. Strengths and weaknesses of design-based and model-based inference for surveys are discussed. It is suggested that models that take into account the sample design and make weak parametric assumptions can produce reliable and efficient inferences in survey settings. …
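The stratified-mean case mentioned above has a simple design-based estimator: weight each stratum's sample mean by that stratum's share of the population. The sketch below assumes known stratum population sizes and simple random sampling within strata; the function name and the data are hypothetical and do not come from the article.

```python
def stratified_mean(strata):
    """Design-based estimate of a population mean from stratified samples.

    strata: list of (population_size, sample_values) pairs, one per stratum.
    Each stratum's sample mean is weighted by its population share N_h / N.
    """
    total_size = sum(size for size, _ in strata)
    return sum(
        (size / total_size) * (sum(values) / len(values))
        for size, values in strata
    )


# Hypothetical population of 1,000 units split into two strata.
estimate = stratified_mean([(600, [10, 12, 14]), (400, [20, 22])])
```

A model-based analysis that includes stratum effects would arrive at a closely related weighted estimate, which is one way to read the article's suggestion that models respecting the sample design can agree with design-based inference.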
Partial AUC Estimation And Regression, Lori E. Dodd, Margaret S. Pepe
UW Biostatistics Working Paper Series
Accurate disease diagnosis is critical for health care. New diagnostic and screening tests must be evaluated for their abilities to discriminate disease from non-diseased states. The partial area under the ROC curve (partial AUC) is a measure of diagnostic test accuracy. We present an interpretation of the partial AUC that gives rise to a new non-parametric estimator. This estimator is more robust than existing estimators, which make parametric assumptions. We show that the robustness is gained with only a moderate loss in efficiency. We describe a regression modelling framework for making inference about covariate effects on the partial AUC. Such …
Locally Efficient Estimation Of Regression Parameters Using Current Status Data, Chris Andrews, Mark J. Van Der Laan, James M. Robins
U.C. Berkeley Division of Biostatistics Working Paper Series
In biostatistics applications interest often focuses on the estimation of the distribution of a time-variable T. If one only observes whether or not T exceeds an observed monitoring time C, then the data structure is called current status data, also known as interval censored data, case I. We consider this data structure extended to allow the presence of both time-independent covariates and time-dependent covariate processes that are observed until the monitoring time. We assume that the monitoring process satisfies coarsening at random.
Our goal is to estimate the regression parameter beta of the regression model T = Z*beta+epsilon where the …
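The current status data structure described above can be made concrete with a small simulation. This is a sketch under strong simplifying assumptions that are not taken from the paper: a single time-independent covariate, standard normal errors, and monitoring times drawn independently of T (which satisfies coarsening at random trivially). Time-dependent covariate processes are omitted, and all names are illustrative.

```python
import random


def simulate_current_status(n, beta, seed=0):
    """Simulate n current status observations as (c, delta, z) tuples.

    The latent time T = z * beta + epsilon is never returned; only the
    indicator delta = 1 if T <= c (else 0) is observed at monitoring time c.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        z = rng.uniform(0.0, 1.0)            # time-independent covariate Z
        t = z * beta + rng.gauss(0.0, 1.0)   # latent T = Z*beta + epsilon
        c = rng.uniform(-2.0, 3.0)           # monitoring time C, independent of T
        delta = 1 if t <= c else 0           # all that is learned about T
        records.append((c, delta, z))
    return records


observations = simulate_current_status(200, beta=1.5, seed=1)
```

Each record carries only the monitoring time, the binary indicator, and the covariate, so an estimator of beta has to work from whether T had occurred by C rather than from T itself.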