Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Articles 1 - 8 of 8

Full-Text Articles in Entire DC Network

Budget-Constrained Regression Model Selection Using Mixed Integer Nonlinear Programming, Jingying Zhang Dec 2018

Graduate Theses and Dissertations

Regression analysis fits predictive models to data on a response variable and the corresponding values of a set of explanatory variables. Data on the explanatory variables often must be purchased from commercial databases, so the available budget may limit which variables can be used in the final model.

In this dissertation, two budget-constrained regression models, for continuous and categorical variables respectively, are proposed; both use Mixed Integer Nonlinear Programming (MINLP) to choose which explanatory variables are included in the solution. First, we propose a budget-constrained linear regression model for continuous response variables. Properties such as solvability and global optimality of the proposed …
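
The selection problem described above (pick the explanatory variables that give the best fit while their total data cost stays within a budget) can be illustrated on a tiny example. The sketch below is not the dissertation's MINLP formulation: it swaps in plain exhaustive enumeration of variable subsets, which is only feasible for a handful of variables but makes the budget constraint explicit. The helper names, the costs, and the toy data are all hypothetical.

```python
from itertools import combinations


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols_sse(X, y, cols):
    """Least-squares fit of y on an intercept plus the columns in `cols`; returns the SSE."""
    rows = [[1.0] + [x[j] for j in cols] for x in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def best_subset_under_budget(X, y, costs, budget):
    """Enumerate every column subset whose total cost fits the budget; keep the lowest SSE."""
    p = len(costs)
    best = (ols_sse(X, y, ()), ())  # the intercept-only model costs nothing
    for size in range(1, p + 1):
        for cols in combinations(range(p), size):
            if sum(costs[j] for j in cols) <= budget:
                sse = ols_sse(X, y, cols)
                if sse < best[0]:
                    best = (sse, cols)
    return best


# Toy data (made up): y = 2*x0 + x2 exactly, x1 is irrelevant.
X = [[1, 0, 2], [2, 1, 1], [3, 0, 4], [4, 2, 3], [5, 1, 5], [6, 3, 2]]
y = [2 * r[0] + r[2] for r in X]
costs = [3, 1, 2]  # hypothetical purchase price of each variable
sse, cols = best_subset_under_budget(X, y, costs, budget=5)
print(cols)  # (0, 2)
```

Enumeration grows as 2^p, which is exactly why a solver-based formulation such as MINLP becomes attractive once the number of candidate variables is realistic.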


Identifying Key Factors Associated With High Risk Asthma Patients To Reduce The Cost Of Health Resources Utilization, Amani Ahmad Oct 2018

LSU Master's Theses

Asthma is associated with frequent use of primary health services and places a burden on the United States economy. Identifying key factors associated with increased cost of asthma is an essential step to improve practices of asthma management.

The aim of this study was to identify factors associated with overutilization of primary health services and increased cost, using claims data, and to explore the effectiveness of a case management program in reducing overall asthma-related cost.

Claims data for Medicaid-insured asthma patients in Louisiana were analyzed. Asthma patients were identified by their ICD-9 and ICD-10 codes, forward variable …
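
The abstract is cut off after "forward variable", so the thesis's exact procedure is not stated here. As a generic illustration only, the sketch below shows greedy forward selection for a least-squares model: start with no predictors and repeatedly add the one that most reduces the residual sum of squares, stopping when the relative improvement falls below a threshold. The `min_gain` stopping rule, the helper names, and the toy data are hypothetical assumptions, not taken from the thesis.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols_sse(X, y, cols):
    """SSE of a least-squares fit of y on an intercept plus the columns in `cols`."""
    rows = [[1.0] + [x[j] for j in cols] for x in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def forward_select(X, y, min_gain=0.05, tol=1e-12):
    """Greedy forward selection on SSE; `min_gain` is a hypothetical stopping rule
    (smallest relative SSE reduction worth adding another variable for)."""
    remaining = list(range(len(X[0])))
    chosen = []
    current = ols_sse(X, y, chosen)
    while remaining and current > tol:  # stop early if the fit is already exact
        sse, j = min((ols_sse(X, y, chosen + [c]), c) for c in remaining)
        if (current - sse) / current < min_gain:
            break
        chosen.append(j)
        remaining.remove(j)
        current = sse
    return chosen


# Toy data (made up): y depends on columns 0 and 2 only.
X = [[1, 0, 2], [2, 1, 1], [3, 0, 4], [4, 2, 3], [5, 1, 5], [6, 3, 2]]
y = [2 * r[0] + r[2] for r in X]
print(forward_select(X, y))
```

Forward selection is greedy, so unlike the exhaustive search it can miss the best subset when predictors are correlated; it trades that risk for far fewer model fits.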


Yelp's Review Filtering Algorithm, Yao Yao, Ivelin Angelov, Jack Rasmus-Vorrath, Mooyoung Lee, Daniel W. Engels Aug 2018

SMU Data Science Review

In this paper, we present an analysis of features influencing Yelp's proprietary review filtering algorithm. Classifying or misclassifying reviews as recommended or non-recommended affects average ratings, consumer decisions, and ultimately, business revenue. Our analysis involves systematically sampling and scraping Yelp restaurant reviews. Features are extracted from review metadata and engineered from metrics and scores generated using text classifiers and sentiment analysis. The coefficients of a multivariate logistic regression model were interpreted as quantifications of the relative importance of features in classifying reviews as recommended or non-recommended. The model classified review recommendations with an accuracy of 78%. We found that reviews …
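
The abstract interprets logistic regression coefficients as measures of feature importance and reports classification accuracy. The sketch below shows those two quantities on made-up data: it fits a one-feature logistic model by plain gradient descent, converts the slope into an odds ratio (the multiplicative change in the odds of "recommended" per one-unit increase in the feature), and computes accuracy at a 0.5 threshold. The single feature, the labels, and the learning-rate settings are hypothetical and are not Yelp data or the authors' model.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, lr=0.5, steps=5000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by gradient descent on the mean log-loss."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = sigmoid(b0 + b1 * xi) - yi  # d(log-loss)/d(linear score)
            g0 += err
            g1 += err * xi
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Toy data (made up): one standardized review feature, label 1 = "recommended".
x = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(x, y)
preds = [1 if sigmoid(b0 + b1 * xi) >= 0.5 else 0 for xi in x]
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
odds_ratio = math.exp(b1)  # change in odds per one-unit increase in x
print(f"b1={b1:.3f}, odds ratio={odds_ratio:.3f}, accuracy={accuracy:.0%}")
```

With several features, comparing coefficients as "relative importance" is only meaningful when the features are on comparable scales, which is one reason to standardize them first.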


Fuel Flow Reduction Impact Analysis Of Drag Reducing Film Applied To Aircraft Wings, Damon Resnick, Chris Donlan, Nimish Sakalle, Cody Pinkerman Jul 2018

SMU Data Science Review

In this paper, we present an analysis of flight data to determine whether the Edge Aerodynamix Conformal Vortex Generator (CVG), applied to the wings of aircraft, reduces fuel flow during cruise. The CVG is a special treatment and film applied to an aircraft's wings to protect them and to reduce the non-laminar flow of air around them during flight. It is thought that by reducing the non-laminar flow, or vortices, around and directly behind the wings, an aircraft will move more smoothly through the air and provide a …


Geostatistical Analysis Of Potential Sinkhole Risk: Examining Spatial And Temporal Climate Relationships In Tennessee And Florida, Kimberly Blazzard May 2018

Electronic Theses and Dissertations

Sinkholes are a significant hazard in the southeastern United States. Although climatic differences are known to affect karst environments in different ways, quantitative analyses correlating sinkhole formation with climate variables are lacking. A temporal linear regression was produced for Florida sinkholes, and two modeled regressions were produced for Tennessee sinkholes: a general linearized logistic regression and a MaxEnt-derived species distribution model. Temporal results showed highly significant correlations with precipitation, teleconnection patterns, temperature, and CO2, while spatial results showed highly significant correlations with precipitation, wind speed, solar radiation, and maximum temperature. Regression results indicated that some sinkhole formation variability could be …
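
A temporal regression of this kind relates a yearly count to a yearly climate variable, and "highly significant correlation" usually refers to a test that the slope (equivalently the correlation) is zero. The sketch below fits a single-predictor least-squares line, reports Pearson's r, and computes the t statistic with n - 2 degrees of freedom. The yearly numbers are invented for illustration and are not data from the study, which used several climate variables.

```python
import math


def simple_linear_regression(x, y):
    """Return (intercept, slope, pearson_r) for the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope, sxy / math.sqrt(sxx * syy)


def slope_t_statistic(r, n):
    """t statistic for H0: slope = 0 (n - 2 degrees of freedom); requires |r| < 1."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Hypothetical yearly series (made up): precipitation (cm) and sinkhole reports.
precip = [120, 135, 128, 150, 142, 160, 155, 170]
reports = [14, 18, 15, 22, 19, 25, 23, 27]
a, b, r = simple_linear_regression(precip, reports)
t = slope_t_statistic(r, len(precip))
print(f"reports ~ {a:.2f} + {b:.3f} * precip, r = {r:.3f}, t = {t:.2f}")
```

With yearly series, residuals are often autocorrelated, which can overstate significance; the plain t test above assumes independent errors.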


Sabermetrics - Statistical Modeling Of Run Creation And Prevention In Baseball, Parker Chernoff Mar 2018

FIU Electronic Theses and Dissertations

This thesis investigated which baseball metrics contribute most to run creation and prevention. Stepwise regression and Liu estimation were used to formulate two models for the dependent variables, and were also used for cross-validation. Finally, the predicted values were fed into the Pythagorean Expectation formula to predict a team’s most important goal: winning.
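
The Pythagorean Expectation converts runs scored (RS) and runs allowed (RA) into an expected winning percentage, RS^k / (RS^k + RA^k). The sketch below uses Bill James's original exponent k = 2 by default and exposes k as a parameter, since other exponents (such as 1.83) are also in common use; which exponent the thesis used is not stated here. The 162-game season length and the run totals in the example are illustrative assumptions.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean Expectation: RS^k / (RS^k + RA^k)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins over a season of `games` games."""
    return games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)


# Hypothetical season totals (made up), e.g. values predicted by run models.
print(pythagorean_win_pct(800, 600))  # 0.64
print(expected_wins(800, 600))
```

Feeding model predictions for RS and RA into this function, rather than observed totals, is what lets run-creation and run-prevention models be judged by the wins they imply.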

Each model fit the data well, and collinearity among offensive predictors was assessed using variance inflation factors. Hits, walks, and home runs allowed, infield putouts, errors, defense-independent earned run average ratio, defensive efficiency ratio, saves, runners left on base, shutouts, and walks per …
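
The variance inflation factor for predictor j is VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. The minimal sketch below covers only the two-predictor case, where R_j^2 reduces to the squared Pearson correlation between the two predictors; with more predictors, a full auxiliary regression is needed. The predictor values are toy numbers, not the thesis's data, and the cut-offs mentioned afterwards are rules of thumb rather than anything stated in the abstract.

```python
import math


def pearson_r(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); with only two predictors, R^2 is their squared correlation."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Toy predictor values (made up).
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 5, 4, 5]
print(vif_two_predictors(x1, x2))  # about 2.5, since r^2 = 0.6
```

Values above roughly 5 or 10 are commonly treated as a sign of problematic collinearity, though these thresholds are conventions rather than formal tests.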


Semiparametric Regression In The Presence Of Measurement Error, Xiang Li Jan 2018

Theses and Dissertations

Over the past two decades, the error-in-covariates problem has received great attention among researchers who study semiparametric and nonparametric inference for regression models. Without correction for measurement error in the covariates, estimators of covariate effects are usually biased. To account for measurement error, much research has been done in mean regression (Liang et al., 1999; Fuller, 2009; Carroll et al., 2006) and quantile regression (He and Liang, 2000; Hardle et al., 2000; Wei and Carroll, 2009). In contrast, there is little research on mode regression, which motivates us to propose semiparametric methods to address the error-in-covariates problem in Chapters …
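
The bias mentioned above is easiest to see in the simplest mean-regression setting (not the mode regression this dissertation targets). If the observed covariate is W = X + U with independent error U, the least-squares slope of Y on W shrinks toward zero by the reliability ratio var(X) / (var(X) + var(U)). The simulation sketch below assumes the error variance is known and corrects the naive slope by dividing by the estimated reliability ratio, a simple method-of-moments correction. All parameter values and the function names are hypothetical.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, beta=2.0, sigma_x=1.0, sigma_u=1.0, seed=0):
    """Return (slope using true X, naive slope using W, corrected slope)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, sigma_x) for _ in range(n)]
    w = [xi + rng.gauss(0, sigma_u) for xi in x]  # error-prone observed covariate
    y = [beta * xi + rng.gauss(0, 0.5) for xi in x]
    naive = ols_slope(w, y)
    mw = sum(w) / n
    var_w = sum((wi - mw) ** 2 for wi in w) / (n - 1)
    reliability = (var_w - sigma_u ** 2) / var_w  # assumes sigma_u is known
    return ols_slope(x, y), naive, naive / reliability


true_slope, naive, corrected = simulate()
print(f"true-X slope {true_slope:.3f}, naive {naive:.3f}, corrected {corrected:.3f}")
```

With equal variances for X and U, the reliability ratio is 0.5, so the naive slope lands near half of the true value of 2; the correction recovers it only because the error variance is treated as known.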


Comparison Of The Performance Of Simple Linear Regression And Quantile Regression With Non-Normal Data: A Simulation Study, Marjorie Howard Jan 2018

Theses and Dissertations

Linear regression is a widely used method of analysis that is well understood across a wide variety of disciplines. To use linear regression, a number of assumptions must be met. These assumptions, specifically normality and homoscedasticity of the error distribution, can at best be met only approximately with real data. Quantile regression requires fewer assumptions, which offers a potential advantage over linear regression. In this simulation study, we compare the performance of linear (least squares) regression to quantile regression when these assumptions are violated, in order to investigate under what conditions quantile regression becomes the more advantageous method …
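
Where least squares minimizes squared residuals, quantile regression minimizes the check (pinball) loss rho_tau(u) = u * (tau - 1[u < 0]). The sketch below covers only the intercept-only special case, where the minimizer is a sample tau-quantile, and contrasts it with the least-squares estimate (the sample mean) on skewed toy data with one outlier. Because the summed check loss is piecewise linear and convex with kinks at the observations, scanning the observations finds an exact minimizer. Models with predictors are normally fit by linear programming, for example with statsmodels' `QuantReg`; this sketch does not do that, and the data are made up.

```python
def check_loss(residual, tau):
    """Check (pinball) loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def fit_intercept_only(y, tau):
    """Intercept-only quantile regression: some observation always minimizes the
    summed check loss, so scanning the observations is exact."""
    return min(y, key=lambda c: sum(check_loss(yi - c, tau) for yi in y))


# Skewed toy data (made up) with one large outlier.
y = [1, 2, 3, 4, 100]
print(fit_intercept_only(y, 0.5))  # 3, the median
print(sum(y) / len(y))             # 22.0, the least-squares (mean) estimate
```

The outlier drags the least-squares estimate far from the bulk of the data while the median fit is unaffected, a small-scale version of the robustness to non-normal errors that motivates comparing the two methods.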