Neural Shrubs: Using Neural Networks To Improve Decision Trees, Kyle Caudle, Randy Hoover, Aaron Alphonsus
SDSU Data Science Symposium
Decision trees are commonly used in machine learning to predict either a categorical or a continuous response variable. Once the tree partitions the space, the response is determined either by majority vote (classification trees) or by averaging the response values (regression trees). This research builds a standard regression tree and then, instead of averaging the responses, trains a neural network to determine the response value. We have found that our approach typically increases the predictive capability of the decision tree. We have two demonstrations of this approach that we wish to present as a poster ...
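A minimal sketch of the idea in the abstract above, under stated assumptions: the abstract does not give the tree depth, the network architecture, or the training procedure, so the single-split tree, the four-unit tanh network, the learning rate, and the toy data below are all hypothetical choices for illustration. Each leaf's network starts from the ordinary leaf mean, so any training progress shows up as an improvement over the plain regression tree.

```python
import math
import random


def leaf_mean_sse(values):
    """Sum of squared errors around the mean (0 for an empty leaf)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Threshold on a single feature that minimizes the two leaves' total SSE."""
    best_t, best_cost = None, float("inf")
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        cost = leaf_mean_sse(left) + leaf_mean_sse(right)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t


def train_leaf_net(xs, ys, hidden=4, lr=0.05, epochs=2000, seed=0):
    """Fit y ~ b2 + sum_j w2[j] * tanh(w1[j] * x + b1[j]) by full-batch gradient descent."""
    rng = random.Random(seed)
    n = len(xs)
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    w2 = [0.0] * hidden   # zero output weights: the net initially predicts b2
    b2 = sum(ys) / n      # start from the ordinary leaf mean

    for _ in range(epochs):
        g_w1, g_b1, g_w2, g_b2 = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0
        for x, y in zip(xs, ys):
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
            err = b2 + sum(w2[j] * h[j] for j in range(hidden)) - y
            g_b2 += err
            for j in range(hidden):
                g_w2[j] += err * h[j]
                back = err * w2[j] * (1.0 - h[j] ** 2)  # gradient through tanh
                g_w1[j] += back * x
                g_b1[j] += back
        for j in range(hidden):
            w1[j] -= lr * g_w1[j] / n
            b1[j] -= lr * g_b1[j] / n
            w2[j] -= lr * g_w2[j] / n
        b2 -= lr * g_b2 / n

    def predict(x):
        return b2 + sum(w2[j] * math.tanh(w1[j] * x + b1[j]) for j in range(hidden))

    return predict


def mse(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data: a different linear trend on each side of x = 0.5.
xs = [i / 40 for i in range(40)]
ys = [2 * x if x < 0.5 else 5 - 4 * x for x in xs]

threshold = best_split(xs, ys)
left = [(x, y) for x, y in zip(xs, ys) if x < threshold]
right = [(x, y) for x, y in zip(xs, ys) if x >= threshold]
leaf_means = [sum(y for _, y in leaf) / len(leaf) for leaf in (left, right)]
leaf_nets = [train_leaf_net([x for x, _ in leaf], [y for _, y in leaf])
             for leaf in (left, right)]


def tree_predict(x):
    """Standard regression tree: predict the leaf mean."""
    return leaf_means[0] if x < threshold else leaf_means[1]


def shrub_predict(x):
    """Tree with a small network in each leaf instead of the mean."""
    return leaf_nets[0](x) if x < threshold else leaf_nets[1](x)


tree_mse = mse(tree_predict, xs, ys)
shrub_mse = mse(shrub_predict, xs, ys)
print(f"tree MSE = {tree_mse:.4f}, shrub MSE = {shrub_mse:.4f}")
```

The comparison here is on training data only; a fair evaluation of predictive capability would use held-out data, which this sketch omits for brevity.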
Predicting Unplanned Medical Visits Among Patients With Diabetes Using Machine Learning, Arielle Selya, Eric L. Johnson
SDSU Data Science Symposium
Diabetes poses a variety of medical complications to patients, resulting in a high rate of unplanned medical visits, which are costly to patients and healthcare providers alike. However, unplanned medical visits by their nature are very difficult to predict. The current project draws upon electronic health records (EMRs) of adult patients with diabetes who received care at Sanford Health between 2014 and 2017. Various machine learning methods were used to predict which patients have had an unplanned medical visit based on a variety of EMR variables (age, BMI, blood pressure, # of prescriptions, # of diagnoses on problem list, A1C, HDL ...
Level Crossing Simulation Of A Queueing Model, 2019 University of Windsor
Level Crossing Simulation Of A Queueing Model, Zhanxuan Ding
Simulation of the level crossing method will be used to find approximations of the distribution of the workload for several queueing models. In particular, three different types of queueing models, with different methods of handling workload bound thresholds, will be considered. Simulation applied to workload bound thresholds is new work.
An Evaluation Of Training Size Impact On Validation Accuracy For Optimized Convolutional Neural Networks, 2019 Southern Methodist University
An Evaluation Of Training Size Impact On Validation Accuracy For Optimized Convolutional Neural Networks, Jostein Barry-Straume, Adam Tschannen, Daniel W. Engels, Edward Fine
SMU Data Science Review
In this paper, we present an evaluation of training size impact on validation accuracy for an optimized Convolutional Neural Network (CNN). CNNs are currently the state-of-the-art architecture for object classification tasks. We used Amazon’s machine learning ecosystem to train and test 648 models to find the optimal hyperparameters with which to apply a CNN towards the Fashion-MNIST (Modified National Institute of Standards and Technology) dataset. We were able to realize a validation accuracy of 90% by using only 40% of the original data. We found that hidden layers appear to have had zero impact on validation accuracy, whereas the ...
Political Profiling Using Feature Engineering And Nlp, 2019 Southern Methodist University
Political Profiling Using Feature Engineering And Nlp, Chiranjeevi Mallavarapu, Ramya Mandava, Sabitri Kc, Ginger M. Holt
SMU Data Science Review
Public surveys are predominantly used when forecasting election outcomes. While the approach has had significant successes, the surveys have had their failures as well, especially when it comes to accuracy and reliability. As a result, it becomes challenging for political parties to spend their campaign budgets in a manner that facilitates the growth of a favorable and verifiable public opinion. Consequently, it is critical that a more accurate methodology for predicting election outcomes be developed. In this paper, we present an evaluation of the impact of utilizing dynamic public data on predicting the outcome of elections. Our model yielded a ...
Pedestrian Safety -- Fundamental To A Walkable City, 2019 Southern Methodist University
Pedestrian Safety -- Fundamental To A Walkable City, Joshua Herrera, Patrick Mcdevitt, Preeti Swaminathan, Raghuram Srinivas
SMU Data Science Review
In this paper, we present a method to identify urban areas with a higher likelihood of pedestrian safety related events. Pedestrian safety related events are pedestrian-vehicle interactions that result in fatalities, injuries, accidents without injury, or near-misses between pedestrians and vehicles. To develop a solution to this problem of identifying likely event locations, we assemble data, primarily from the City of Cincinnati and Hamilton County, that include safety reports from a five-year period, geographic information for these events, citizen survey of pedestrian reported concerns, non-emergency requests for service for any cause in the city, property values and public transportation ...
Improving Vix Futures Forecasts Using Machine Learning Methods, 2019 Southern Methodist University
Improving Vix Futures Forecasts Using Machine Learning Methods, James Hosker, Slobodan Djurdjevic, Hieu Nguyen, Robert Slater
SMU Data Science Review
The problem of forecasting market volatility is a difficult task for most fund managers. Volatility forecasts are used for risk management, alpha (risk) trading, and the reduction of trading friction. Improving the forecasts of future market volatility assists fund managers in adding or reducing risk in their portfolios as well as in increasing hedges to protect their portfolios in anticipation of a market sell-off event. Our analysis compares three existing financial models that forecast future market volatility using the Chicago Board Options Exchange Volatility Index (VIX) to six machine/deep learning supervised regression methods. This analysis determines which models provide ...
Collaborative Efforts To Forecast Seasonal Influenza In The United States, 2015–2016, 2019 Centers for Disease Control and Prevention
Collaborative Efforts To Forecast Seasonal Influenza In The United States, 2015–2016, Craig J. Mcgowan, Jarad Niemi, Nehemias Ulloa, Katie Will, Et Al.
Since 2013, the Centers for Disease Control and Prevention (CDC) has hosted an annual influenza season forecasting challenge. The 2015–2016 challenge consisted of weekly probabilistic forecasts of multiple targets, including fourteen models submitted by eleven teams. Forecast skill was evaluated using a modified logarithmic score. We averaged submitted forecasts into a mean ensemble model and compared them against predictions based on historical trends. Forecast skill was highest for seasonal peak intensity and short-term forecasts, while forecast skill for timing of season onset and peak week was generally low. Higher forecast skill was associated with team participation in previous influenza ...
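As an illustration of the two operations named in the abstract above, the following sketch averages binned probabilistic forecasts into an equal-weight mean ensemble and scores a forecast with a windowed logarithmic score. The abstract does not define its modified score, so the rule used here (sum the probability assigned to bins within `window` of the observed bin, take the natural log, and floor the result) is an assumption, as are the `window=1` and `floor=-10.0` defaults and the example forecasts.

```python
import math


def mean_ensemble(forecasts):
    """Equal-weight average of bin probabilities across models."""
    n_models = len(forecasts)
    n_bins = len(forecasts[0])
    return [sum(f[b] for f in forecasts) / n_models for b in range(n_bins)]


def modified_log_score(probs, observed_bin, window=1, floor=-10.0):
    """Log of the probability mass within `window` bins of the observed bin.

    The window width and the floor are assumptions for this sketch, not
    values taken from the abstract.
    """
    lo = max(0, observed_bin - window)
    hi = min(len(probs) - 1, observed_bin + window)
    mass = sum(probs[lo:hi + 1])
    if mass <= 0.0:
        return floor
    return max(math.log(mass), floor)


# Hypothetical forecasts over four outcome bins from two models.
model_a = [0.1, 0.6, 0.2, 0.1]
model_b = [0.3, 0.2, 0.4, 0.1]

ensemble = mean_ensemble([model_a, model_b])
score = modified_log_score(ensemble, observed_bin=2)
print(ensemble, score)
```

Averaging the per-bin probabilities keeps the ensemble a valid probability distribution, and the floor prevents a single forecast that assigns zero mass near the truth from producing an unbounded penalty.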
Rfviz: An Interactive Visualization Package For Random Forests In R, 2018 Utah State University
Rfviz: An Interactive Visualization Package For Random Forests In R, Christopher Beckett
All Graduate Plan B and other Reports
Random forests are very popular tools for predictive analysis and data science. They work for both classification (where there is a categorical response variable) and regression (where the response is continuous). Random forests provide proximities, and both local and global measures of variable importance. However, these quantities require special tools to be effectively used to interpret the forest. Rfviz is a sophisticated interactive visualization package and toolkit in R, specially designed for interpreting the results of a random forest in a user-friendly way. Rfviz uses a recently developed R package (loon) from the Comprehensive R Archive Network (CRAN) to create ...
An Analysis Of Classroom Collusion Using Latent Dirichlet Allocation, 2018 Iowa State University
An Analysis Of Classroom Collusion Using Latent Dirichlet Allocation, Charles B. Shrader, Sue P. Ravenscroft, Jeffrey Kaufmann
Management Conference Papers, Posters and Proceedings
In this study, we use Latent Dirichlet Allocation to explore the reflections of students who faced a demanding classroom challenge, to which some responded by colluding. Our five-topic LDA solution describes the cheating event in terms of the nature of the course assignment itself, teams as a resource and support mechanism, the repercussions of cheating, and differences between majors or course tracks. The most relevant topics were the differences between the tracks and the repercussions of cheating. Teams and teammates also play a large role in the students’ reflections. We conclude with the implications of these topics in future research.
Bias Assessment And Reduction In Kernel Smoothing, 2018 The University of Western Ontario
Bias Assessment And Reduction In Kernel Smoothing, Wenkai Ma
Electronic Thesis and Dissertation Repository
When performing local polynomial regression (LPR) with kernel smoothing, the choice of the smoothing parameter, or bandwidth, is critical. The performance of the method is often evaluated using the Mean Square Error (MSE). Bias and variance are two components of MSE. Kernel methods are known to exhibit varying degrees of bias. Boundary effects and data sparsity issues are two potential problems to watch for. There is a need for a tool to visually assess the potential bias when applying kernel smooths to a given scatterplot of data. In this dissertation, we propose pointwise confidence intervals for bias and demonstrate a ...
41 - Data Exploration And Analysis For The Hemingway Measure Of Adult Connectedness, 2018 University of North Georgia
41 - Data Exploration And Analysis For The Hemingway Measure Of Adult Connectedness, Gildardo Bautista-Maya, Ping Ye, Diane Cook
Georgia Undergraduate Research Conference (GURC)
We analyze the dataset collected from students participating in the Boy With A Ball (BWAB) program, a faith-based community outreach group, through the Hemingway Measure of Adult Connectedness©, a questionnaire measuring the social connectedness of adolescents. First, we approach the data in the conventional method provided by the Hemingway website. We then identify which questions are strong determiners in deciding whether a student has completed the BWAB program or not. With the goal of using logistic regression, we reduce the set of questions to only those identified as significant in other methods. These methods include linear regression, decision ...
Statistical Investigation Of Road And Railway Hazardous Materials Transportation Safety, 2018 University of Nebraska-Lincoln
Statistical Investigation Of Road And Railway Hazardous Materials Transportation Safety, Amirfarrokh Iranitalab
Civil Engineering Theses, Dissertations, and Student Research
Transportation of hazardous materials (hazmat) in the United States (U.S.) constituted 22.8% of the total tonnage transported in 2012 with an estimated value of more than 2.3 billion dollars. As such, hazmat transportation is a significant economic activity in the U.S. However, hazmat transportation exposes people and the environment to the infrequent but potentially severe consequences of incidents resulting in hazmat release. Trucks and trains carried 63.7% of the hazmat in the U.S. in 2012 and are the major foci of this dissertation. The main research objectives were 1) identification and quantification of the effects ...
Group-Lasso Estimation In High-Dimensional Factor Models With Structural Breaks, 2018 University of Windsor
Group-Lasso Estimation In High-Dimensional Factor Models With Structural Breaks, Yujie Song
In this major paper, we study the influence of structural breaks in the financial market model with high-dimensional data. We present a model which is capable of detecting changes in factor loadings, determining the number of factors and detecting the break date. We consider both the case where the break date is known and the case where it is unknown, and identify the type of instability. For the unknown break date case, we propose a group-LASSO estimator to determine the number of pre- and post-break factors, the break date and the existence of instability of factor loadings when the number of factors is constant. We ...
Estimation In High-Dimensional Factor Models With Structural Instabilities, 2018 University of Windsor
Estimation In High-Dimensional Factor Models With Structural Instabilities, Wen Gao
In this major paper, we use high-dimensional models to analyze macroeconomic data that are influenced by a break point. In particular, we consider detecting the break point and study the changes in the number of factors and the factor loadings under structural instability.
Concretely, we propose two factor models which explain the processes of the pre- and post-break periods. Then, we consider the break point as known or unknown. In both situations, we derive the shrinkage estimators by minimizing the penalized least squares function and calculate the estimators of the numbers of pre- and post-break factors ...
Snakebite Dynamics Of Colombia: Effects Of Precipitation Seasonality Of Incidence, 2018 Illinois State University
Snakebite Dynamics Of Colombia: Effects Of Precipitation Seasonality Of Incidence, Carlos Cruz
Annual Symposium on Biomathematics and Ecology: Education and Research
No abstract provided.
Use Of Structural Equation Models To Predict Dengue Illness Phenotype, 2018 Brown University
Use Of Structural Equation Models To Predict Dengue Illness Phenotype, Sangshin Park, Anon Srikiatkhachorn, Siripen Kalayanarooj, Louis Macareo, Sharone Green, Jennifer F. Friedman, Alan L. Rothman
Open Access Articles
BACKGROUND: Early recognition of dengue, particularly patients at risk for plasma leakage, is important to clinical management. The objective of this study was to build predictive models for dengue, dengue hemorrhagic fever (DHF), and dengue shock syndrome (DSS) using structural equation modelling (SEM), a statistical method that evaluates mechanistic pathways.
METHODS/FINDINGS: We performed SEM using data from 257 Thai children enrolled within 72 h of febrile illness onset, 156 with dengue and 101 with non-dengue febrile illnesses. Models for dengue, DHF, and DSS were developed based on data obtained three and one day(s) prior to fever resolution (fever ...
Identifying Treatment Effects In The Presence Of Confounded Types, 2018 Iowa State University
Identifying Treatment Effects In The Presence Of Confounded Types, Desire Kedagni
Economics Working Papers
In this paper, I consider identification of treatment effects when the treatment is endogenous. The use of instrumental variables is a popular solution to deal with endogeneity, but this may give misleading answers when the instrument is invalid. I show that when the instrument is invalid due to correlation with the first stage unobserved heterogeneity, a second (also possibly invalid) instrument allows partial identification of not only the local average treatment effect but also the entire potential outcomes distributions for compliers. I exploit the fact that the distribution of the observed outcome in each group defined by the treatment and ...
Overcoming Small Data Limitations In Heart Disease Prediction By Using Surrogate Data, 2018 Southern Methodist University
Overcoming Small Data Limitations In Heart Disease Prediction By Using Surrogate Data, Alfeo Sabay, Laurie Harris, Vivek Bejugama, Karen Jaceldo-Siegl
SMU Data Science Review
In this paper, we present a heart disease prediction use case showing how synthetic data can be used to address privacy concerns and overcome constraints inherent in small medical research data sets. While advanced machine learning algorithms, such as neural network models, can be implemented to improve prediction accuracy, these require very large data sets which are often not available in medical or clinical research. We examine the use of surrogate data sets comprised of synthetic observations for modeling heart disease prediction. We generate surrogate data, based on the characteristics of original observations, and compare prediction accuracy results achieved from ...
Minimizing The Perceived Financial Burden Due To Cancer, 2018 Southern Methodist University
Minimizing The Perceived Financial Burden Due To Cancer, Hassan Azhar, Zoheb Allam, Gino Varghese, Daniel W. Engels, Sajiny John
SMU Data Science Review
In this paper, we present a regression model that predicts the perceived financial burden that a cancer patient experiences in the treatment and management of the disease. Cancer patients do not fully understand the burden associated with the cost of cancer, and their lack of understanding can increase the difficulties associated with living with the disease, in particular coping with the cost. The relationship between demographic characteristics and financial burden was examined in order to better understand the characteristics of a cancer patient and their burden, while all subsets regression was used to determine the best predictors of financial burden. Age ...