Open Access. Powered by Scholars. Published by Universities.®

Engineering Commons

Articles 1 - 18 of 18

Full-Text Articles in Engineering

Identifying Customer Churn In After-Market Operations Using Machine Learning Algorithms, Vitaly Briker, Richard Farrow, William Trevino, Brent Allen Dec 2019

SMU Data Science Review

This paper presents a comparative study on machine learning methods as they are applied to product associations, future purchase predictions, and predictions of customer churn in aftermarket operations. Association rules are used to help identify patterns across products and find correlations in customer purchase behaviour. Studying customer behaviour as it pertains to Recency, Frequency, and Monetary Value (RFM) helps inform customer segmentation and identifies customers with a propensity to churn. Lastly, Flowserve’s customer purchase history enables the establishment of churn thresholds for each customer group and assists in constructing a model to predict future churners. The aim of this model is …
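
As a rough illustration of the RFM idea mentioned in this abstract, the sketch below derives recency, frequency, and monetary value per customer from a list of purchases. The function name `rfm_scores`, the record layout, and the sample data are all hypothetical and are not taken from the paper, which may compute and bin these quantities differently.

```python
from datetime import date


def rfm_scores(purchases, as_of):
    """Summarize purchases as Recency, Frequency, Monetary value per customer.

    purchases: iterable of (customer_id, purchase_date, amount) tuples
               (a hypothetical layout chosen for this sketch).
    as_of:     reference date used to measure recency.
    """
    table = {}
    for customer, when, amount in purchases:
        row = table.setdefault(
            customer, {"last": when, "frequency": 0, "monetary": 0.0}
        )
        row["last"] = max(row["last"], when)  # most recent purchase date
        row["frequency"] += 1                 # number of purchases
        row["monetary"] += amount             # total spend
    return {
        customer: {
            "recency_days": (as_of - row["last"]).days,
            "frequency": row["frequency"],
            "monetary": row["monetary"],
        }
        for customer, row in table.items()
    }


# Invented sample data, for illustration only.
purchases = [
    ("A", date(2019, 1, 10), 120.0),
    ("A", date(2019, 6, 1), 80.0),
    ("B", date(2018, 3, 15), 500.0),
]
scores = rfm_scores(purchases, as_of=date(2019, 7, 1))
```

A customer with a large `recency_days` and a low `frequency` relative to its group would be a candidate churner under a threshold rule; the thresholds themselves are not specified here.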


Predicting Wind Turbine Blade Erosion Using Machine Learning, Casey Martinez, Festus Asare Yeboah, Scott Herford, Matt Brzezinski, Viswanath Puttagunta Aug 2019

SMU Data Science Review

Using time-series data and turbine blade inspection assessments, we present a classification model in order to predict remaining turbine blade life in wind turbines. Capturing the kinetic energy of wind requires complex mechanical systems, which require sophisticated maintenance and planning strategies. There are many traditional approaches to monitoring the internal gearbox and generator, but the condition of turbine blades can be difficult to measure and access. Accurate and cost-effective estimates of turbine blade life cycles will drive optimal investments in repairs and improve overall performance. These measures will drive down costs as well as provide cheap and clean electricity …


Machine Learning In Support Of Electric Distribution Asset Failure Prediction, Robert D. Flamenbaum, Thomas Pompo, Christopher Havenstein, Jade Thiemsuwan Aug 2019

SMU Data Science Review

In this paper, we present novel approaches to predicting asset failure in the electric distribution system. Failures in overhead power lines and their associated equipment, in particular, pose significant financial and environmental threats to electric utilities. Electric device failure furthermore poses a burden on customers and can pose serious risk to life and livelihood. Working with asset data acquired from an electric utility in Southern California, and incorporating environmental and geospatial data from around the region, we applied a Random Forest methodology to predict which overhead distribution lines are most vulnerable to failure. Our results provide evidence …


Machine Learning To Predict The Likelihood Of A Personal Computer To Be Infected With Malware, Maryam Shahini, Ramin Farhanian, Marcus Ellis Aug 2019

SMU Data Science Review

In this paper, we present a new model to predict the probability that a personal computer will become infected with malware. The dataset is selected from a Kaggle competition supported by Microsoft. The data includes computer configuration, owner information, installed software, and configuration information. In our research, several classification models are utilized to assign a probability of a machine being infected with malware. In this research, the LightGBM classifier is the best-performing machine learning model, running faster with higher efficiency and lower memory usage. The LightGBM algorithm obtained a cross-validation ROC-AUC score of 74%. Leading factors …


AWS EC2 Instance Spot Price Forecasting Using LSTM Networks, Jeffrey Lancon, Yejur Kunwar, David Stroud, Monnie Mcgee, Robert Slater Aug 2019

SMU Data Science Review

Cloud computing is a network of remote computing resources hosted on the Internet that allow users to utilize cloud resources on demand. As such, it represents a paradigm shift in the way businesses and industries think about digital infrastructure. With the shift from IT resources being a capital expenditure to a managed service, companies must rethink how they approach utilizing and optimizing these resources in order to maximize productivity and minimize costs. With proper resource management, cloud resources can be instrumental in reducing computing expenses.

Cloud resources are perishable commodities; therefore, cloud service providers have developed strategies to maximize utilization …


Visualizing United States Energy Production Data, Bruce P. Kimbark, Melissa Luzardo, Charles South, James Taber Aug 2019

SMU Data Science Review

Data on power plant production, load, financials, and environmental impact in the United States are publicly available from the Energy Information Administration, the Environmental Protection Agency, or Lazard, among others. The general public is interested in US energy production and its potential environmental impact, but the available information is complex, difficult to understand properly, and not shared in ways that are accessible. Our objective was to gather this data and create different interactive visualizations that make it consumable. Each of the five visualizations was designed to explain a specific part of energy that together can provide a …


Improve Image Classification Using Data Augmentation And Neural Networks, Shanqing Gu, Manisha Pednekar, Robert Slater Aug 2019

SMU Data Science Review

In this paper, we present how to improve image classification by using data augmentation and convolutional neural networks. Model overfitting and poor performance are common problems in applying neural network techniques. Approaches that reduce intra-class differences while retaining sensitivity to inter-class variations are important to maximize model accuracy and minimize the loss function. With the CIFAR-10 public image dataset, the effects of model overfitting were monitored within different model architectures in combination with data augmentation and hyper-parameter tuning. The model performance was evaluated with train and test accuracy and loss, characteristics derived from the confusion matrices, and visualizations of …


Self-Driving Cars: Evaluation Of Deep Learning Techniques For Object Detection In Different Driving Conditions, Ramesh Simhambhatla, Kevin Okiah, Shravan Kuchkula, Robert Slater May 2019

SMU Data Science Review

Deep Learning has revolutionized Computer Vision, and it is the core technology behind capabilities of a self-driving car. Convolutional Neural Networks (CNNs) are at the heart of this deep learning revolution for improving the task of object detection. A number of successful object detection systems have been proposed in recent years that are based on CNNs. In this paper, an empirical evaluation of three recent meta-architectures: SSD (Single Shot multi-box Detector), R-CNN (Region-based CNN) and R-FCN (Region-based Fully Convolutional Networks) was conducted to measure how fast and accurate they are in identifying objects on the road, such as vehicles, pedestrians, …


Analysis Of Computer Audit Data To Create Indicators Of Compromise For Intrusion Detection, Steven Millett, Michael Toolin, Justin Bates May 2019

SMU Data Science Review

Network security systems are designed to identify and, if possible, prevent unauthorized access to computer and network resources. Today most network security systems consist of hardware and software components that work in conjunction with one another to present a layered line of defense against unauthorized intrusions. Software provides user interactive layers such as password authentication, and system level layers for monitoring network activity. This paper examines an application monitoring network traffic that attempts to identify Indicators of Compromise (IOC) by extracting patterns in the network traffic which likely correspond to unauthorized access. Typical network log data and construct indicators are …


Network Traffic Behavioral Analytics For Detection Of DDoS Attacks, Alma D. Lopez, Asha P. Mohan, Sukumaran Nair May 2019

SMU Data Science Review

As more organizations and businesses in different sectors move toward digital transformation, there is a steady increase in malware, and these organizations face data theft or service interruptions caused by cyberattacks on networks or applications that impact their customer experience. Bot and Distributed Denial of Service (DDoS) attacks consistently challenge every industry relying on the internet. In this paper, we focus on Machine Learning techniques to detect DDoS attacks in network communication flows using a continuous learning algorithm that learns the normal pattern of network traffic and the behavior of the network protocols, and identifies a compromised network flow. Detection of DDoS attack will …


Visualization And Machine Learning Techniques For NASA’s EM-1 Big Data Problem, Antonio P. Garza III, Jose Quinonez, Misael Santana, Nibhrat Lohia May 2019

SMU Data Science Review

In this paper, we help NASA solve three Exploration Mission-1 (EM-1) challenges: data storage, computation time, and visualization of complex data. NASA is studying one year of trajectory data to determine available launch opportunities (about 90 TB of data). We improve data storage by introducing a cloud-based solution that provides elasticity and server upgrades. This migration will save $120k in infrastructure costs every four years, and potentially avoid schedule slips. Additionally, it increases computational efficiency by 125%. We further enhance computation via machine learning techniques that use the classic orbital elements to predict valid trajectories. Our machine learning model decreases trajectory …


Finding Truth In Fake News: Reverse Plagiarism And Other Models Of Classification, Matthew Przybyla, David Tran, Amber Whelpley, Daniel W. Engels Jan 2019

SMU Data Science Review

As the digital age creates new ways of spreading news, fake stories are propagated to widen audiences. A majority of people obtain both fake and truthful news without knowing which is which. There is not currently a reliable and efficient method to identify “fake news”. Several ways of detecting fake news have been produced, but the various algorithms have low accuracy of detection and the definition of what makes a news item ‘fake’ remains unclear. In this paper, we propose a new method of detecting fake news through comparison to other news items on the same topic, as …


Framework For Evaluation Of Flash Flood Models In Wildfire-Prone Areas, Brian Cunningham, David Benepe, Bryan Cikatz, Evangelos Giakoumakis Jan 2019

SMU Data Science Review

In this paper, we present an innovative framework for evaluating the increased risk of flash flooding in areas that have been subjected to wildfires. Wildfires cause large-scale damage to an area’s soil and vegetation, thus increasing both the likelihood and severity of flash flooding. Utilizing remote sensing to analyze aerial imagery of areas that have been affected by wildfires, we can investigate how much a landscape has changed and how that may adversely affect downstream areas in the event of a flash flood. There are currently no established frameworks from which downstream local officials can quickly assess the …


Pedestrian Safety -- Fundamental To A Walkable City, Joshua Herrera, Patrick Mcdevitt, Preeti Swaminathan, Raghuram Srinivas Jan 2019

SMU Data Science Review

In this paper, we present a method to identify urban areas with a higher likelihood of pedestrian safety related events. Pedestrian safety related events are pedestrian-vehicle interactions that result in fatalities, injuries, accidents without injury, or near-misses between pedestrians and vehicles. To develop a solution to this problem of identifying likely event locations, we assemble data, primarily from the City of Cincinnati and Hamilton County, that include safety reports from a five-year period, geographic information for these events, citizen survey of pedestrian reported concerns, non-emergency requests for service for any cause in the city, property values and public transportation …


Comparative Study Of Sentiment Analysis With Product Reviews Using Machine Learning And Lexicon-Based Approaches, Heidi Nguyen, Aravind Veluchamy, Mamadou Diop, Rashed Iqbal Jan 2019

SMU Data Science Review

In this paper, we present a comparative study of text sentiment classification models using term frequency-inverse document frequency (TF-IDF) vectorization in both supervised machine learning and lexicon-based techniques. There have been multiple promising machine learning and lexicon-based techniques, but the relative goodness of each approach on specific types of problems is not well understood. In order to offer researchers comprehensive insights, we compare a total of six algorithms to each other. The three machine learning algorithms are: Logistic Regression (LR), Support Vector Machine (SVM), and Gradient Boosting. The three lexicon-based algorithms are: Valence Aware Dictionary and Sentiment Reasoner (VADER), Pattern, …
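
To make the vectorization step named in this abstract concrete, here is a minimal sketch of term frequency-inverse document frequency weighting using only the standard library. It assumes the plain unsmoothed variant (term count divided by document length, times the natural log of the number of documents over the document frequency) and whitespace tokenization of non-empty documents; the paper may use a different variant or a library implementation. The function name `tfidf` and the sample reviews are invented for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumes unsmoothed TF-IDF and non-empty, whitespace-separated documents.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Invented sample reviews, for illustration only.
reviews = ["great product works great", "terrible product broke"]
vectors = tfidf(reviews)
```

In this variant a term that occurs in every document, such as "product" above, receives a weight of zero, so the classifier sees only the terms that distinguish one review from another.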


Improving VIX Futures Forecasts Using Machine Learning Methods, James Hosker, Slobodan Djurdjevic, Hieu Nguyen, Robert Slater Jan 2019

SMU Data Science Review

The problem of forecasting market volatility is a difficult task for most fund managers. Volatility forecasts are used for risk management, alpha (risk) trading, and the reduction of trading friction. Improving the forecasts of future market volatility assists fund managers in adding or reducing risk in their portfolios as well as in increasing hedges to protect their portfolios in anticipation of a market sell-off event. Our analysis compares three existing financial models that forecast future market volatility using the Chicago Board Options Exchange Volatility Index (VIX) to six machine/deep learning supervised regression methods. This analysis determines which models provide best …


Project Insight: A Granular Approach To Enterprise Cybersecurity, Sunna Quazi, Adam Baca, Sam Darsche Jan 2019

SMU Data Science Review

In this paper, we disambiguate risky activity corporate users are propagating with their software in real time by creating an enterprise security visualization solution for system administrators. The current problem in this domain is the lag in cyber intelligence that inhibits preventative security measure execution. This is partially due to the overemphasis of network activity, which is a nonfinite dataset and is difficult to comprehensively ingest with analytics. We address these concerns by elaborating on the beta of a software called "Insight" created by Felix Security. The overall solution leverages endpoint data along with preexisting whitelist/blacklist designations to unambiguously communicate …


Improving Gas Well Economics With Intelligent Plunger Lift Optimization Techniques, Atsu Atakpa, Emmanuel Farrugia, Ryan Tyree, Daniel W. Engels, Charles Sparks Jan 2019

SMU Data Science Review

In this paper, we present an approach to reducing bottom hole plunger dwell time for artificial lift systems. Lift systems are used in a process to remove contaminants from a natural gas well. A plunger is a mechanical device used to deliquefy natural gas wells by removing contaminants in the form of water, oil, wax, and sand from the wellbore. These contaminants decrease bottom-hole pressure which in turn hampers gas production by forming a physical barrier within the well tubing. As the plunger descends through the well it emits sounds which are recorded at the surface by an echo-meter that …