Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 30 of 35

Full-Text Articles in Physical Sciences and Mathematics

Identifying Customer Churn In After-Market Operations Using Machine Learning Algorithms, Vitaly Briker, Richard Farrow, William Trevino, Brent Allen Dec 2019

SMU Data Science Review

This paper presents a comparative study on machine learning methods as they are applied to product associations, future purchase predictions, and predictions of customer churn in aftermarket operations. Association rules are used to help identify patterns across products and find correlations in customer purchase behaviour. Studying customer behaviour as it pertains to Recency, Frequency, and Monetary Value (RFM) helps inform customer segmentation and identifies customers with a propensity to churn. Lastly, Flowserve’s customer purchase history enables the establishment of churn thresholds for each customer group and assists in constructing a model to predict future churners. The aim of this model is …


Personalized Detection Of Anxiety Provoking News Events Using Semantic Network Analysis, Jacquelyn Cheun PhD, Luay Dajani, Quentin B. Thomas Dec 2019

SMU Data Science Review

In the age of hyper-connectivity, 24/7 news cycles, and instant news alerts via social media, mental health researchers don't have a way to automatically detect news content which is associated with triggering anxiety or depression in mental health patients. Using the Associated Press news wire, a semantic network was built with 1,056 news articles containing over 500,000 connections across multiple topics to provide a personalized algorithm which detects problematic news content for a given reader. We make use of Semantic Network Analysis to surface the relationship between news article text and anxiety in readers who struggle with mental health disorders. …


A Data Science Approach To Defining A Data Scientist, Andy Ho, An Nguyen, Jodi L. Pafford, Robert Slater Dec 2019

SMU Data Science Review

In this paper, we present a common definition and list of skills for a Data Scientist using online job postings. The overlap and ambiguity of various roles such as data scientist, data engineer, data analyst, software engineer, database administrator, and statistician motivate the problem. To arrive at a single Data Scientist definition, we collect over 8,000 job postings from Indeed.com for the six job titles. Each corpus contains text on job qualifications, skills, responsibilities, educational preferences, and requirements. Our data science methodology and analysis rendered the single definition of a data scientist: A data scientist codes, collaborates, and communicates – …


Achieving Optimal Horizontal Drill Operations, Daniel J. Serna, James Vasquez, Donald Markley Dec 2019

SMU Data Science Review

In this paper, we present a novel method of predicting the onset of a slide event in horizontal drilling operations. Horizontal drilling operations attempt to create a well through a subsurface as quickly as possible by rotating a drill through the subsurface. A slide event occurs when the drill begins to inefficiently rotate through the subsurface, resulting in a significantly reduced rate of penetration. Slide events can be prevented, or significantly reduced in their impact, when their onset is accurately predicted. We present a method of accurately predicting the onset of slide events with a time-series based predictive model that …


A Data Driven Approach To Forecast Demand, Hannah Kosinovsky, Sita Daggubati, Kumar Ramasundaram, Brent Allen Dec 2019

SMU Data Science Review

In this paper, we present a model and methodology for accurately predicting the following quarter’s sales volume of individual products given the previous five years of sales data. Forecasting product demand for a single supplier is complicated by seasonal demand variation, business cycle impacts, and customer churn. We developed a novel prediction methodology using machine learning, based upon a dense neural network (DNN) model that implicitly considers cyclical demand variation and explicitly considers customer churn while minimizing the least absolute error between predicted demand and actual sales. Using parts sales data for a supplier to the oil and gas …


A Machine Learning Model For Clustering Securities, Vanessa Torres, Travis Deason, Michael Landrum, Nibhrat Lohia Aug 2019

SMU Data Science Review

In this paper, we evaluate the self-declared industry classifications and industry relationships between companies listed on either the Nasdaq or the New York Stock Exchange (NYSE) markets. Large corporations typically operate in multiple industries simultaneously; however, for investment purposes they are classified as belonging to a single industry. This simple classification obscures the actual industries within which a company operates, and, therefore, the investment risks of that company.
By using Natural Language Processing (NLP) techniques on Securities and Exchange Commission (SEC) filings, we obtained self-defined industry classifications per company. Using clustering techniques such as Hierarchical Agglomerative and k-means clustering we …


Predicting Wind Turbine Blade Erosion Using Machine Learning, Casey Martinez, Festus Asare Yeboah, Scott Herford, Matt Brzezinski, Viswanath Puttagunta Aug 2019

SMU Data Science Review

Using time-series data and turbine blade inspection assessments, we present a classification model in order to predict remaining turbine blade life in wind turbines. Capturing the kinetic energy of wind requires complex mechanical systems, which require sophisticated maintenance and planning strategies. There are many traditional approaches to monitoring the internal gearbox and generator, but the condition of turbine blades can be difficult to measure and access. Accurate and cost-effective estimates of turbine blade life cycles will drive optimal investments in repairs and improve overall performance. These measures will drive down costs as well as provide cheap and clean electricity …


Machine Learning In Support Of Electric Distribution Asset Failure Prediction, Robert D. Flamenbaum, Thomas Pompo, Christopher Havenstein, Jade Thiemsuwan Aug 2019

SMU Data Science Review

In this paper, we present novel approaches to predicting asset failure in the electric distribution system. Failures in overhead power lines and their associated equipment, in particular, pose significant financial and environmental threats to electric utilities. Electric device failure furthermore poses a burden on customers and can pose serious risk to life and livelihood. Working with asset data acquired from an electric utility in Southern California, and incorporating environmental and geospatial data from around the region, we applied a Random Forest methodology to predict which overhead distribution lines are most vulnerable to failure. Our results provide evidence …


Identifying Undervalued Players In Fantasy Football, Christopher D. Morgan, Caroll Rodriguez, Korey Macvittie, Robert Slater, Daniel W. Engels Aug 2019

SMU Data Science Review

In this paper we present a model to predict player performance in fantasy football. In particular, identifying high-performance players can prove to be a difficult problem, as there are on occasion players capable of high performance whose past metrics give no indication of this capacity. These "sleepers" are often undervalued, and the acquisition of such players can have a notable impact on a fantasy football team's overall performance. We constructed a regression model that accounts for players' past performance and athletic metrics to predict their future performance. The model we built performs favorably in predicting athlete performance in relation to other …


Machine Learning Predicts Aperiodic Laboratory Earthquakes, Olha Tanyuk, Daniel Davieau, Charles South, Daniel W. Engels Aug 2019

SMU Data Science Review

In this paper we find a pattern of aperiodic seismic signals that precede earthquakes at any time in a laboratory earthquake’s cycle using a small window of time. We use a data set that comes from a classic laboratory experiment having several stick-slip displacements (earthquakes), a type of experiment which has been studied as a simulation of seismologic faults for decades. These data exhibit behavior similar to that of natural earthquakes, so the same approach may work in predicting their timing. Here we show that by applying a random forest machine learning technique to the acoustic signal emitted by a laboratory …


Longitudinal Analysis With Modes Of Operation For AES, Dana Geislinger, Cory Thigpen, Daniel W. Engels Aug 2019

SMU Data Science Review

In this paper, we present an empirical evaluation of the randomness of the ciphertext blocks generated by the Advanced Encryption Standard (AES) cipher in Counter (CTR) mode and in Cipher Block Chaining (CBC) mode. Vulnerabilities have been found in the AES cipher that may lead to a reduction in the randomness of the generated ciphertext blocks that can result in a practical attack on the cipher. We evaluate the randomness of the AES ciphertext using the standard key length and NIST randomness tests. We evaluate the randomness through a longitudinal analysis on 200 billion ciphertext blocks using logistic regression and …


Pristine Sentence Translation: A New Approach To A Timeless Problem, Meenu Ahluwalia, Brian Coari, Ben Brock Aug 2019

SMU Data Science Review

Pristine Sentence Translation (PST) is a new approach to language translation based upon sentence-level granularity. Traditional translation approaches, including those utilizing advanced machine learning or neural network-based approaches, translate on a word-by-word or phrase-by-phrase basis, thereby potentially missing the context or meaning of the complete sentence. Instead of these piecewise translations, PST utilizes deep learning and predictive modeling techniques to translate complete sentences from their source language into their target language. With these approaches we were able to translate sentences that closely conveyed the meaning of the original sentences. Our results demonstrated that PST’s method of translating an entire …


Forecasting Localized Weather-Based Photovoltaic Energy Production, Kevin Chang, Afreen Siddiqui, Robert Slater Aug 2019

SMU Data Science Review

Photovoltaic (PV) power system performance can vary from nominal specifications when put in application, making it difficult to accurately estimate real power generation at a localized level. As the usage and efficiency of PV systems have increased in recent years, the amount of power contributed to the national power grid from solar irradiation has also increased significantly. However, solar power installations are subject to variances in efficiency and output, driven by differences in system size, local weather, and atmospheric condition changes. With a significant install base in today's world, combined with extensive solar irradiance and meteorological data, the variables exist …


Self-Driving Cars: Evaluation Of Deep Learning Techniques For Object Detection In Different Driving Conditions, Ramesh Simhambhatla, Kevin Okiah, Shravan Kuchkula, Robert Slater May 2019

SMU Data Science Review

Deep Learning has revolutionized Computer Vision, and it is the core technology behind capabilities of a self-driving car. Convolutional Neural Networks (CNNs) are at the heart of this deep learning revolution for improving the task of object detection. A number of successful object detection systems have been proposed in recent years that are based on CNNs. In this paper, an empirical evaluation of three recent meta-architectures: SSD (Single Shot multi-box Detector), R-CNN (Region-based CNN) and R-FCN (Region-based Fully Convolutional Networks) was conducted to measure how fast and accurate they are in identifying objects on the road, such as vehicles, pedestrians, …


ASL Reverse Dictionary - ASL Translation Using Deep Learning, Ann Nelson, Kj Price, Rosalie Multari May 2019

SMU Data Science Review

The challenges of learning a new language can be reduced with real-time feedback on pronunciation and language usage. Today there are readily available technologies which provide such feedback on spoken languages, by translating the voice of the learner into written text. For someone seeking to learn American Sign Language (ASL), there is however no such feedback application available. A learner of American Sign Language might reference websites or books to obtain an image of a hand sign for a word. This process is like looking up a word in a dictionary, and if the person wanted to know if they …


Identification And Classification Of Poultry Eggs: A Case Study Utilizing Computer Vision And Machine Learning, Jeremy Lubich, Kyle Thomas, Daniel W. Engels May 2019

SMU Data Science Review

We developed a method to identify, count, and classify chickens and eggs inside nesting boxes of a chicken coop. Utilizing an IoT AWS Deep Lens Camera for data capture and inferences, we trained and deployed a custom single-shot multibox (SSD) object detection and classification model. This allows us to monitor a complex environment with multiple chickens and eggs moving and appearing simultaneously within the video frames. The models can label video frames with classifications for 8 breeds of chickens and/or 4 colors of eggs, with 98% accuracy on chickens or eggs alone and 82.5% accuracy while detecting both types of …


Demand Forecasting: An Open-Source Approach, Murtada Shubbar, Jared Smith May 2019

SMU Data Science Review

In this paper, we compare demand forecasting methods used by the supply chain department at Bilports to open-source forecasting methods. The design and implementation of the open-source forecasting system also attempts to use several external datasets such as consumer sentiment, housing permit starts, and weather to improve prediction quality. Additionally, the performance of the forecast is evaluated by the reduction of shipment lead times from China, the company’s primary vendor. The objective of our paper is to improve Bilports’s forecasting capabilities. The primary motivation of this paper is to increase forecasting accuracy and identify the weaknesses of the methods used …


Analysis Of Computer Audit Data To Create Indicators Of Compromise For Intrusion Detection, Steven Millett, Michael Toolin, Justin Bates May 2019

SMU Data Science Review

Network security systems are designed to identify and, if possible, prevent unauthorized access to computer and network resources. Today most network security systems consist of hardware and software components that work in conjunction with one another to present a layered line of defense against unauthorized intrusions. Software provides user interactive layers such as password authentication, and system level layers for monitoring network activity. This paper examines an application monitoring network traffic that attempts to identify Indicators of Compromise (IOC) by extracting patterns in the network traffic that likely correspond to unauthorized access. Typical network log data and construct indicators are …


Kadafrica: Survey Analysis To Support Research For Smallholder Farmers, Gregory Asamoah, Robert Gill, Frank Sclafani, Bivin Sadler May 2019

SMU Data Science Review

In this paper, we present an analysis of survey data with the goal of determining if the KadAfrica training program, a social organization in Uganda, has a significant effect on the lives of the girls who participate in the program. This is done through an observational study of girls’ responses to several pre-program and post-program questions. These questions include topics such as the girls’ access to hygiene materials and their personal views on family finances. In addition to providing an analysis of historical data, we established a data platform in which future data can be stored and analyzed in an …


Leveraging Reviews To Improve User Experience, Anthony Schams, Iram Bakhtiar, Cristina Stanley May 2019

SMU Data Science Review

In this paper, we will explore and present a method of finding characteristics of a restaurant using its reviews through machine learning algorithms. We begin by building models to predict the ratings of individual reviews using text and categorical features. This is to examine the efficacy of the algorithms to the task. Both XGBoost and logistic regression will be examined. With these models, our goal is then to identify key phrases in reviews that are correlated with positive and negative experience. Our analysis makes use of review data publicly made available by Yelp. Key bigrams extracted were non-specific to the …


Repairing Landsat Satellite Imagery Using Deep Machine Learning Techniques, Griffin J. Lane, Patricia Goresen, Robert Slater May 2019

SMU Data Science Review

Satellite Imagery is one of the most widely used sources to analyze geographic features and environments in the world. The data gathered from satellites are used to quantify many vital problems facing our society, such as the impact of natural disasters, shore erosion, rising water levels, and urban growth rates. In this paper, we construct machine learning and deep learning algorithms for repairing anomalies in the Landsat satellite imagery data which arise for various reasons ranging from cloud obstruction to satellite malfunctions. The accuracy of GIS data is crucial to ensuring the models produced from such data are as close …


Visualization And Machine Learning Techniques For NASA’s EM-1 Big Data Problem, Antonio P. Garza III, Jose Quinonez, Misael Santana, Nibhrat Lohia May 2019

SMU Data Science Review

In this paper, we help NASA solve three Exploration Mission-1 (EM-1) challenges: data storage, computation time, and visualization of complex data. NASA is studying one year of trajectory data to determine available launch opportunities (about 90TBs of data). We improve data storage by introducing a cloud-based solution that provides elasticity and server upgrades. This migration will save $120k in infrastructure costs every four years, and potentially avoid schedule slips. Additionally, it increases computational efficiency by 125%. We further enhance computation via machine learning techniques that use the classic orbital elements to predict valid trajectories. Our machine learning model decreases trajectory …


Machine Learning Pipeline For Exoplanet Classification, George Clayton Sturrock, Brychan Manry, Sohail Rafiqi May 2019

SMU Data Science Review

Planet identification has typically been a task performed exclusively by teams of astronomers and astrophysicists using methods and tools accessible only to those with years of academic education and training. NASA’s Exoplanet Exploration program has introduced modern satellites capable of capturing a vast array of data regarding celestial objects of interest to assist with researching these objects. The availability of satellite data has opened up the task of planet identification to individuals capable of writing and interpreting machine learning models. In this study, several classification models and datasets are utilized to assign a probability of an observation being an exoplanet. …


Automate Nuclei Detection Using Neural Networks, Jonathan Flores, Thejas Prasad, Jordan Kassof, Robert Slater May 2019

SMU Data Science Review

Nuclei identification is a pivotal first step in many areas of biomedical research. Pathologists often observe images containing microscopic nuclei as part of their day-to-day jobs. During research, pathologists must identify nuclei characteristics from microscopic images, such as volume of nuclei, size, density, and individual position within the image. The pathology field can benefit from image detection enhancements done through the use of computer image segmentation techniques. This research presents methods that can be used to identify all the cell nuclei contained in images. Multiple techniques were experimented with such as edge detection and Convolutional Neural Networks with U-Net …


Machine Learning Vs Conventional Analysis Techniques For The Earth’s Magnetic Field Study, Sheri Loftin, Sarah J. Fite, Laura V. Bishop, Stavros Kotsiaros May 2019

SMU Data Science Review

Current techniques for calculating and generating models used for analyzing the Earth’s magnetic field are laborious and time-consuming. We assert that machine learning can have a significant impact on building magnetic field models more quickly and on various levels of complexity, specifically as it pertains to data cleansing and sorting. Our approach to this problem uses a reverse iterative multi-phase process for data cleansing, in which, initially, the CHAOS-6 model data is examined to determine if machine learning can be used to differentiate between useful data components for spherical harmonics and data noise. During this phase, six different machine …


Leveraging Natural Language Processing Applications And Microblogging Platform For Increased Transparency In Crisis Areas, Ernesto Carrera-Ruvalcaba, Johnson Ekedum, Austin Hancock, Ben Brock May 2019

SMU Data Science Review

Through microblogging applications, such as Twitter, people actively document their lives even in times of natural disasters such as hurricanes and earthquakes. While first responders and crisis-teams are able to help people who call 911, or arrive at a designated shelter, there are vast amounts of information being exchanged online via Twitter that provide real-time, location-based alerts that are going unnoticed. To effectively use this information, the Tweets must be verified for authenticity and categorized to ensure that the proper authorities can be alerted. In this paper, we create a Crisis Message Corpus from geotagged Tweets occurring during 7 hurricanes …


Tidying And Analysis Of The 2014 Texas English II End-Of-Course Exam, David Churchman, Abigail Morton Garland May 2019

SMU Data Science Review

The state of Texas requires all public high school students to take End of Course (EOC) exams. The results of these exams are made nominally public, but in a shape and format that precludes ready analysis. To the extent possible, principles of tidy data will be applied to clean and analyze the publicly released data file for the 2014 English II EOC exam, providing insights into the EOC program and a case for better public data from the Texas Education Agency (TEA).


An Evaluation Of Training Size Impact On Validation Accuracy For Optimized Convolutional Neural Networks, Jostein Barry-Straume, Adam Tschannen, Daniel W. Engels, Edward Fine Jan 2019

SMU Data Science Review

In this paper, we present an evaluation of training size impact on validation accuracy for an optimized Convolutional Neural Network (CNN). CNNs are currently the state-of-the-art architecture for object classification tasks. We used Amazon’s machine learning ecosystem to train and test 648 models to find the optimal hyperparameters with which to apply a CNN towards the Fashion-MNIST (Modified National Institute of Standards and Technology) dataset. We were able to realize a validation accuracy of 90% by using only 40% of the original data. We found that hidden layers appear to have had zero impact on validation accuracy, whereas the neural …


Comparisons Of Performance Between Quantum And Classical Machine Learning, Christopher Havenstein, Damarcus Thomas, Swami Chandrasekaran Jan 2019

SMU Data Science Review

In this paper, we present a performance comparison of machine learning algorithms executed on traditional and quantum computers. Quantum computing has the potential to achieve incredible results for certain types of problems, and we explore whether it can be applied to machine learning. First, we identified quantum machine learning algorithms that had reproducible code and classical machine learning counterparts. Then, we found relevant data sets with which we tested the comparable quantum and classical machine learning algorithms' performance. We evaluated performance with algorithm execution time and accuracy. We found that quantum variational support vector machines in some cases had higher accuracy …


Political Profiling Using Feature Engineering And NLP, Chiranjeevi Mallavarapu, Ramya Mandava, Sabitri Kc, Ginger M. Holt Jan 2019

SMU Data Science Review

Public surveys are predominantly used when forecasting election outcomes. While the approach has had significant successes, the surveys have had their failures as well, especially when it comes to accuracy and reliability. As a result, it becomes challenging for political parties to spend their campaign budgets in a manner that facilitates the growth of a favorable and verifiable public opinion. Consequently, it is critical that a more accurate methodology to predict election outcome is developed. In this paper, we present an evaluation of the impact of utilizing dynamic public data on predicting the outcome of elections. Our model yielded a …