Open Access. Powered by Scholars. Published by Universities.®

Computer Engineering Commons

Articles 1 - 17 of 17

Full-Text Articles in Computer Engineering

Self-Learning Algorithms For Intrusion Detection And Prevention Systems (IDPS), Juan E. Nunez, Roger W. Tchegui Donfack, Rohit Rohit, Hayley Horn Mar 2023

SMU Data Science Review

Today, there is an increased risk to data privacy and information security due to cyberattacks that compromise data reliability and accessibility. New machine learning models are needed to detect and prevent these cyberattacks. One application of these models is cybersecurity threat detection and prevention systems that can create a baseline of a network's traffic patterns to detect anomalies without needing pre-labeled data, thus enabling the identification of abnormal network events as threats. This research explored algorithms that can help automate anomaly detection on an enterprise network using Canadian Institute for Cybersecurity data. This study demonstrates that Neural Networks with Bayesian …


BERT For Question Answering On BioASQ, Eric R. Fu, Rikel Djoko, Maysam Mansor, Robert Slater Jan 2021

SMU Data Science Review

Machine reading comprehension and question answering are topics of considerable focus in the field of Natural Language Processing (NLP). In recent years, language models like Bidirectional Encoder Representations from Transformers (BERT) [3] have been very successful in language-related tasks like question answering. The difficulty of the question answering task lies in developing accurate representations of language and being able to produce answers for questions. In this study, the focus is to investigate how to train and fine-tune a BERT model to improve its performance on BioASQ, a challenge on large-scale biomedical question answering. Our most accurate BERT …


Accelerating Reinforcement Learning With Prioritized Experience Replay For Maze Game, Chaoshun Hu, Mehesh Kuklani, Paul Panek Apr 2020

SMU Data Science Review

In this paper we implemented two ways of improving the performance of reinforcement learning algorithms. We proposed a new equation to prioritize transition samples to improve model accuracy, and by deploying a generalized solver of randomly-generated two-dimensional mazes on a distributed computing platform, our dual-network model is available to others for further research and development. Reinforcement Learning is concerned with identifying the optimal sequence of actions for an agent to take in order to reach an objective to achieve the highest score in the future. Complex situations can lead to computational challenges in terms of both finding the best answer …


QLIME-A Quadratic Local Interpretable Model-Agnostic Explanation Approach, Steven Bramhall, Hayley Horn, Michael Tieu, Nibhrat Lohia Apr 2020

SMU Data Science Review

In this paper, we introduce a proof of concept that addresses the assumption and limitation of linear local boundaries by Local Interpretable Model-Agnostic Explanations (LIME), a popular technique used to add interpretability and explainability to black box models. LIME is a versatile explainer capable of handling different types of data and models. At the local level, LIME creates a linear relationship for a given prediction through generated sample points to present feature importance. We redefine the linear relationships presented by LIME as quadratic relationships and expand its flexibility in non-linear cases and improve the accuracy of feature interpretations. We coin …


The Data Market: A Proposal To Control Data About You, David Shaw, Daniel W. Engels Apr 2020

SMU Data Science Review

The current legal and economic infrastructure facilitating data collection practices and data analysis has led to extreme over-collection of data and the overall loss of personal privacy. Data over-collection has led to a secondary market for consumer data that is invisible to the consumer and results in a person's data being distributed far beyond their knowledge or control. In this paper, we propose a Data Market framework and design for personal data management and privacy protection in which the individual controls and profits from the dissemination of their data. Our proposed Data Market uses a market-based approach utilizing blockchain distributed …


Identifying Customer Churn In After-Market Operations Using Machine Learning Algorithms, Vitaly Briker, Richard Farrow, William Trevino, Brent Allen Dec 2019

SMU Data Science Review

This paper presents a comparative study on machine learning methods as they are applied to product associations, future purchase predictions, and predictions of customer churn in aftermarket operations. Association rules are used to help identify patterns across products and find correlations in customer purchase behaviour. Studying customer behaviour as it pertains to Recency, Frequency, and Monetary Value (RFM) helps inform customer segmentation and identifies customers with propensity to churn. Lastly, Flowserve’s customer purchase history enables the establishment of churn thresholds for each customer group and assists in constructing a model to predict future churners. The aim of this model is …


Machine Learning To Predict The Likelihood Of A Personal Computer To Be Infected With Malware, Maryam Shahini, Ramin Farhanian, Marcus Ellis Aug 2019

SMU Data Science Review

In this paper, we present a new model to predict the probability that a personal computer will become infected with malware. The dataset is selected from a Kaggle competition supported by Microsoft. The data includes computer configuration, owner information, installed software, and configuration information. In our research, several classification models are utilized to assign a probability of a machine being infected with malware. The LightGBM classifier proved the optimal machine learning model in this research, running faster with higher efficiency and lower memory usage. The LightGBM algorithm obtained a cross-validation ROC-AUC score of 74%. Leading factors …


AWS EC2 Instance Spot Price Forecasting Using LSTM Networks, Jeffrey Lancon, Yejur Kunwar, David Stroud, Monnie McGee, Robert Slater Aug 2019

SMU Data Science Review

Cloud computing is a network of remote computing resources hosted on the Internet that allow users to utilize cloud resources on demand. As such, it represents a paradigm shift in the way businesses and industries think about digital infrastructure. With the shift from IT resources being a capital expenditure to a managed service, companies must rethink how they approach utilizing and optimizing these resources in order to maximize productivity and minimize costs. With proper resource management, cloud resources can be instrumental in reducing computing expenses.

Cloud resources are perishable commodities; therefore, cloud service providers have developed strategies to maximize utilization …


Self-Driving Cars: Evaluation Of Deep Learning Techniques For Object Detection In Different Driving Conditions, Ramesh Simhambhatla, Kevin Okiah, Shravan Kuchkula, Robert Slater May 2019

SMU Data Science Review

Deep Learning has revolutionized Computer Vision, and it is the core technology behind capabilities of a self-driving car. Convolutional Neural Networks (CNNs) are at the heart of this deep learning revolution for improving the task of object detection. A number of successful object detection systems have been proposed in recent years that are based on CNNs. In this paper, an empirical evaluation of three recent meta-architectures: SSD (Single Shot multi-box Detector), R-CNN (Region-based CNN) and R-FCN (Region-based Fully Convolutional Networks) was conducted to measure how fast and accurate they are in identifying objects on the road, such as vehicles, pedestrians, …


Finding Truth In Fake News: Reverse Plagiarism And Other Models Of Classification, Matthew Przybyla, David Tran, Amber Whelpley, Daniel W. Engels Jan 2019

SMU Data Science Review

As the digital age creates new ways of spreading news, fake stories are propagated to widen audiences. A majority of people obtain both fake and truthful news without knowing which is which. There is currently no reliable and efficient method to identify “fake news”. Several ways of detecting fake news have been produced, but the various algorithms have low accuracy of detection and the definition of what makes a news item ‘fake’ remains unclear. In this paper, we propose a new method of detecting fake news through comparison to other news items on the same topic, as …


Comparative Study Of Sentiment Analysis With Product Reviews Using Machine Learning And Lexicon-Based Approaches, Heidi Nguyen, Aravind Veluchamy, Mamadou Diop, Rashed Iqbal Jan 2019

SMU Data Science Review

In this paper, we present a comparative study of text sentiment classification models using term frequency inverse document frequency vectorization in both supervised machine learning and lexicon-based techniques. There have been multiple promising machine learning and lexicon-based techniques, but the relative goodness of each approach on specific types of problems is not well understood. In order to offer researchers comprehensive insights, we compare a total of six algorithms to each other. The three machine learning algorithms are: Logistic Regression (LR), Support Vector Machine (SVM), and Gradient Boosting. The three lexicon-based algorithms are: Valence Aware Dictionary and Sentiment Reasoner (VADER), Pattern, …


Improving VIX Futures Forecasts Using Machine Learning Methods, James Hosker, Slobodan Djurdjevic, Hieu Nguyen, Robert Slater Jan 2019

SMU Data Science Review

The problem of forecasting market volatility is a difficult task for most fund managers. Volatility forecasts are used for risk management, alpha (risk) trading, and the reduction of trading friction. Improving the forecasts of future market volatility assists fund managers in adding or reducing risk in their portfolios as well as in increasing hedges to protect their portfolios in anticipation of a market sell-off event. Our analysis compares three existing financial models that forecast future market volatility using the Chicago Board Options Exchange Volatility Index (VIX) to six machine/deep learning supervised regression methods. This analysis determines which models provide best …


Project Insight: A Granular Approach To Enterprise Cybersecurity, Sunna Quazi, Adam Baca, Sam Darsche Jan 2019

SMU Data Science Review

In this paper, we disambiguate risky activity corporate users are propagating with their software in real time by creating an enterprise security visualization solution for system administrators. The current problem in this domain is the lag in cyber intelligence that inhibits preventative security measure execution. This is partially due to the overemphasis of network activity, which is a nonfinite dataset and is difficult to comprehensively ingest with analytics. We address these concerns by elaborating on the beta of a software called "Insight" created by Felix Security. The overall solution leverages endpoint data along with preexisting whitelist/blacklist designations to unambiguously communicate …


Text Enhanced Recommendation System Model Based On Yelp Reviews, Peter Kouvaris, Ekaterina Pirogova, Hari Sanadhya, Albert Asuncion, Arun Rajagopal Aug 2018

SMU Data Science Review

In this paper, we introduce a useful natural language model for improving recommendation systems using collaborative filtering algorithms on ordinal ratings data. Since their inception, recommendation systems have evolved from simple user-business-rating matrices to complex systems that can consume multiple dimensions. Using Yelp's competition data set, we explore extending these dimensions to include natural language by leveraging a dual neural network architecture to produce a new and improved star rating system which offers potential improvements to collaborative filtering based recommendation systems.


How Much Privacy Do We Have Today? A Study Of The Life Of Marc Mezvinsky, Miguel Mares, Salomon Gilles, Brian D. Gobran, Dan Engels Jul 2018

SMU Data Science Review

In this paper, we present a case study evaluating the level of information available about an individual through public, Internet-accessible sources. Privacy is a basic tenet of democratic society, but technological advances have made access to information and the identification of individuals much easier through Internet-accessible databases and information stores. To determine the potential level of privacy available to an individual in today’s interconnected world, we sought to develop a detailed history of Marc Mezvinsky, a semi-public figure, husband of Chelsea Clinton, and son of two former members of the United States House of Representatives. By utilizing only publicly and …


Comparative Study Of Deep Learning Models For Network Intrusion Detection, Brian Lee, Sandhya Amaresh, Clifford Green, Daniel Engels Apr 2018

SMU Data Science Review

In this paper, we present a comparative evaluation of deep learning approaches to network intrusion detection. A Network Intrusion Detection System (NIDS) is a critical component of every Internet-connected system due to likely attacks from both external and internal sources. A NIDS is used to detect network-borne attacks such as Denial of Service (DoS) attacks, malware replication, and intruders that are operating within the system. Multiple deep learning approaches have been proposed for intrusion detection systems. We evaluate three models: a vanilla deep neural net (DNN), a self-taught learning (STL) approach, and a Recurrent Neural Network (RNN) based Long Short …


Comparative Study: Reducing Cost To Manage Accessibility With Existing Data, Claire Chu, Bill Kerneckel, Eric C. Larson, Nathan Mowat, Christopher Woodard Apr 2018

SMU Data Science Review

“Project Sidewalk” is an existing research effort that focuses on mapping accessibility issues for handicapped persons to efficiently plan wheelchair and mobile scooter friendly routes around Washington D.C. As supporters of this project, we utilized the data “Project Sidewalk” collected and used it to confirm predictions about where problem sidewalks exist based on real estate and crime data. We present a study that identifies correlations found between accessibility data and crime and housing statistics in the Washington D.C. metropolitan area. We identify the key reasons for increased accessibility and the issues with the current infrastructure management system. After a thorough …