Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 4 of 4

Full-Text Articles in Physical Sciences and Mathematics

Phishing Detection Using Natural Language Processing And Machine Learning, Apurv Mittal, Dr Daniel Engels, Harsha Kommanapalli, Ravi Sivaraman, Taifur Chowdhury Sep 2022

SMU Data Science Review

Phishing emails are a primary mode of entry for attackers into an organization. A successful phishing attempt leads to unauthorized access to sensitive information and systems. However, automatically identifying phishing emails is often difficult because many of them have composite features, such as body text and metadata, that are nearly indistinguishable from those of valid emails. This paper presents a novel machine learning-based framework, the DARTH framework, that characterizes and combines multiple models, one for each composite feature, to enable the accurate identification of phishing emails. The framework analyzes each composite feature independently, taking a multi-faceted approach using Natural Language …
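
The abstract does not say how DARTH merges its per-feature models. As a minimal sketch only, the Python below assumes each composite feature (for example, body text and metadata) has its own model that returns a phishing probability, and combines those probabilities with a weighted average. The function name, the weights, and the averaging rule are hypothetical illustrations, not the authors' method.

```python
from typing import Callable, Dict

# Hypothetical interface: each per-feature model maps that feature's raw
# content (e.g. body text or header metadata) to a probability of phishing.
FeatureModel = Callable[[str], float]


def combined_phishing_score(
    email: Dict[str, str],
    models: Dict[str, FeatureModel],
    weights: Dict[str, float],
) -> float:
    """Score each composite feature with its own model, then merge the
    per-feature probabilities with a weighted average (an assumed rule)."""
    total_weight = sum(weights[name] for name in models)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # One independent model per composite feature, as the abstract describes.
    weighted = sum(weights[name] * models[name](email[name]) for name in models)
    return weighted / total_weight
```

An email would then be flagged when its combined score exceeds a chosen threshold such as 0.5. Fixed weights keep the sketch simple; a learned meta-classifier (stacking) is a common alternative way to combine per-feature models.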


Using Natural Language Processing To Increase Modularity And Interpretability Of Automated Essay Evaluation And Student Feedback, Chris Roche, Nathan Deinlein, Darryl Dawkins, Faizan Javed Sep 2022

SMU Data Science Review

For English teachers and students who are dissatisfied with the one-size-fits-all approach of current Automated Essay Scoring (AES) systems, this research applies Natural Language Processing (NLP) techniques with a focus on configurability and interpretability. Unlike traditional AES models, which are designed to provide an overall score based on pre-trained criteria, this tool allows teachers to tailor feedback to specific focus areas. The tool implements a user interface that serves as a customizable rubric. Essays are entered into the tool through this interface, either by the student or by the teacher. Based on the rubric settings, the tool …


Stock Forecasts With LSTM And Web Sentiment, Michael Burgess, Faizan Javed, Nnenna Okpara, Chance Robinson Sep 2022

SMU Data Science Review

Traditional time-series techniques, such as autoregressive and moving-average models, can struggle with stock data because of the randomness inherent in the markets. In this study, Long Short-Term Memory (LSTM) recurrent neural networks were applied to pricing data together with sentiment scores derived from web sources such as Twitter and other financial media outlets. The project team used this approach to complement the technical indicators observed at the end of each trading day for three stocks from the NASDAQ stock exchange over a 12-year span. A common benchmark to assess model performance on time series …
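
A common way to feed daily prices and sentiment scores to an LSTM is to slice the aligned series into fixed-length windows. The sketch below assumes exactly one closing price and one sentiment score per trading day and a next-day-close target; the window length, the two-feature layout, and the `make_windows` name are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Sequence, Tuple


def make_windows(
    closes: Sequence[float],
    sentiment: Sequence[float],
    window: int,
) -> Tuple[List[List[List[float]]], List[float]]:
    """Build (samples, targets) for a sequence model.

    Each sample holds `window` consecutive trading days of
    [close, sentiment] features; its target is the next day's close.
    """
    if len(closes) != len(sentiment):
        raise ValueError("closes and sentiment must be aligned by trading day")
    if window < 1:
        raise ValueError("window must be at least 1")
    samples: List[List[List[float]]] = []
    targets: List[float] = []
    for start in range(len(closes) - window):
        end = start + window
        # Pair each day's price with that day's sentiment score.
        samples.append([[closes[t], sentiment[t]] for t in range(start, end)])
        targets.append(closes[end])
    return samples, targets
```

The resulting samples have the shape (number of samples, time steps, features), which is the input layout that recurrent layers such as Keras's `LSTM` expect. In practice the features would usually also be scaled before training.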


Web Page Multiclass Classification, Brian Gaither, Antonio Debouse, Catherine Huang Jun 2022

SMU Data Science Review

As the internet age evolves, the volume of content hosted on the Web is rapidly expanding. With this ever-expanding content, accurately categorizing web pages has become a challenge across many use cases. This paper proposes a variation on the text preprocessing pipeline in which noun phrase extraction is performed first, followed by lemmatization, contraction expansion, removal of special characters, removal of extra white space, lowercasing, and removal of stop words. The first step, noun phrase extraction, aims to reduce the set of terms to those that best describe what the web pages are about …
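
The step ordering described in this abstract can be sketched as a single function. This is a minimal sketch, assuming the noun phrase extractor and the lemmatizer come from an NLP library (spaCy's `Doc.noun_chunks` or NLTK's `WordNetLemmatizer` are possible choices) and are passed in as callables. The contraction and stop-word tables are tiny hypothetical stand-ins for the fuller lists a real pipeline would use.

```python
import re
from typing import Callable, List

# Hypothetical, deliberately tiny lookup tables; real pipelines use fuller lists.
CONTRACTIONS = {"don't": "do not", "it's": "it is", "can't": "cannot", "we're": "we are"}
STOP_WORDS = {"a", "an", "the", "of", "and", "is", "it", "to", "in", "for", "on"}


def preprocess(
    text: str,
    extract_noun_phrases: Callable[[str], List[str]],
    lemmatize: Callable[[str], str],
) -> List[str]:
    """Apply the preprocessing steps in the order the abstract lists them."""
    # 1. Noun phrase extraction: keep only terms describing the page's topic.
    result = " ".join(extract_noun_phrases(text))
    # 2. Lemmatization, applied to each whitespace-delimited token.
    result = " ".join(lemmatize(token) for token in result.split())
    # 3. Contraction expansion (case-insensitive, since lowercasing comes later).
    for short, full in CONTRACTIONS.items():
        result = re.sub(r"\b" + re.escape(short) + r"\b", full, result, flags=re.IGNORECASE)
    # 4. Remove special characters (anything but letters, digits, whitespace).
    result = re.sub(r"[^A-Za-z0-9\s]", " ", result)
    # 5. Collapse runs of whitespace left behind by the previous steps.
    result = re.sub(r"\s+", " ", result).strip()
    # 6. Lowercase.
    result = result.lower()
    # 7. Remove stop words.
    return [token for token in result.split() if token not in STOP_WORDS]
```

Because noun phrase extraction runs first, every later step operates on a much smaller set of terms, which is the reduction the abstract motivates.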