Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Full-Text Articles in Physical Sciences and Mathematics

Interpretable Word-Level Sentiment Analysis With Attention-Based Multiple Instance Classification Models, Chenyu Yang Dec 2023

Statistical Science Theses and Dissertations

In this study, our main objective is to tackle the black-box nature of popular machine learning models in sentiment analysis and enhance model interpretability. We aim to gain more insight into the decision-making process of sentiment analysis models, which is often obscure in those complex models. To achieve this goal, we introduce two word-level sentiment analysis models.

The first model is called the attention-based multiple instance classification (AMIC) model. It combines the transparent model structure of multiple instance classification and the self-attention mechanism in deep learning to incorporate the contextual information from documents. As demonstrated by a wine review dataset …


Ohio Recovery Housing: Resident Risk And Outcomes Assessment, Elyjiah Potter, Bivin Sadler Dec 2023

SMU Data Science Review

Addiction and substance abuse disorder is a significant problem in the United States. Over the past two decades, the United States has faced a boom in substance abuse, which has resulted in an increase in death and disruption of families across the nation. The State of Ohio has been particularly hard hit by the crisis, with overdose rates nearly double the national average. Established in the mid-1970s, Sober Living Housing is an alcohol and substance use recovery model emphasizing personal responsibility, sober living, and community support. This model has been adopted by the Ohio Recovery Housing organization, which seeks …


Deep Learning Image Analysis To Isolate And Characterize Different Stages Of S-Phase In Human Cells, Kevin A. Boyd, Rudranil Mitra, John Santerre, Christopher L. Sansam Dec 2023

SMU Data Science Review

Abstract. This research used deep learning for image analysis to isolate and characterize distinct DNA replication patterns in human cells. By leveraging high-resolution microscopy images of multiple cells stained with 5-Ethynyl-2′-deoxyuridine (EdU), a replication marker, this analysis utilized Convolutional Neural Networks (CNNs) to perform image segmentation and to provide robust and reliable classification results. First, multiple cells in a field of focus were identified using a pretrained CNN called Cellpose. After identifying the location of each cell in the image, a Python script was created to crop out each cell into individual .tif files. After careful annotation, a CNN was …


Investigation Into A Practical Application Of Reinforcement Learning For The Stock Market, Philip Traxler, Sadik Aman, Will Rogers, Allyn Okun Dec 2023

SMU Data Science Review

A major problem for the financial industry is adapting trading strategies at the same rate the market evolves. This paper proposes a solution using existing Reinforcement Learning libraries to help find new strategies at a practical scale. Using a wide domain of ticker symbols, an algorithm is trained in an environment that better represents reality. The supplied decision-making algorithm is tested using recorded data from the U.S. stock market from 2000 through 2022. The results of this research show that existing techniques are statistically better than making decisions at random. With this result, this research shows …


Differentiation Of Human, Dog, And Cat Hair Fibers Using DART TOFMS And Machine Learning, Laura Ahumada, Erin R. McClure-Price, Chad Kwong, Edgard O. Espinoza, John Santerre Dec 2023

SMU Data Science Review

Hair is found in over 90% of crime scenes and has long been analyzed as trace evidence. However, recent reviews of traditional hair fiber analysis techniques, primarily morphological examination, have cast doubt on its reliability. To address these concerns, this study employed machine learning algorithms, specifically Linear Discriminant Analysis (LDA) and Random Forest, on Direct Analysis in Real Time time-of-flight mass spectra collected from human, cat, and dog hair samples. The objective was to develop a chemistry- and statistics-based classification method for unbiased taxonomic identification of hair. The results of the study showed that LDA and Random Forest were highly …


Impact Of COVID-19 On Recruitment Of High School Athletes To DI Track And Field, Christopher Haub, Jon Paugh, Alonso Salcido, Monnie McGee Dec 2023

SMU Data Science Review

Due to COVID-19, in the spring of 2020, the NCAA gave scholarship athletes an extra year of eligibility but did not increase the number of scholarships a school could issue. This potentially led to increased competition for scholarships as coaches could choose between retaining athletes or recruiting new ones. Furthermore, the Spring 2020 track and field season for high school seniors ended early, limiting high school athletes’ chances to get their best scores and interrupting student-to-college interaction. This research looks specifically at the impact of COVID-19, and the resulting NCAA policy changes, on the recruitment to DI …


A Prompt Engineering Approach To Creating Automated Commentary For Microsoft Self-Help Documentation Metric Reports Using ChatGPT, Ryan Herrin, Luke Stodgel, Brian Raffety Dec 2023

SMU Data Science Review

Microsoft collects an immense amount of data from the users of its product self-help documentation. Employees use this data to identify these self-help articles' performance trends and measure their impact on business Key Performance Indicators (KPIs). Microsoft uses various tools like Power BI and Python to analyze this data. The problem is that their analysis and findings are summarized manually. Therefore, this research will improve upon their current analysis methods by applying the latest prompt engineering practices and the power of ChatGPT's large language models (LLMs). Using VBA code, Microsoft Excel, and the ChatGPT API as an Excel add-in, this research …


The Impact Of The COVID-19 Pandemic On Faculty Productivity And Gender Inequalities In STEM Disciplines, Monnie McGee, Raag Patel, Roslyn Smith, Satvik Ajmera Dec 2023

SMU Data Science Review

Women and minorities within STEM disciplines historically encounter obstacles in academic advancement, a situation compounded by the COVID-19 pandemic due to the imposition of additional responsibilities like caregiving. This study probes the pandemic's influence on traditional academic productivity metrics (specifically publication and submission frequency, citation volume, and leadership in scholarly entities) by employing Natural Language Processing to extract and analyze data from key journals within various scientific domains. The research indicates a notable downturn in publication activity during 2021, potentially attributed to pandemic-induced disruptions, with a compensatory surge observed in 2022. Although a …


Predicting Land Reclamation Of Bond Released Surface Mines, Kendall Scott, Austin Webb, Tadd Backus, Robert Slater Dec 2023

SMU Data Science Review

Accurately measuring the recovery of released surface mines in the United States poses crucial challenges. This study aims to develop a prediction of land classification that considers various environmental and coal mine variables. By utilizing this prediction, researchers and environmentalists (specifically Appalachian Voices, the group heading this research) can better understand the relevant factors for successful reclamation. Efficient management of mine recovery is essential for environmental sustainability, regulatory compliance, and resource utilization. This study focuses on the Appalachian Forest area, which risks becoming a net carbon source (a place that emits more carbon than it absorbs) due to mine recovery. …


Utilizing Computer Vision For Automated Cellular Microscopy, Ahmed Awadallah, Ryan Bass, James Burke, Robert Price, John Santerre Dec 2023

SMU Data Science Review

Abstract. Post-acquisition data analysis of microscopy images is a vital yet time-consuming process for researchers. Quantitative fields such as biology and microbiology often require using images as primary data sources. Finding methods to automate this process would increase the throughput, quality, and reproducibility. This research aims to provide a novel end-to-end pipeline that reduces the workload on researchers in identifying cell cytoplasm and nuclei while creating a process that can scale to the researcher's needs. The proposed methodology utilizes various image-processing techniques to rapidly identify the boundaries of cells and nuclei, including filtering, thresholding, and deep learning. The results …


Static Malware Family Clustering Via Structural And Functional Characteristics, David George, Andre Mauldin, Josh Mitchell, Sufiyan Mohammed, Robert Slater Aug 2023

SMU Data Science Review

Static and dynamic analyses are the two primary approaches to analyzing malicious applications. The primary distinction between the two is that the application is analyzed without execution in static analysis, whereas the dynamic approach executes the malware and records the behavior exhibited during execution. Although each approach has advantages and disadvantages, dynamic analysis has been more widely accepted and utilized by the research community whereas static analysis has not seen the same attention. This study aims to apply advancements in static analysis techniques to demonstrate the identification of fine-grained functionality, and show, through clustering, how malicious applications may be grouped …


Using Geographic Information To Explore Player-Specific Movement And Its Effects On Play Success In The NFL, Hayley Horn, Eric Laigaie, Alexander Lopez, Shravan Reddy Aug 2023

SMU Data Science Review

American Football is a billion-dollar industry in the United States. The analytical aspect of the sport is an ever-growing domain, with open-source competitions like the NFL Big Data Bowl accelerating this growth. With the amount of player movement during each play, tracking data can prove valuable in many areas of football analytics. While concussion detection, catch recognition, and completion percentage prediction are all existing use cases for this data, player-specific movement attributes, such as speed and agility, may be helpful in predicting play success. This research calculates player-specific speed and agility attributes from tracking data and supplements them with descriptive …


A Hybrid Ensemble Of Learning Models, Bivin Sadler, Dhruba Dey, Duy Nguyen, Tavin Weeda Aug 2023

SMU Data Science Review

Statistical models for time series forecasting have been increasingly challenged by the advent of deep learning models. This research proposes a new hybrid ensemble of forecasting models that combines the strengths of several strong candidates from these two model types. The proposed ensemble aims to improve the accuracy of forecasts and reduce computational complexity by leveraging the strengths of each candidate model.


Identifying Features And Predicting Consumer Helpfulness Of Product Reviews, Triston Hudgins, Shijo Joseph, Douglas Yip, Gaston Besanson Apr 2023

SMU Data Science Review

Major corporations utilize data from online platforms to make product or service recommendations to users. Companies like Netflix, Amazon, Yelp, and Spotify rely on purchasing trends, user reviews, and helpfulness votes to make content recommendations. This strategy can increase user engagement on a company's platform. However, misleading and/or spam reviews significantly hinder the success of these recommendation strategies. The rise of social media has made it increasingly difficult to distinguish between authentic content and advertising, leading to a burst of deceptive reviews across the marketplace. The helpfulness of a review is subject to a voting system. As such, this study aims …


Bridging The Chasm Between Fundamental, Momentum, And Quantitative Investing, Allen Hoskins, Jeff Reed, Robert Slater Apr 2023

SMU Data Science Review

A chasm exists between the active public equity investment management industry's fundamental, momentum, and quantitative styles. In this study, the researchers explore ways to bridge this gap by leveraging domain knowledge, fundamental analysis, momentum, crowdsourcing, and data science methods. This research also seeks to test the developed tools and strategies during the volatile time period of 2020 and 2021.


Question Answering With Distilled BERT Models: A Case Study For Biomedical Data, Brittany Lewandowski, Rayon Morris, Pearly Merin Paul, Robert Slater Apr 2023

SMU Data Science Review

In the healthcare industry today, 80% of data is unstructured (Razzak et al., 2019). The challenge this imposes on healthcare providers is that they rely on unstructured data to inform their decision-making. Although Electronic Health Records (EHRs) exist to integrate patient data, healthcare providers are still challenged with searching for information and answers contained within unstructured data. Prior NLP and Deep Learning research has shown that these methods can improve information extraction on unstructured medical documents. This research expands upon those studies by developing a Question Answering system using distilled BERT models. Healthcare providers can use this system on their …


Comparison Of Sampling Methods For Predicting Wine Quality Based On Physicochemical Properties, Robert Burigo, Scott Frazier, Eli Kravez, Nibhrat Lohia Apr 2023

SMU Data Science Review

Using the physicochemical properties of wine to predict quality has been done in numerous studies. Given the nature of these properties, the data is inherently skewed. Previous works have focused on a handful of sampling techniques to balance the data. This research compares multiple sampling techniques in predicting the target with limited data. For this purpose, an ensemble model is used to evaluate the different techniques. This research found no evidence that specific oversampling methods improve a random forest classifier for a multi-class problem.


Following The Crowd: Beginners Investors Guide To The Options Market, Jeremy Dawkins, Alexy Morris, Jacob Gipson, Masoud Valizadeh Apr 2023

SMU Data Science Review

While the options market may be intimidating for beginners, having the right tools can help improve the outcome of their investments. This project aims to develop a tool that uses time-series analysis and forecasting to model the future demand of S&P 500 and AAPL options contracts. The open interest of these contracts will be analyzed using various models such as AR, ARIMA, Neural Networks, and VAR, along with the put-call ratio. The goal is not to make buy or sell recommendations, but to alert the user when money is flowing into a security or index. Of all the models, the …


The Role Of Machine Learning In Improved Functionality Of Lower Limb Prostheses, Joaquin Dominguez, Richard Kim, Robert Slater Apr 2023

SMU Data Science Review

Lower-limb amputations can cause a plethora of obstacles that lead to a lower quality of life. With machine learning techniques, advanced prosthetics can help ease the lives of those who live with lower-limb amputations. Using the publicly available HuGaDB data set, the current study investigates several classification models (random forest, neural network, and Vowpal Wabbit) to predict the locomotive intentions of individuals using lower-limb prostheses. The results of this study show that the neural network model yielded the highest accuracy, with precision and recall scores comparable to the other models. However, the Vowpal Wabbit model's advantage in speed may …


Using NLP To Model U.S. Supreme Court Cases, Katherine Lockard, Robert Slater, Brandon Sucrese Apr 2023

SMU Data Science Review

The advantages of employing text analysis to uncover policy positions, generate legal predictions, and inform or evaluate reform practices are manifold. Given the far-reaching effects of legislation at all levels of society, these insights and their continued improvement are impactful. This research explores the use of natural language processing (NLP) and machine learning to predictively model U.S. Supreme Court case outcomes based on textual case facts. The final model achieved an F1-score of .324 and an AUC of .68. This suggests that the model can distinguish between the two target classes; however, further research is needed before machine learning models …


Content-Based Unsupervised Fake News Detection On Ukraine-Russia War, Yucheol Shin, Yvan Sojdehei, Limin Zheng, Brad Blanchard Apr 2023

SMU Data Science Review

The Ukrainian-Russian war has garnered significant attention worldwide, with fake news obstructing the formation of public opinion and disseminating false information. This scholarly paper explores the use of unsupervised learning methods and the Bidirectional Encoder Representations from Transformers (BERT) to detect fake news in news articles from various sources. BERT topic modeling is applied to cluster news articles by their respective topics, followed by summarization to measure the similarity scores. The hypothesis posits that topics with larger variances are more likely to contain fake news. The proposed method was evaluated using a dataset of approximately 1000 labeled news articles related …


Professor Text: University Fundraising Optimization, Braden Anderson, Connor Dobbs, Hien Lam, John Santerre Apr 2023

SMU Data Science Review

University fundraising campaigns are a unique type of cause-related marketing with their own challenges and opportunities. Campaigns like this typically last an extended period, such as five or more years, and goals exist beyond the dollar amount raised. These supplemental goals, such as awareness among potential future donors or brand reputation within the local community, are important to consider and strategize. There can also be unique limitations, such as requiring advertising specifically on recent large gifts or endowment programs. This research explores how machine learning techniques such as natural language processing can be used to optimize a fundraising campaign strategy, …


Fraud Pattern Detection For NFT Markets, Andrew Leppla, Jorge Olmos, Jaideep Lamba Mar 2023

SMU Data Science Review

Non-Fungible Tokens (NFTs) enable ownership and transfer of digital assets using blockchain technology. As a relatively new financial asset class, NFTs lack robust oversight and regulations. These conditions create an environment that is susceptible to fraudulent activity and market manipulation schemes. This study examines the buyer-seller network transactional data from some of the most popular NFT marketplaces (e.g., AtomicHub, OpenSea) to identify and predict fraudulent activity. To accomplish this goal, multiple features such as price, volume, and network metrics were extracted from NFT transactional data. These were fed into a Multiple-Scale Convolutional Neural Network that predicts suspected fraudulent activity based …


Self-Learning Algorithms For Intrusion Detection And Prevention Systems (IDPS), Juan E. Nunez, Roger W. Tchegui Donfack, Rohit Rohit, Hayley Horn Mar 2023

SMU Data Science Review

Today, there is an increased risk to data privacy and information security due to cyberattacks that compromise data reliability and accessibility. New machine learning models are needed to detect and prevent these cyberattacks. One application of these models is cybersecurity threat detection and prevention systems that can create a baseline of a network's traffic patterns to detect anomalies without needing pre-labeled data, thus enabling the identification of abnormal network events as threats. This research explored algorithms that can help automate anomaly detection on an enterprise network using Canadian Institute for Cybersecurity data. This study demonstrates that Neural Networks with Bayesian …


Deep Learning For Online Fashion: A Novel Solution For The Retail E-Commerce Industry, Zachary O. Harris, Gowtham G. Katta, Robert Slater, Joseph L. Woodall IV Mar 2023

SMU Data Science Review

The online shopping experience for clothing can be further enhanced by implementing Deep Learning techniques, such as Computer Vision and personalized recommendation systems. Automation, as a principle, can be applied to solving problems surrounding efficacy, efficiency, and security. It also provides a layer of abstraction for the user during the online shopping experience. This research aims to apply Deep Learning methods and principles of automation to augment the e-commerce fashion market in a novel way. After using these methods, it was found that Convolutional Autoencoders and Item-to-Item Based Recommenders may be used to accurately and precisely recommend articles of clothing …


Phishing Detection Using Natural Language Processing And Machine Learning, Apurv Mittal, Dr Daniel Engels, Harsha Kommanapalli, Ravi Sivaraman, Taifur Chowdhury Sep 2022

SMU Data Science Review

Phishing emails are a primary mode of entry for attackers into an organization. A successful phishing attempt leads to unauthorized access to sensitive information and systems. However, automatically identifying phishing emails is often difficult since many phishing emails have composite features such as body text and metadata that are nearly indistinguishable from valid emails. This paper presents a novel machine learning-based framework, the DARTH framework, that characterizes and combines multiple models, with one model for each composite feature, that enables the accurate identification of phishing emails. The framework analyses each composite feature independently utilizing a multi-faceted approach using Natural Language …


Classification Of Breast Cancer Histopathological Images Using Semi-Supervised GANs, Balaji Avvaru, Nibhrat Lohia, Sowmya Mani, Vijayasrikanth Kaniti Sep 2022

SMU Data Science Review

Breast cancer is diagnosed more frequently than skin cancer in women in the United States. Most breast cancer cases are diagnosed in women, while children and men are less likely to develop the disease. Various tissues in the breast grow uncontrollably, resulting in breast cancer. Different treatments analyze microscopic histopathology images for diagnosis that help accurately detect cancer cells. Deep learning is one of the evolving techniques to classify images where accuracy depends on the volume and quality of labeled images. This study used various pre-trained models to train the histopathological images and analyze these models to create a new …


Short Term Forecasting Of Solar Radiation, Ashwin Thota, Bradley Blanchard, Lijju Mathew, Paritosh Rai, Sid Swarupananda Sep 2022

SMU Data Science Review

This paper details how to predict solar radiation at a location for the next few hours using machine learning techniques like Facebook’s Prophet and Amazon’s DeepAR+. Multiple techniques like AutoRegressive Integrated Moving Average (ARIMA) and Exponential Smoothing (ES) have been used to forecast solar radiation, but they lack accuracy and are not scalable. In contrast, Prophet and Amazon’s DeepAR+ are scalable, accurate, and easily integrated with other machine learning techniques. This will be the first time that the combination of these techniques, along with Linear Regression, Random Forest, XGBoost, and Decision Tree, is leveraged to forecast solar radiation for the short term. Predicting …


Using Natural Language Processing To Increase Modularity And Interpretability Of Automated Essay Evaluation And Student Feedback, Chris Roche, Nathan Deinlein, Darryl Dawkins, Faizan Javed Sep 2022

SMU Data Science Review

For English teachers and students who are dissatisfied with the one-size-fits-all approach of current Automated Essay Scoring (AES) systems, this research uses Natural Language Processing (NLP) techniques that provide a focus on configurability and interpretability. Unlike traditional AES models which are designed to provide an overall score based on pre-trained criteria, this tool allows teachers to tailor feedback based upon specific focus areas. The tool implements a user-interface that serves as a customizable rubric. Students’ essays are inputted into the tool either by the student or by the teacher via the application’s user-interface. Based on the rubric settings, the tool …


Stock Forecasts With LSTM And Web Sentiment, Michael Burgess, Faizan Javed, Nnenna Okpara, Chance Robinson Sep 2022

SMU Data Science Review

Traditional time-series techniques, such as auto-regressive and moving average models, can have difficulties when applied to stock data due to the randomness inherent to the markets. In this study, Long Short-Term Memory Recurrent Neural Networks, or LSTMs, have been applied to pricing data along with sentiment scores derived from web sources such as Twitter and other financial media outlets. The project team utilized this approach to complement the technical indicators observed at the end of each trading day for three stocks from the NASDAQ stock exchange over a 12-year span. A common benchmark to assess model performance on time series …