Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 30 of 162

Full-Text Articles in Physical Sciences and Mathematics

Predictive Analysis Of Local House Prices: Leveraging Machine Learning For Real Estate Valuation, Joey Hernandez, Danny Chang, Santiago Gutierrez, Paul Huggins May 2024

SMU Data Science Review

This paper presents a comprehensive study examining the real estate market potential in the dynamic urban landscapes of Frisco and Plano, Texas. Combining traditional real estate analysis with cutting-edge machine learning techniques, the study aims to predict home prices and assess investment feasibility. Leveraging these findings, the study proposes a strategic focus on predictive modeling and investment potential identification, emphasizing the continual refinement of machine learning models with updated data to accurately forecast changes in the real estate market. By harnessing the predictive power of these models, investors can identify high-growth areas and optimize their investment decisions, thus capitalizing on …


A Symbolic Approach To Nonlinear Time Series Analysis, Ranjan Karki, Nibhrat Lohia, Michael B. Schulte May 2024

SMU Data Science Review

Current nonlinear time series methods such as neural networks forecast well. However, they act as a black box and are difficult to interpret, leaving the researchers and the audience with little insight into why the forecasts are the way they are. There is a need for a method that forecasts accurately while also being easy to interpret. This paper aims to develop a method to build an interpretable model for univariate and multivariate nonlinear time series data using wavelets and symbolic regression. The final method relies on multilayer perceptron (MLP) neural networks as a form of dimensionality reduction and the …


Intelligent Solutions For Retroactive Anomaly Detection And Resolution With Log File Systems, Derek G. Rogers, Chanvo Nguyen, Abhay Sharma May 2024

SMU Data Science Review

This paper explores the intricate challenges log files pose from data science and machine learning perspectives. Drawing inspiration from existing methods, LAnoBERT, PULL, LLMs, and the breadth of recent research, this paper aims to push the boundaries of machine learning for log file systems. Our study comprehensively examines the unique challenges presented in our problem setup, delineates the limitations of existing methods, and introduces innovative solutions. These contributions are organized to offer valuable insights, predictions, and actionable recommendations tailored for Microsoft's engineers working on log data analysis.


Baseball Decision-Making: Optimizing At-Bat Simulations, Varun Gopal, Krithika Kondakindi, Nibhrat Lohia, Morgan Williams May 2024

SMU Data Science Review

Pitch selection in baseball plays a crucial role, involving pitchers, catchers, and batters working together. This practice, dating back to early baseball, has seen teams try various methods to gain an advantage. This research aims to use reinforcement learning and pitch-by-pitch Statcast data to improve batting strategies. It also builds on previous statistical work (sabermetrics) to make better choices in pitch selection and plate discipline. The dataset used, including over 700,000 pitches for each full season and 200,000 pitches for the COVID-shortened 2020 season, encompasses a wealth of crucial metrics including pitch release point, velocity, and launch angle. This study …


Reevaluating Texas Energy Market Forecasts In The Wake Of Recent Extreme Weather Events, Robert A. Derner, Richard W. Butler II, Alexandria Neff, Adam R. Ruthford May 2024

SMU Data Science Review

This paper provides updated forecasts of energy demand in Texas and recognizes the impact of sustainable energy. It is important that the forecasts of the adoption of sustainable energy are reexamined after Winter Storm Uri crippled the Texas power grid and left many without power. This storm highlighted the issues the Texas power grid had and has continued to struggle with in supplying the state with energy. This paper will offer an overview of the relevant literature on the adoption of sustainable energy and relevant events that have occurred in the state of Texas that will give the reader the …


Multi-Class Emotion Classification With XGBoost Model Using Wearable EEG Headband Data, James Khamthung, Nibhrat Lohia, Seement Srivastava May 2024

SMU Data Science Review

Electroencephalography (EEG) or brainwave signals serve as a valuable source for discerning human activities, thoughts, and emotions. This study explores the efficacy of EXtreme Gradient Boosting (XGBoost) models in sentiment classification using EEG signals, specifically those captured by the MUSE EEG headband. The MUSE device, equipped with four EEG electrodes (TP9, AF7, AF8, TP10), offers a cost-effective alternative to traditional EEG setups, which often utilize over 60 channels in laboratory-grade settings. Leveraging a dataset from previous MUSE research (Bird, J. et al., 2019), emotional states (positive, neutral, and negative) were observed in a male and a female participant, each for …


Building Effective Large Language Model Agents, Sydney Holder, Shreyash Taywade May 2024

SMU Data Science Review

The advancement of large language models (LLMs) has significantly expanded the influence of artificial intelligence across various sectors. This paper explores building LLM agents to power applications and examines what is necessary to build an efficient and helpful AI assistant. The research investigates the core components necessary to create specialized agents, facilitate collaboration in problem-solving, and improve human task performance. The development and application of tools designed to augment the capabilities of LLM agents are also explored. The paper addresses the potential risks of the unknowns, such as hallucinations, which can compromise the success of agent-based solutions within LLM applications. …


Game Recommendation Analysis Using Steam Profiles And Reviews, Robert Blue, Luis Garcia, Jacob Turner May 2024

SMU Data Science Review

Smaller game studios are at a disadvantage when it comes to getting their product noticed by users. This study aims to provide insights on how recommendation engines work so that these smaller studios can have their games noticed on Steam. Steam is one of the largest video game distribution services, and it has a recommendation engine that promotes games to its user base. This study utilized user information such as the number of games played, the type of games, and the hours played, and created recommendation engines to identify the qualities in a game that are driving recommendations.


Leveraging Transformer Models For Genre Classification, Andreea C. Craus, Ben Berger, Yves Hughes, Hayley Horn May 2024

SMU Data Science Review

As the digital music landscape continues to expand, the need for effective methods to understand and contextualize the diverse genres of lyrical content becomes increasingly critical. This research focuses on the application of transformer models in the domain of music analysis, specifically in the task of lyric genre classification. By leveraging the advanced capabilities of transformer architectures, this project aims to capture intricate linguistic nuances within song lyrics, thereby enhancing the accuracy and efficiency of genre classification. The relevance of this project lies in its potential to contribute to the development of automated systems for music recommendation and genre-based playlist …


Ohio Recovery Housing: Resident Risk And Outcomes Assessment, Elyjiah Potter, Bivin Sadler Dec 2023

SMU Data Science Review

Addiction and substance abuse disorder is a significant problem in the United States. Over the past two decades, the United States has faced a boom in substance abuse, which has resulted in an increase in death and disruption of families across the nation. The State of Ohio has been particularly hard hit by the crisis, with overdose rates nearly double the national average. Established in the mid-1970s, Sober Living Housing is an alcohol and substance use recovery model emphasizing personal responsibility, sober living, and community support. This model has been adopted by the Ohio Recovery Housing organization, which seeks …


Deep Learning Image Analysis To Isolate And Characterize Different Stages Of S-Phase In Human Cells, Kevin A. Boyd, Rudranil Mitra, John Santerre, Christopher L. Sansam Dec 2023

SMU Data Science Review

This research used deep learning for image analysis to isolate and characterize distinct DNA replication patterns in human cells. By leveraging high-resolution microscopy images of multiple cells stained with 5-Ethynyl-2′-deoxyuridine (EdU), a replication marker, this analysis utilized Convolutional Neural Networks (CNNs) to perform image segmentation and to provide robust and reliable classification results. First, multiple cells in a field of focus were identified using a pretrained CNN called Cellpose. After identifying the location of each cell in the image, a Python script was created to crop out each cell into individual .tif files. After careful annotation, a CNN was …


Investigation Into A Practical Application Of Reinforcement Learning For The Stock Market, Philip Traxler, Sadik Aman, Will Rogers, Allyn Okun Dec 2023

SMU Data Science Review

A major problem in the financial industry is adapting trading strategies at the same rate the market evolves. This paper proposes a solution using existing Reinforcement Learning libraries to help find new strategies at a practical scale. Using a wide domain of ticker symbols, an algorithm is trained in an environment that better represents reality. The supplied decision-making algorithm is tested using recorded data from the U.S. stock market from 2000 through 2022. The results of this research show that existing techniques are statistically better than making decisions at random. With this result, this research shows …


Differentiation Of Human, Dog, And Cat Hair Fibers Using DART TOFMS And Machine Learning, Laura Ahumada, Erin R. McClure-Price, Chad Kwong, Edgard O. Espinoza, John Santerre Dec 2023

SMU Data Science Review

Hair is found in over 90% of crime scenes and has long been analyzed as trace evidence. However, recent reviews of traditional hair fiber analysis techniques, primarily morphological examination, have cast doubt on its reliability. To address these concerns, this study employed machine learning algorithms, specifically Linear Discriminant Analysis (LDA) and Random Forest, on Direct Analysis in Real Time time-of-flight mass spectra collected from human, cat, and dog hair samples. The objective was to develop a chemistry- and statistics-based classification method for unbiased taxonomic identification of hair. The results of the study showed that LDA and Random Forest were highly …


Impact Of COVID-19 On Recruitment Of High School Athletes To DI Track And Field, Christopher Haub, Jon Paugh, Alonso Salcido, Monnie McGee Dec 2023

SMU Data Science Review

Due to COVID-19, in the spring of 2020, the NCAA gave scholarship athletes an extra year of eligibility but did not increase the number of scholarships a school could issue. This potentially led to increased competition for scholarships as coaches could choose between retaining athletes or recruiting new ones. Furthermore, the Spring 2020 track and field season for high school seniors ended early, limiting high school athletes’ chances to get their best scores and interrupting student-to-college interaction. This research looks specifically at the impact of COVID-19, and the resulting NCAA policy changes, on the recruitment to DI …


A Prompt Engineering Approach To Creating Automated Commentary For Microsoft Self-Help Documentation Metric Reports Using ChatGPT, Ryan Herrin, Luke Stodgel, Brian Raffety Dec 2023

SMU Data Science Review

Microsoft collects an immense amount of data from the users of its product self-help documentation. Employees use this data to identify these self-help articles' performance trends and measure their impact on business Key Performance Indicators (KPIs). Microsoft uses various tools like Power BI and Python to analyze this data. The problem is that the analysis and findings are summarized manually. Therefore, this research will improve upon the current analysis methods by applying the latest prompt engineering practices and the power of ChatGPT's large language models (LLMs). Using VBA code, Microsoft Excel, and the ChatGPT API as an Excel add-in, this research …


The Impact Of The COVID-19 Pandemic On Faculty Productivity And Gender Inequalities In STEM Disciplines, Monnie McGee, Raag Patel, Roslyn Smith, Satvik Ajmera Dec 2023

SMU Data Science Review

Women and minorities within STEM disciplines historically encounter obstacles in academic advancement, a situation compounded by the COVID-19 pandemic due to the imposition of additional responsibilities like caregiving. This study meticulously probes into the pandemic's influence on traditional academic productivity metrics – specifically publication and submission frequency, citation volume, and leadership in scholarly entities, by employing Natural Language Processing to extract and analyze data from key journals within various scientific domains. A critical revelation from the research indicates a notable downturn in publication activity during 2021, potentially attributed to pandemic-induced disruptions, with a compensatory surge observed in 2022. Although a …


Predicting Land Reclamation Of Bond Released Surface Mines, Kendall Scott, Austin Webb, Tadd Backus, Robert Slater Dec 2023

SMU Data Science Review

Accurately measuring the recovery of released surface mines in the United States poses crucial challenges. This study aims to develop a prediction of land classification that considers various environmental and coal mine variables. By utilizing this prediction, the researchers and environmentalists (specifically Appalachian Voices, the group heading this research) can better understand the relevant factors for successful reclamation. Efficient management of mine recovery is essential for environmental sustainability, regulatory compliance, and resource utilization. This study focuses on the Appalachian Forest area, which risks becoming a net carbon source (a place that emits more carbon than it absorbs) due to mine recovery. …


Utilizing Computer Vision For Automated Cellular Microscopy, Ahmed Awadallah, Ryan Bass, James Burke, Robert Price, John Santerre Dec 2023

SMU Data Science Review

Post-acquisition data analysis of microscopy images is a vital yet time-consuming process for researchers. Quantitative fields such as biology and microbiology often require using images as primary data sources. Finding methods to automate this process would increase throughput, quality, and reproducibility. This research aims to provide a novel end-to-end pipeline that reduces the workload on researchers in identifying cell cytoplasm and nuclei while creating a process that can scale to the researcher's needs. The proposed methodology utilizes various image-processing techniques to rapidly identify the boundaries of cells and nuclei, including filtering, thresholding, and deep learning. The results …


Static Malware Family Clustering Via Structural And Functional Characteristics, David George, Andre Mauldin, Josh Mitchell, Sufiyan Mohammed, Robert Slater Aug 2023

SMU Data Science Review

Static and dynamic analyses are the two primary approaches to analyzing malicious applications. The primary distinction between the two is that the application is analyzed without execution in static analysis, whereas the dynamic approach executes the malware and records the behavior exhibited during execution. Although each approach has advantages and disadvantages, dynamic analysis has been more widely accepted and utilized by the research community whereas static analysis has not seen the same attention. This study aims to apply advancements in static analysis techniques to demonstrate the identification of fine-grained functionality, and show, through clustering, how malicious applications may be grouped …


Using Geographic Information To Explore Player-Specific Movement And Its Effects On Play Success In The NFL, Hayley Horn, Eric Laigaie, Alexander Lopez, Shravan Reddy Aug 2023

SMU Data Science Review

American Football is a billion-dollar industry in the United States. The analytical aspect of the sport is an ever-growing domain, with open-source competitions like the NFL Big Data Bowl accelerating this growth. With the amount of player movement during each play, tracking data can prove valuable in many areas of football analytics. While concussion detection, catch recognition, and completion percentage prediction are all existing use cases for this data, player-specific movement attributes, such as speed and agility, may be helpful in predicting play success. This research calculates player-specific speed and agility attributes from tracking data and supplements them with descriptive …


Traditional Vs Machine Learning Approaches: A Comparison Of Time Series Modeling Methods, Miguel E. Bonilla Jr., Jason McDonald, Tamas Toth, Bivin Sadler Aug 2023

SMU Data Science Review

In recent years, various new Machine Learning and Deep Learning algorithms have been introduced, claiming to offer better performance than traditional statistical approaches when forecasting time series. Studies seeking evidence to support the usage of ML/DL over statistical approaches have been limited to comparing the forecasting performance of univariate, linear time series data. This research compares the performance of traditional statistical-based and ML/DL methods for forecasting multivariate and nonlinear time series.


A Hybrid Ensemble Of Learning Models, Bivin Sadler, Dhruba Dey, Duy Nguyen, Tavin Weeda Aug 2023

SMU Data Science Review

Statistical models have long dominated time series forecasting, but the advent of deep learning models has challenged that position. This research proposes a new hybrid ensemble of forecasting models that combines the strengths of several strong candidates from these two model types. The proposed ensemble aims to improve the accuracy of forecasts and reduce computational complexity by leveraging the strengths of each candidate model.


Identifying Features And Predicting Consumer Helpfulness Of Product Reviews, Triston Hudgins, Shijo Joseph, Douglas Yip, Gaston Besanson Apr 2023

SMU Data Science Review

Major corporations utilize data from online platforms to make user product or service recommendations. Companies like Netflix, Amazon, Yelp, and Spotify rely on purchasing trends, user reviews, and helpfulness votes to make content recommendations. This strategy can increase user engagement on a company's platform. However, misleading and/or spam reviews significantly hinder the success of these recommendation strategies. The rise of social media has made it increasingly difficult to distinguish between authentic content and advertising, leading to a burst of deceptive reviews across the marketplace. The helpfulness of a review is subject to a voting system. As such, this study aims …


Bridging The Chasm Between Fundamental, Momentum, And Quantitative Investing, Allen Hoskins, Jeff Reed, Robert Slater Apr 2023

SMU Data Science Review

A chasm exists between the active public equity investment management industry's fundamental, momentum, and quantitative styles. In this study, the researchers explore ways to bridge this gap by leveraging domain knowledge, fundamental analysis, momentum, crowdsourcing, and data science methods. This research also seeks to test the developed tools and strategies during the volatile time period of 2020 and 2021.


Question Answering With Distilled BERT Models: A Case Study For Biomedical Data, Brittany Lewandowski, Rayon Morris, Pearly Merin Paul, Robert Slater Apr 2023

SMU Data Science Review

In the healthcare industry today, 80% of data is unstructured (Razzak et al., 2019). The challenge this imposes on healthcare providers is that they rely on unstructured data to inform their decision-making. Although Electronic Health Records (EHRs) exist to integrate patient data, healthcare providers are still challenged with searching for information and answers contained within unstructured data. Prior NLP and Deep Learning research has shown that these methods can improve information extraction on unstructured medical documents. This research expands upon those studies by developing a Question Answering system using distilled BERT models. Healthcare providers can use this system on their …


Comparison Of Sampling Methods For Predicting Wine Quality Based On Physicochemical Properties, Robert Burigo, Scott Frazier, Eli Kravez, Nibhrat Lohia Apr 2023

SMU Data Science Review

Numerous studies have used the physicochemical properties of wine to predict quality. Given the nature of these properties, the data is inherently skewed. Previous works have focused on a handful of sampling techniques to balance the data. This research compares multiple sampling techniques in predicting the target with limited data. For this purpose, an ensemble model is used to evaluate the different techniques. This research found no evidence that any specific oversampling method improves a random forest classifier on a multi-class problem.


Following The Crowd: Beginners Investors Guide To The Options Market, Jeremy Dawkins, Alexy Morris, Jacob Gipson, Masoud Valizadeh Apr 2023

SMU Data Science Review

While the options market may be intimidating for a beginner, having the right tools can help improve the outcome of their investments. This project aims to develop a tool that uses time-series analysis and forecasting to model the future demand of S&P 500 and AAPL options contracts. The open interest of these contracts will be analyzed using various models such as AR, ARIMA, Neural Networks, and VAR, along with the put-call ratio. The goal is not to make buy or sell recommendations, but to alert the user when money is flowing into a security or index. Of all the models, the …


The Role Of Machine Learning In Improved Functionality Of Lower Limb Prostheses, Joaquin Dominguez, Richard Kim, Robert Slater Apr 2023

SMU Data Science Review

Lower-limb amputations can cause a plethora of obstacles that lead to a lower quality of life. Implementing machine learning techniques means advanced prosthetics can contribute to facilitating the lives of those that live with lower-limb amputations. Using the publicly available HuGaDB data set, the current study investigates several classification models (random forest, neural network, and Vowpal Wabbit) to predict the locomotive intentions of individuals using lower-limb prostheses. The results of this study show that the neural network model yielded the highest accuracy, comparable precision, and recall scores to the other models. However, the Vowpal Wabbit model's advantage in speed may …


Using NLP To Model U.S. Supreme Court Cases, Katherine Lockard, Robert Slater, Brandon Sucrese Apr 2023

SMU Data Science Review

The advantages of employing text analysis to uncover policy positions, generate legal predictions, and inform or evaluate reform practices are multifold. Given the far-reaching effects of legislation at all levels of society, these insights and their continued improvement are impactful. This research explores the use of natural language processing (NLP) and machine learning to predictively model U.S. Supreme Court case outcomes based on textual case facts. The final model achieved an F1-score of .324 and an AUC of .68. This suggests that the model can distinguish between the two target classes; however, further research is needed before machine learning models …


Content-Based Unsupervised Fake News Detection On Ukraine-Russia War, Yucheol Shin, Yvan Sojdehei, Limin Zheng, Brad Blanchard Apr 2023

SMU Data Science Review

The Ukrainian-Russian war has garnered significant attention worldwide, with fake news obstructing the formation of public opinion and disseminating false information. This scholarly paper explores the use of unsupervised learning methods and the Bidirectional Encoder Representations from Transformers (BERT) to detect fake news in news articles from various sources. BERT topic modeling is applied to cluster news articles by their respective topics, followed by summarization to measure the similarity scores. The hypothesis posits that topics with larger variances are more likely to contain fake news. The proposed method was evaluated using a dataset of approximately 1000 labeled news articles related …