Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Theory and Algorithms

Air Force Institute of Technology

2024

Full-Text Articles in Physical Sciences and Mathematics

The Impact Of Data Preparation And Model Complexity On The Natural Language Classification Of Chinese News Headlines, Torrey J. Wagner, Dennis Guhl, Brent T. Langhals Mar 2024

Faculty Publications

Given the emergence of China as a political and economic power in the 21st century, there is increased interest in analyzing Chinese news articles to better understand developing trends in China. Because of the volume of the material, automating the categorization of Chinese-language news articles by headline text or titles can be an effective way to sort the articles into categories for efficient review. A 383,000-headline dataset labeled with 15 categories from the Toutiao website was evaluated via natural language processing to predict topic categories. The influence of six data preparation variations on the predictive accuracy of four algorithms was …
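
The headline-classification task described in the abstract can be illustrated with a minimal sketch. The visible portion of the abstract does not name the four algorithms or the six data preparation variations, so everything below is an assumption made for illustration: a multinomial naive Bayes classifier over character bigrams, which avoids the need for Chinese word segmentation. The class name, the two category labels, and the four training headlines are hypothetical and are not drawn from the Toutiao dataset.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Split a headline into overlapping character n-grams (no word segmentation)."""
    text = text.strip()
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NaiveBayesHeadlineClassifier:
    """Hypothetical illustration: multinomial naive Bayes with add-one smoothing."""

    def __init__(self, n=2):
        self.n = n
        self.class_counts = Counter()               # headlines seen per label
        self.feature_counts = defaultdict(Counter)  # n-gram counts per label
        self.vocabulary = set()                     # all n-grams seen in training

    def fit(self, headlines, labels):
        for headline, label in zip(headlines, labels):
            self.class_counts[label] += 1
            for gram in char_ngrams(headline, self.n):
                self.feature_counts[label][gram] += 1
                self.vocabulary.add(gram)
        return self

    def predict(self, headline):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            total = sum(counts.values())
            # Log prior plus smoothed log likelihood of each n-gram.
            score = math.log(doc_count / total_docs)
            for gram in char_ngrams(headline, self.n):
                score += math.log((counts[gram] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up toy data, for illustration only (not from the Toutiao dataset).
train_headlines = [
    "股市今日大幅上涨",      # "Stock market rises sharply today"
    "央行宣布下调利率",      # "Central bank announces rate cut"
    "国家队赢得足球比赛",    # "National team wins football match"
    "篮球联赛决赛今晚开打",  # "Basketball league final tips off tonight"
]
train_labels = ["finance", "finance", "sports", "sports"]

model = NaiveBayesHeadlineClassifier(n=2).fit(train_headlines, train_labels)
print(model.predict("股市上涨"))
print(model.predict("足球比赛"))
```

Changing `n`, or preprocessing the headline before it reaches `char_ngrams`, is one simple way to experiment with how data preparation affects a classifier of this kind.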