Databases and Information Systems

2014

Dimension reduction

Full-Text Articles in Artificial Intelligence and Robotics

CenKNN: A Scalable and Effective Text Classifier, Guansong Pang, Huidong Jin, Shengyi Jiang, Jul 2014

Research Collection School Of Computing and Information Systems

A major challenge in text classification is classifying a large-scale, high-dimensional text corpus in the presence of imbalanced class distributions and a large number of irrelevant or noisy term features. A number of techniques have been proposed to handle this challenge, with varying degrees of success. In this paper, we combine the strengths of two widely used text classification techniques, K-Nearest-Neighbor (KNN) and centroid-based (Centroid) classifiers, to propose a scalable and effective flat classifier, called CenKNN, that copes with this challenge. CenKNN projects high-dimensional (often hundreds of thousands of dimensions) documents into a low-dimensional (normally a few …
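
The abstract names two building blocks, a centroid-based classifier and KNN, and a projection into a low-dimensional space. The following is a minimal Python sketch of one way those pieces could fit together, not the authors' implementation. It assumes that each document is projected onto its cosine similarities to the class centroids (one coordinate per class) and that KNN with Euclidean distance is then run in that small space. The projection choice, the distance measure, the toy corpus, and all function names here are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def normalize(vec):
    """Scale a sparse term-weight dict to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (their dot product)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def class_centroids(docs, labels):
    """Average the unit-length documents of each class, then renormalize."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for doc, label in zip(docs, labels):
        for term, weight in normalize(doc).items():
            sums[label][term] += weight
    return {
        c: normalize({t: w / counts[c] for t, w in s.items()})
        for c, s in sums.items()
    }


def project(doc, centroids, classes):
    """Assumed projection: similarity of the document to each class centroid."""
    unit = normalize(doc)
    return [cosine(unit, centroids[c]) for c in classes]


def knn_predict(query, points, labels, k=3):
    """Majority vote among the k nearest projected training points."""
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    return Counter(labels[i] for i in order[:k]).most_common(1)[0][0]


# Hypothetical toy corpus: sparse term-frequency vectors with class labels.
train_docs = [
    {"goal": 2, "match": 1},
    {"match": 2, "team": 1},
    {"goal": 1, "team": 2},
    {"stock": 2, "market": 1},
    {"market": 2, "bank": 1},
    {"stock": 1, "bank": 2},
]
train_labels = ["sport", "sport", "sport", "finance", "finance", "finance"]

centroids = class_centroids(train_docs, train_labels)
classes = sorted(centroids)
train_points = [project(d, centroids, classes) for d in train_docs]


def classify(doc, k=3):
    """Project a new document, then classify it with KNN in the reduced space."""
    return knn_predict(project(doc, centroids, classes), train_points, train_labels, k)
```

Under these assumptions, the neighbor search runs over vectors with one coordinate per class rather than one per term, which is where a saving over plain KNN on the original term space would come from.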