Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Machine learning

Databases and Information Systems

2007

Full-Text Articles in Physical Sciences and Mathematics

Learning To Classify E-Mail, Irena Koprinska, Josiah Poon, James Clark, Jason Yuk Hin Chan May 2007

Research Collection School Of Computing and Information Systems

In this paper we study supervised and semi-supervised classification of e-mails. We consider two tasks: filing e-mails into folders and filtering spam. First, in a supervised learning setting, we investigate the use of random forest for both tasks. We show that random forest is a good choice for these tasks: it runs fast on large, high-dimensional databases, is easy to tune, and is highly accurate, outperforming popular algorithms such as decision trees, support vector machines and naive Bayes. We introduce a new accurate feature selector with linear time complexity. …
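To make the supervised setting concrete, here is a minimal sketch of a random forest spam filter over bag-of-words features, written in plain Python. It is not the authors' implementation and does not include their feature selector. The toy messages, the labels, and every function name (`tokenize`, `build_tree`, `train_forest`, `predict`) are hypothetical and chosen for illustration only. Each tree is grown on a bootstrap sample and, at each node, considers only a random subset of words, which is the general random forest idea.

```python
import random
from collections import Counter

def tokenize(text):
    # Binary bag-of-words: the set of lowercase tokens in a message.
    return set(text.lower().split())

def gini(labels):
    # Gini impurity of a list of class labels.
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, vocab, rng, depth=0, max_depth=5):
    # rows: list of (token_set, label). A leaf is a label string,
    # an internal node is a tuple (word, subtree_if_present, subtree_if_absent).
    labels = [y for _, y in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    # Consider about sqrt(|vocab|) randomly chosen words at this node.
    k = min(len(vocab), max(1, int(len(vocab) ** 0.5)))
    best = None
    for word in rng.sample(vocab, k):
        left = [r for r in rows if word in r[0]]
        right = [r for r in rows if word not in r[0]]
        if not left or not right:
            continue  # this word does not split the sample
        score = (len(left) * gini([y for _, y in left])
                 + len(right) * gini([y for _, y in right])) / len(rows)
        if best is None or score < best[0]:
            best = (score, word, left, right)
    if best is None:
        return majority
    _, word, left, right = best
    return (word,
            build_tree(left, vocab, rng, depth + 1, max_depth),
            build_tree(right, vocab, rng, depth + 1, max_depth))

def predict_tree(node, tokens):
    # Walk down the tree until a leaf (a label string) is reached.
    while isinstance(node, tuple):
        word, present, absent = node
        node = present if word in tokens else absent
    return node

def train_forest(messages, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    rows = [(tokenize(m), y) for m, y in zip(messages, labels)]
    vocab = sorted(set().union(*(t for t, _ in rows)))
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(rows) rows with replacement.
        sample = [rng.choice(rows) for _ in rows]
        forest.append(build_tree(sample, vocab, rng))
    return forest

def predict(forest, message):
    # Majority vote over all trees.
    tokens = tokenize(message)
    votes = Counter(predict_tree(tree, tokens) for tree in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data, for illustration only.
messages = [
    "win free money now",
    "cheap pills free offer",
    "free prize claim now",
    "meeting agenda for monday",
    "project report attached",
    "lunch on friday with team",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

forest = train_forest(messages, labels, n_trees=25, seed=0)
print(predict(forest, "free money offer"))
```

The same structure would apply to filing e-mails into folders by using folder names as labels instead of `spam` and `ham`. A real experiment would need a much larger corpus and a held-out test set to say anything about accuracy.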