Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Computer Science Department Faculty Publication Series

2003

Text classification

Full-Text Articles in Physical Sciences and Mathematics

Augmenting Naive Bayes Classifiers With Statistical Language Models, Fuchun Peng, Jan 2003

We augment naive Bayes models with statistical n-gram language models to address shortcomings of the standard naive Bayes text classifier. The result is a generalized naive Bayes classifier that allows for a local Markov dependence among observations, a model we refer to as the Chain Augmented Naive Bayes (CAN) classifier. CAN models have two advantages over standard naive Bayes classifiers. First, they relax some of the independence assumptions of naive Bayes, allowing a local Markov chain dependence in the observed variables, while still permitting efficient inference and learning. Second, they permit straightforward application of sophisticated smoothing techniques …
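
The idea in the abstract can be illustrated with a minimal sketch: train one n-gram language model per class and assign a document to the class whose model, combined with the class prior, gives it the highest log-probability. The code below is an illustration under stated assumptions, not the authors' implementation. It uses bigrams (n = 2) and simple add-alpha smoothing as a stand-in for the more sophisticated smoothing techniques the abstract mentions; the class name `ChainAugmentedNB`, its methods, and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class ChainAugmentedNB:
    """Toy CAN-style classifier: one bigram language model per class.

    Assumption: add-alpha smoothing replaces the smoothing methods
    referred to in the abstract, purely to keep the sketch short.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                    # smoothing constant (assumed)
        self.bigrams = defaultdict(Counter)   # class -> counts of (prev, word)
        self.contexts = defaultdict(Counter)  # class -> counts of prev
        self.class_docs = Counter()           # class -> number of documents
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.class_docs[label] += 1
            prev = "<s>"  # start-of-document marker
            for word in tokens:
                self.bigrams[label][(prev, word)] += 1
                self.contexts[label][prev] += 1
                self.vocab.add(word)
                prev = word
        return self

    def log_score(self, tokens, label):
        # log P(c) + sum_i log P(w_i | w_{i-1}, c)
        total_docs = sum(self.class_docs.values())
        score = math.log(self.class_docs[label] / total_docs)
        v = len(self.vocab) + 1  # +1 reserves mass for unseen words
        prev = "<s>"
        for word in tokens:
            num = self.bigrams[label][(prev, word)] + self.alpha
            den = self.contexts[label][prev] + self.alpha * v
            score += math.log(num / den)
            prev = word
        return score

    def predict(self, tokens):
        return max(self.class_docs, key=lambda c: self.log_score(tokens, c))


# Toy training data (hypothetical).
docs = [
    ["the", "match", "was", "won"],
    ["the", "team", "won", "the", "match"],
    ["the", "stock", "fell"],
    ["the", "market", "fell", "sharply"],
]
labels = ["sports", "sports", "finance", "finance"]

clf = ChainAugmentedNB().fit(docs, labels)
print(clf.predict(["the", "team", "won"]))
print(clf.predict(["the", "market", "fell"]))
```

Replacing the bigram term with a unigram term `P(w_i | c)` would recover the standard naive Bayes text classifier, which is why the model can be read as a generalization: each word is conditioned on its predecessor as well as on the class.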