Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Classification Of Web Pages In Yioop With Active Learning, Shawn Cameron Tice, Jan 2013

Master's Theses

This thesis project augments the Yioop search engine with a general facility for automatically assigning "class" meta words (e.g., "class:advertising") to web pages based on the output of a logistic regression text classifier. Users can create multiple classifiers through Yioop's web-based interface. Each classifier is first trained on a small set of labeled documents drawn from previous crawls and then improved over repeated rounds of active learning using density-weighted pool-based sampling.
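As a rough illustration of the two ideas in the abstract, here is a minimal Python sketch, not Yioop's actual code: a binary logistic regression classifier over bag-of-words features that emits a `class:` meta word when its probability clears a threshold, plus one common formulation of density-weighted selection, which scores each unlabeled document by its prediction entropy multiplied by its average cosine similarity to the rest of the pool (raised to an exponent `beta`). All function names, the naive tokenizer, the SGD training loop, the 0.5 threshold, and the toy documents are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def features(text):
    # Bag-of-words term counts from lowercase whitespace tokens (a deliberately naive tokenizer).
    return Counter(text.lower().split())


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    # P(class | x) under a binary logistic regression model.
    return sigmoid(bias + sum(weights.get(t, 0.0) * v for t, v in x.items()))


def train_logistic(examples, epochs=200, lr=0.1):
    # Plain stochastic gradient descent on log loss; examples are (Counter, label) with label in {0, 1}.
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for x, y in examples:
            err = y - predict_proba(weights, bias, x)
            bias += lr * err
            for t, v in x.items():
                weights[t] = weights.get(t, 0.0) + lr * err * v
    return weights, bias


def meta_words(label, weights, bias, text, threshold=0.5):
    # Hypothetical helper: emit "class:<label>" when the predicted probability reaches the threshold.
    p = predict_proba(weights, bias, features(text))
    return ["class:" + label] if p >= threshold else []


def cosine(a, b):
    # Cosine similarity between two term-count vectors.
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def entropy(p):
    # Binary prediction entropy: highest at p = 0.5, where the model is least certain.
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def density_weighted_pick(weights, bias, pool, beta=1.0):
    # Score = uncertainty * (mean similarity to the rest of the pool) ** beta, so uncertain
    # documents that are also representative of the pool win over uncertain outliers.
    best_idx, best_score = None, -1.0
    for i, x in enumerate(pool):
        sims = [cosine(x, other) for j, other in enumerate(pool) if j != i]
        density = sum(sims) / len(sims) if sims else 0.0
        score = entropy(predict_proba(weights, bias, x)) * density ** beta
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx


# Hypothetical seed set: 1 = advertising, 0 = other content.
seed = [
    (features("buy now limited offer discount"), 1),
    (features("sale discount buy cheap"), 1),
    (features("research paper on search engines"), 0),
    (features("thesis about web crawling and indexing"), 0),
]
weights, bias = train_logistic(seed)

# Hypothetical unlabeled pool from a later crawl.
pool = [
    features("discount on search engines"),
    features("web crawling offer"),
    features("cheap sale today"),
]
print(meta_words("advertising", weights, bias, "buy now limited offer discount"))
print("ask the user to label pool document", density_weighted_pick(weights, bias, pool))
```

In an active-learning loop built this way, the selected document would be shown to a user for labeling, added to the training set, and the classifier retrained before the next round; the density term is what distinguishes this from plain uncertainty sampling.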

The classification system's accuracy on new documents was found to be comparable to published results for a common dataset, approaching 82% for a corpus of advertisements to be filtered from content providers' …