Open Access. Powered by Scholars. Published by Universities.®

Arts and Humanities Commons

2003

University of South Florida

Machine learning

Full-Text Articles in Arts and Humanities

Graph-Theoretic Techniques For Web Content Mining, Adam Schenker Sep 2003

USF Tampa Graduate Theses and Dissertations

In this dissertation we introduce several novel techniques for performing data mining on web documents that use graph representations of document content. Graphs are more robust than typical vector representations because they can model structural information that is usually lost when the original web document content is converted to a vector representation. For example, we can capture information such as the location, order, and proximity of term occurrence, which is discarded under the standard document vector representation models. Many machine learning methods rely on distance computations, centroid calculations, and other numerical techniques. Thus, many of these methods have not been applied …
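To illustrate the kind of structural information a graph can retain, here is a minimal sketch, not the dissertation's actual representation. It assumes one node per unique term and a directed edge whenever one term immediately follows another; the function name `build_term_graph` and the tokenization rule are hypothetical choices made for this example.

```python
import re
from collections import defaultdict


def build_term_graph(text):
    """Build a simple directed term graph (an illustrative assumption,
    not the dissertation's exact model).

    Returns (nodes, edges): nodes is the set of unique terms, and edges
    maps (term_a, term_b) to how many times term_b directly follows term_a.
    """
    # Hypothetical tokenization: lowercase alphanumeric runs.
    terms = re.findall(r"[a-z0-9]+", text.lower())
    nodes = set(terms)
    edges = defaultdict(int)
    # Each adjacent pair of terms becomes a directed edge, preserving order.
    for a, b in zip(terms, terms[1:]):
        edges[(a, b)] += 1
    return nodes, dict(edges)


nodes, edges = build_term_graph("graphs model web content and web structure")
```

A term-frequency vector for this sentence would only record that "web" occurs twice. The edge set additionally records that "web" is followed by "content" and by "structure", and that the reverse orderings never occur.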


Scavenger: A Junk Mail Classification Program, Rohan V. Malkhare Jan 2003

USF Tampa Graduate Theses and Dissertations

The problem of junk mail, also called spam, has reached epic proportions, and various efforts are underway to fight it. Junk mail classification using machine learning techniques is a key method to fight spam. We have devised a machine learning algorithm where features are created from individual sentences in the subject and body of a message by forming all possible word-pairings from a sentence. Weights are assigned to the features based on the strength of their predictive capabilities for spam/legitimate determination. The predictive capabilities are estimated by the frequency of occurrence of the feature in spam/legitimate collections as well as …
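The feature-construction step described above can be sketched as follows. This is an illustration under stated assumptions, not Scavenger's actual code: it treats a "word-pairing" as an unordered pair of distinct words, splits sentences on `.`, `!`, and `?`, and counts each feature once per message. The names `sentence_pair_features` and `feature_counts` are hypothetical, and the weighting scheme is left out because the abstract is truncated before fully describing it.

```python
import re
from collections import Counter
from itertools import combinations


def sentence_pair_features(text):
    """Return the set of unordered word pairs formed within each sentence.

    Assumption: sentences end at '.', '!' or '?', and a pairing is an
    unordered pair of distinct words from the same sentence.
    """
    features = set()
    for sentence in re.split(r"[.!?]+", text):
        # Sort so that each unordered pair has one canonical form.
        words = sorted(set(re.findall(r"[a-z0-9']+", sentence.lower())))
        features.update(combinations(words, 2))
    return features


def feature_counts(messages):
    """Count, for each feature, how many messages in a collection contain it."""
    counts = Counter()
    for message in messages:
        counts.update(sentence_pair_features(message))
    return counts


features = sentence_pair_features("Buy cheap pills. Call now!")
spam_counts = feature_counts(["Buy cheap pills now", "Cheap pills online"])
```

Running `feature_counts` separately over a spam collection and a legitimate collection yields the per-feature occurrence frequencies that the abstract says the weights are estimated from.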