Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Databases and Information Systems

San Jose State University

Theses/Dissertations

2017

Stop words

Full-Text Articles in Physical Sciences and Mathematics

Intelligent Web Crawler For Semantic Search Engine, Shujia Zhang Feb 2017

Master's Projects

A Semantic Search Engine (SSE) is a program that produces semantic-oriented concepts from the Internet. A web crawler is the front end of our SSE; its primary goal is to supply important and necessary information to the data analysis component of the SSE. The main function of the analysis component is to produce concepts (moderately frequent finite sequences of keywords) from the input; it uses variants of TF-IDF as its primary tool for removing stop words. However, filtering out stop words with TF-IDF is very expensive. The goal of this project is …
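
To illustrate the general idea of TF-IDF-style stop-word filtering mentioned in the abstract, here is a minimal sketch in Python. It is not the project's implementation: it uses only the inverse document frequency (IDF) part of TF-IDF, and the function names (`idf_scores`, `low_idf_terms`), the whitespace tokenization, and the threshold value are assumptions made for this example. Terms that occur in nearly every document receive an IDF close to zero and are treated as stop-word candidates.

```python
import math
from collections import Counter


def idf_scores(docs):
    """Return {term: idf}, where idf = log(N / df) and df is the number
    of documents containing the term (hypothetical helper)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        # Count each term once per document, using naive whitespace tokens.
        df.update(set(doc.lower().split()))
    return {term: math.log(n / count) for term, count in df.items()}


def low_idf_terms(docs, threshold=0.3):
    """Return terms whose IDF is at or below the threshold.
    The threshold of 0.3 is an arbitrary value chosen for illustration."""
    return {term for term, score in idf_scores(docs).items() if score <= threshold}


docs = [
    "the crawler fetches the page",
    "the index stores the concepts",
    "the engine ranks the results",
]
stop_word_candidates = low_idf_terms(docs)
```

Even this simplified version needs a full pass over the corpus to compute document frequencies before any term can be filtered, which hints at why the approach becomes costly on crawled web data.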