Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons




Wright State University

2005

Document Clustering


Full-Text Articles in Physical Sciences and Mathematics

A Modular Approach To Document Indexing And Semantic Search, Dhanya Ravishankar, Krishnaprasad Thirunarayan, Trivikram Immaneni Jul 2005


Kno.e.sis Publications

This paper develops a modular approach to improving the effectiveness of searching documents for information by reusing and integrating mature software components such as the Lucene APIs, WordNet, latent semantic analysis (LSA) techniques, and a domain-specific controlled vocabulary. To evaluate its practical benefits, the prototype was used to query the MEDLINE database and to locate domain-specific controlled-vocabulary terms in Materials and Process Specifications. Its extensibility has been demonstrated by adding a spell-checker for the input query and by structuring the retrieved output into hierarchical collections for quicker assimilation. It is also being used to explore experimentally the relationship between LSA and document clustering using 20-mini-newsgroups and …
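The abstract names LSA without saying how it is applied. As a rough illustration only, and not the authors' implementation, the sketch below shows the textbook form of LSA: take a term-by-document count matrix, compute its singular value decomposition with NumPy's `numpy.linalg.svd`, and keep the k largest singular values so that documents with related vocabulary land close together in a low-dimensional space (one reason LSA is often paired with clustering). The toy vocabulary, the helper names `lsa_doc_vectors` and `cosine`, the use of raw counts without weighting, and k = 2 are all hypothetical choices for this example.

```python
import numpy as np

def lsa_doc_vectors(term_doc: np.ndarray, k: int) -> np.ndarray:
    """Return one k-dimensional latent vector per document (per column of term_doc)."""
    # Thin SVD: term_doc = U @ diag(s) @ Vt, with s sorted in descending order.
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep the k largest singular values; row j is document j in the latent space.
    return (np.diag(s[:k]) @ vt[:k, :]).T

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy corpus (assumption: raw term counts, no tf-idf weighting).
# Rows are terms, columns are documents.
terms = ["gene", "protein", "cell", "car", "engine", "wheel"]
X = np.array([
    [1, 0, 0, 0],  # gene
    [1, 1, 0, 0],  # protein
    [0, 1, 0, 0],  # cell
    [0, 0, 1, 0],  # car
    [0, 0, 1, 1],  # engine
    [0, 0, 0, 1],  # wheel
], dtype=float)

docs = lsa_doc_vectors(X, k=2)
raw_sim = cosine(X[:, 0], X[:, 1])    # docs 0 and 1 share only "protein"
lsa_sim = cosine(docs[0], docs[1])    # same pair in the rank-2 latent space
cross_sim = cosine(docs[0], docs[2])  # biology document vs. automotive document
print(f"raw={raw_sim:.3f} lsa={lsa_sim:.3f} cross={abs(cross_sim):.3f}")
```

Documents 0 and 1 share a single term, so their raw cosine is 0.5; in the rank-2 space they point in essentially the same direction, while the two topic groups remain orthogonal. A real system would more likely weight the counts (for example with tf-idf) and use a sparse truncated SVD instead of a dense one.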