Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Full-Text Articles in Physical Sciences and Mathematics

Using the Web 1T 5-Gram Database for Attribute Selection in Formal Concept Analysis to Correct Overstemmed Clusters, Guymon Hall, May 2014

UNLV Theses, Dissertations, Professional Papers, and Capstones

Information retrieval is the process of finding information in an unstructured collection of data. It involves building an index, commonly called an inverted file. When building the inverted file, information retrieval algorithms often stem words, reducing each document term to a common root. There are many ways to stem a word; affix removal and successor variety are two common categories of stemmers. The Porter Stemming Algorithm is a suffix-removal stemmer that applies a rule-based process to English words. We can think of stemming as a way to cluster …
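
To make the idea of stemming as clustering concrete, here is a minimal sketch in Python. It is not the Porter Stemming Algorithm and not the method of the thesis: `toy_stem`, its suffix list, and the sample documents are hypothetical, chosen only to show how suffix removal groups terms under one root in an inverted file, and how unrelated words can end up in the same cluster (overstemming).

```python
from collections import defaultdict

# Hypothetical, deliberately tiny suffix list. This is NOT the Porter rule set.
SUFFIXES = ("ity", "ing", "al", "e", "s")


def toy_stem(word):
    """Strip the longest matching suffix, keeping at least 3 characters."""
    word = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_clusters(words):
    """Group words by stem: each stem names a cluster of surface forms."""
    clusters = defaultdict(set)
    for w in words:
        clusters[toy_stem(w)].add(w.lower())
    return dict(clusters)


def build_inverted_file(docs):
    """Map each stem to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[toy_stem(token)].add(doc_id)
    return dict(index)


if __name__ == "__main__":
    # Related forms collapse to one root, which is the intended effect.
    print(stem_clusters(["connect", "connects", "connecting"]))
    # Words with different meanings also collapse: an overstemmed cluster.
    print(stem_clusters(["universe", "university", "universal"]))
    # Assumed sample documents, for illustration only.
    print(build_inverted_file({1: "universe connecting", 2: "university connects"}))
```

In this sketch a query for "university" would also retrieve the document that only mentions "universe", because both terms share the stem "univers" in the index.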