Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
- Keyword
- Canberra distances (1)
- Chi-Square (1)
- Clustering (1)
- DBSCAN (1)
- Data mining (1)
- Datasets (1)
- Distances (1)
- Document clustering (1)
- Euclidean distances (1)
- Execution time (1)
- Information retrieval (1)
- Information retrieval; Internet searching; Keyword searching; Vector spaces--Data processing (1)
- K-means clustering algorithm (1)
- Large datasets (1)
- Memory availability (1)
- Similarity functions (1)
- Tree-based data structures (1)
Articles 1 - 3 of 3
Full-Text Articles in Physical Sciences and Mathematics
A Study Of Relevance Feedback In Vector Space Model, Deepthi Katta
UNLV Theses, Dissertations, Professional Papers, and Capstones
Information Retrieval is the science of searching a huge set of documents for information or documents that meet an information need. It has been an active field of research since the early 19th century, and different retrieval models have come into existence to cater to that need.
This thesis starts with understanding some of the basic information retrieval models, followed by an implementation of one of the most popular statistical retrieval models, the Vector Space Model. This model ranks the documents in the collection based on a similarity measure calculated between the query and each document. The user …
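The ranking step described in this abstract can be sketched in a few lines. This is a minimal illustration, not the thesis's implementation: it assumes raw term-frequency weights, whitespace tokenization, and cosine similarity as the similarity measure, and the names `tf_vector`, `cosine`, and `rank` are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Build a raw term-frequency vector (assumed weighting) from lowercased, whitespace-split text."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term vectors; 0.0 if either vector is empty."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return (score, document index) pairs, highest similarity first; ties keep index order."""
    q = tf_vector(query)
    scored = [(cosine(q, tf_vector(doc)), i) for i, doc in enumerate(docs)]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


docs = [
    "information retrieval models",
    "clustering large datasets",
    "vector space model for information retrieval",
]
for score, index in rank("information retrieval", docs):
    print(f"{score:.3f}  {docs[index]}")
```

A production system would normally replace the raw counts with a weighting such as tf-idf; the ranking logic stays the same.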
Effects Of Similarity Metrics On Document Clustering, Rushikesh Veni
UNLV Theses, Dissertations, Professional Papers, and Capstones
Document clustering, or unsupervised document classification, is an automated process of grouping documents with similar content. A typical technique uses a similarity function to compare documents. Many similarity functions, such as the dot product or the cosine measure, have been proposed in the literature for this comparison.
In this thesis, we evaluate the effects a similarity function may have on clustering. We start by representing a document and a query as vectors in a high-dimensional space whose dimensions correspond to the keywords, and then use an appropriate distance measure in k-means to compute the similarity between the document vector and the query vector to …
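To illustrate how the choice of distance measure can change a clustering, the sketch below implements only the assignment step of k-means with a pluggable distance function. It is an assumption-laden example rather than the thesis's code: the function names are hypothetical, vectors are plain Python lists, and the three measures shown (Euclidean, Canberra, cosine distance) are chosen because they appear in the listing's keywords and abstract.

```python
import math


def euclidean(u, v):
    """Straight-line distance between two dense vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def canberra(u, v):
    """Canberra distance; coordinates where both values are zero contribute nothing."""
    total = 0.0
    for a, b in zip(u, v):
        denom = abs(a) + abs(b)
        if denom:
            total += abs(a - b) / denom
    return total


def cosine_distance(u, v):
    """1 minus cosine similarity; defined as 1.0 when either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def assign(points, centroids, distance):
    """k-means assignment step: index of the nearest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda j: distance(p, centroids[j]))
        for p in points
    ]


points = [[10.0, 1.0]]
centroids = [[1.0, 0.1], [8.0, 6.0]]
for name, fn in [("euclidean", euclidean), ("canberra", canberra), ("cosine", cosine_distance)]:
    print(name, assign(points, centroids, fn))
```

In this toy input the point lies in the same direction as the first centroid but is closer in magnitude to the second, so cosine distance and Euclidean distance assign it to different clusters.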
Efficient Clustering Techniques For Managing Large Datasets, Vasanth Nemala
UNLV Theses, Dissertations, Professional Papers, and Capstones
The result set produced by a search engine in response to a user query is typically very large, and it is usually the user's responsibility to browse the result set to identify relevant documents. Many tools have been developed to help the user identify the most relevant documents. One such tool is clustering. In this method, closely related documents are grouped based on their contents. Hence, if a document turns out to be relevant, so are the rest of the documents in its cluster. So it would be easy for a user to sift through the …