Full-Text Articles in Computer Sciences
High Utility Itemsets Identification In Big Data, Ashish Tamrakar
UNLV Theses, Dissertations, Professional Papers, and Capstones
High utility itemset mining is an important data mining problem that considers profit, in addition to quantity, in a transactional database. It helps find the most valuable products or items, which are difficult to track using frequent itemset mining alone: an item with a high profit value might be rare in the transactional database despite its great importance. Many existing algorithms generate comparatively large candidate sets when finding high utility itemsets, so the major focus is to reduce computation time significantly by introducing pruning strategies. Another aspect of high utility itemset mining is to compute …
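To make the utility notion concrete, here is a minimal Python sketch that assumes the common textbook definitions rather than the thesis's own algorithm: an itemset's utility in a transaction is the sum of quantity × unit profit of its items, and its total utility sums this over every transaction that contains it. For pruning, it uses transaction-weighted utility (TWU), a standard upper bound from the literature; the abstract does not say whether the thesis uses TWU. The toy data, function names, and threshold are all hypothetical.

```python
# Hypothetical toy data: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 2, "b": 1},
    {"a": 1, "c": 3},
    {"b": 4, "c": 1},
    {"a": 1, "b": 2, "c": 1},
]
# Hypothetical external utility (unit profit) of each item.
profit = {"a": 5, "b": 1, "c": 2}


def contains(t, itemset):
    return all(i in t for i in itemset)


def itemset_utility(itemset, transactions, profit):
    """Sum of quantity * unit profit of the itemset's items over transactions containing it."""
    return sum(
        sum(t[i] * profit[i] for i in itemset)
        for t in transactions
        if contains(t, itemset)
    )


def twu(itemset, transactions, profit):
    """Transaction-weighted utility: total utility of the whole transactions containing
    the itemset. It never increases for supersets, so it is a safe pruning bound."""
    return sum(
        sum(q * profit[i] for i, q in t.items())
        for t in transactions
        if contains(t, itemset)
    )


def high_utility_itemsets(transactions, profit, min_util):
    """Level-wise search that only extends candidates whose TWU reaches min_util."""
    level = [(i,) for i in sorted(profit) if twu((i,), transactions, profit) >= min_util]
    result = {}
    while level:
        for itemset in level:
            u = itemset_utility(itemset, transactions, profit)
            if u >= min_util:
                result[itemset] = u
        # Join sorted candidates that share all but their last item, then prune by TWU.
        nxt = set()
        for x in level:
            for y in level:
                if x[:-1] == y[:-1] and x[-1] < y[-1]:
                    cand = x + (y[-1],)
                    if twu(cand, transactions, profit) >= min_util:
                        nxt.add(cand)
        level = sorted(nxt)
    return result


hui = high_utility_itemsets(transactions, profit, min_util=15)
print(hui)  # {('a',): 20, ('a', 'b'): 18, ('a', 'c'): 18}
```

Note that `b` alone (utility 7) falls below the threshold while `{a, b}` (utility 18) exceeds it: utility is not anti-monotone, which is why pruning relies on an upper bound such as TWU instead of on the utility itself.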
Enhancing The Draft Assembly With Minhash, Saju Varghese
UNLV Theses, Dissertations, Professional Papers, and Capstones
In this thesis, we report on the use of minhash techniques to improve the draft assembly in genome mapping. More specifically, we use minhash to compare the scaffolds of the sea urchin and sea cucumber genomes.
One of the main contributions of this thesis is an implementation of minhash with the Message Passing Interface (MPI) on Intel Xeon Phi coprocessors. We show that this implementation significantly reduces the processing time needed to identify k-mer similarities.
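The core minhash idea can be sketched in Python as follows. This is a minimal, single-process illustration that assumes k-mers are exact substrings and that salted BLAKE2b hashes stand in for a family of random hash functions; it is not the thesis's MPI/Xeon Phi implementation, and all function names are hypothetical. The fraction of positions at which two signatures agree estimates the Jaccard similarity of the two k-mer sets.

```python
import hashlib


def kmers(seq, k):
    """Set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def seeded_hash(kmer, seed):
    """64-bit hash; a different salt per seed simulates an independent hash function."""
    h = hashlib.blake2b(kmer.encode(), digest_size=8, salt=seed.to_bytes(8, "little"))
    return int.from_bytes(h.digest(), "little")


def minhash_signature(kmer_set, num_hashes=128):
    """For each hash function, keep the minimum hash value over the set."""
    if not kmer_set:
        raise ValueError("cannot build a signature for an empty k-mer set")
    return [min(seeded_hash(km, s) for km in kmer_set) for s in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of agreeing minima estimates |A & B| / |A | B|."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a, b):
    return len(a & b) / len(a | b)


a = kmers("ACGTACGTTGCA", 4)
b = kmers("ACGTACGATGCA", 4)
print(exact_jaccard(a, b))  # 5 shared k-mers out of 12 distinct
print(estimate_jaccard(minhash_signature(a), minhash_signature(b)))
```

Because each scaffold's signature depends only on its own k-mers, signatures can be computed independently and compared afterwards, which makes the approach a natural fit for distributing work across processes.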
Effects Of Similarity Metrics On Document Clustering, Rushikesh Veni
UNLV Theses, Dissertations, Professional Papers, and Capstones
Document clustering, or unsupervised document classification, is an automated process of grouping documents with similar content. A typical technique uses a similarity function to compare documents. Many similarity functions, such as the dot product and cosine measures, have been proposed in the literature for this comparison.
In this thesis, we evaluate the effects a similarity function may have on clustering. We start by representing both a document and a query as vectors in a high-dimensional space whose dimensions correspond to keywords, and then use an appropriate distance measure in k-means to compute the similarity between the document vector and the query vector to …
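The two similarity functions named above can be compared with a short Python sketch. It assumes whitespace tokenization, raw term-frequency vectors over a fixed keyword vocabulary, and a Lloyd-style k-means loop with first-k initialization in which the similarity function is a parameter; these choices and all names are illustrative, not the thesis's setup. The dot product grows with document length, while cosine similarity depends only on the direction of the vectors.

```python
import math
from collections import Counter


def tf_vector(text, vocabulary):
    """Raw term-frequency vector over a fixed keyword vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocabulary]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    # Define similarity with an all-zero vector as 0 to avoid division by zero.
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


def kmeans(vectors, k, similarity, iterations=10):
    """Assign each vector to its most similar centroid, then recompute centroids as means."""
    centroids = [list(map(float, v)) for v in vectors[:k]]  # first-k initialization
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [max(range(k), key=lambda c: similarity(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


vocab = ["gene", "genome", "profit", "market"]
docs = [
    "gene genome",
    "profit market",
    "gene genome gene genome gene genome",
    "profit market profit market profit market",
]
vectors = [tf_vector(d, vocab) for d in docs]
print(dot(vectors[0], vectors[2]))     # 6: grows with the length of the longer document
print(cosine(vectors[0], vectors[2]))  # approximately 1.0: same direction
print(kmeans(vectors, 2, cosine))      # [0, 1, 0, 1]
```

With `cosine`, this loop approximates spherical k-means, which would additionally normalize each centroid to unit length; with `dot`, longer documents and larger centroids tend to dominate the assignments, which is one way the choice of similarity function can change the resulting clusters.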