Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Open Access. Powered by Scholars. Published by Universities.®

Data mining

UNLV Theses, Dissertations, Professional Papers, and Capstones

Articles 1 - 3 of 3

Full-Text Articles in Computer Sciences

High Utility Itemsets Identification In Big Data, Ashish Tamrakar May 2017

High Utility Itemsets Identification In Big Data, Ashish Tamrakar

UNLV Theses, Dissertations, Professional Papers, and Capstones

High utility itemset mining is an important data mining problem which considers profit factors besides quantity from the transactional database. It helps find the most valuable products/items that are difficult to track using only the frequent data mining set. An item that has a high-profit value might be rare in the transactional database despite its tremendous importance. While there are many existing algorithms which generate comparatively large candidate sets while finding high utility itemsets, the major focus is to reduce the computational time significantly with the introduction of pruning strategies. Another aspect of high utility itemset mining is to compute …


Enhancing The Draft Assembly With Minhash, Saju Varghese Dec 2016

Enhancing The Draft Assembly With Minhash, Saju Varghese

UNLV Theses, Dissertations, Professional Papers, and Capstones

In this thesis, we report on the use of minhash techniques to improve the draft assembly of a genome mapping. More specifically, we use minhash to compare the scaffolds of sea urchin and sea cucumber genomes.

One of the main contributions of this thesis is the implementation of minhash with the Message Passing Interface (MPI) utilizing Intel Phi co-processors. It is shown that our implementation significantly reduces the processing time for identification of k-mer similarities.


Effects Of Similarity Metrics On Document Clustering, Rushikesh Veni Jan 2009

Effects Of Similarity Metrics On Document Clustering, Rushikesh Veni

UNLV Theses, Dissertations, Professional Papers, and Capstones

Document clustering or unsupervised document classification is an automated process of grouping documents with similar content. A typical technique uses a similarity function to compare documents. In the literature, many similarity functions such as dot product or cosine measures are proposed for the comparison operator.

For the thesis, we evaluate the effects a similarity function may have on clustering. We start by representing a document and a query, both as a vector of high-dimensional space corresponding to the keywords followed by using an appropriate distance measure in k-means to compute similarity between the document vector and the query vector to …