Open Access. Powered by Scholars. Published by Universities.®

Computer Engineering Commons

Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences

2007

Bisecting K-means

Articles 1 - 1 of 1

Full-Text Articles in Computer Engineering

High Performance Text Document Clustering, Yanjun Li Jan 2007

High Performance Text Document Clustering, Yanjun Li

Browse all Theses and Dissertations

Data mining, also known as knowledge discovery in database (KDD), is the process to discover interesting unknown knowledge from a large amount of data. Text mining is to apply data mining techniques to extract interesting and nontrivial information and knowledge from unstructured text. Text clustering is one of important techniques of text mining, which is the unsupervised classification of similar documents into different groups.

This research focuses on improving the performance of text clustering. We investigated the text clustering algorithms in four aspects: document representation, documents closeness measurement, high dimension reduction and parallelization. We propose a group of high performance …