Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 6 of 6

Full-Text Articles in Physical Sciences and Mathematics

Hierarchical Text Classification And Evaluation, Aixin Sun, Ee Peng Lim Nov 2001

Research Collection School Of Computing and Information Systems

Hierarchical classification refers to the assignment of one or more suitable categories from a hierarchical category space to a document. While previous work in hierarchical classification focused on virtual category trees where documents are assigned only to the leaf categories, we propose a top-down level-based classification method that can classify documents to both leaf and internal categories. As the standard performance measures assume independence between categories, they do not account for documents incorrectly classified into categories that are similar to, or not far from, the correct ones in the category tree. We therefore propose the Category-Similarity Measures and Distance-Based Measures to consider the …


Mining Multi-Level Rules With Recurrent Items Using Fp'-Tree, Kok-Leong Ong, Wee-Keong Ng, Ee Peng Lim Oct 2001

Research Collection School Of Computing and Information Systems

Association rule mining has been widely studied in academia and widely applied in practice. As a result, many variations exist, one of which is the mining of multi-level rules. The mining of multi-level rules has proved useful in discovering important knowledge that conventional algorithms such as Apriori, SETM, and DIC miss. However, existing techniques for mining multi-level rules fail to take into account the recurrence relationship that can occur in a transaction when an atomic item is translated to a higher-level representation. As a result, rules containing recurrent items go unnoticed. …


Vide: A Visual Data Extraction Environment For The Web, Yi Li, Wee-Keong Ng, Ee Peng Lim Sep 2001

Research Collection School Of Computing and Information Systems

With the rapid growth of information on the Web, a means to combat information overload is critical. In this paper, we present ViDE (Visual Data Extraction), an interactive web data extraction environment that supports efficient hierarchical data wrapping of multiple web pages. ViDE has two unique features that differentiate it from other extraction mechanisms. First, data extraction rules can be easily specified in a graphical user interface that is seamlessly integrated with a web browser. Second, ViDE introduces the concept of grouping which unites the extraction rules for a set of documents with the navigational patterns that exist among them. …


Predictive Self-Organizing Networks For Text Categorization, Ah-Hwee Tan Apr 2001

Research Collection School Of Computing and Information Systems

This paper introduces a class of predictive self-organizing neural networks known as Adaptive Resonance Associative Map (ARAM) for classification of free-text documents. Whereas most statistical approaches to text categorization derive classification knowledge based on training examples alone, ARAM performs supervised learning and integrates user-defined classification knowledge in the form of IF-THEN rules. Through our experiments on the Reuters-21578 news database, we showed that ARAM performed reasonably well in mining categorization knowledge from a sparse and high-dimensional document feature space. In addition, ARAM's predictive accuracy and learning efficiency can be improved by incorporating a set of rules derived from …


Topic Detection, Tracking, And Trend Analysis Using Self-Organizing Neural Networks, Kanagasabai Rajaraman, Ah-Hwee Tan Apr 2001

Research Collection School Of Computing and Information Systems

We address the problem of Topic Detection and Tracking (TDT) and subsequently detecting trends from a stream of text documents. Formulating TDT as a clustering problem in a class of self-organizing neural networks, we propose an incremental clustering algorithm. On this setup we show how trends can be identified. Through experimental studies, we observe that our method enables discovering interesting trends that are deducible only from reading all relevant documents.


Incorporating Window-Based Passage-Level Evidence In Document Retrieval, Wensi Xi, Richard Xu-Rong, Christopher Soo Guan Khoo, Ee Peng Lim Jan 2001

Research Collection School Of Computing and Information Systems

This study investigated whether document retrieval can be improved if documents are divided into smaller sub-documents or passages and the retrieval scores for these passages are incorporated into the final retrieval score for the whole document. The documents were segmented by sliding a window of a certain size across the document and extracting the words displayed each time the window stopped. A retrieval score was calculated for each of the passages extracted and the highest score obtained by a passage of that size was taken as the document’s passage-level score for that window size. A range of window sizes was …
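
The window-based procedure described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `step` parameter, and the term-overlap scoring function are assumptions made for illustration, since the abstract does not specify how the window advances or how each passage is scored.

```python
from typing import Callable, List, Sequence, Set


def sliding_passages(tokens: Sequence[str], window_size: int, step: int = 1) -> List[List[str]]:
    """Split a token sequence into fixed-size passages by sliding a window.

    `step` (how far the window moves each time) is an assumed parameter.
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    # A document shorter than the window yields a single passage.
    if len(tokens) <= window_size:
        return [list(tokens)]
    last_start = len(tokens) - window_size
    passages = [list(tokens[s:s + window_size]) for s in range(0, last_start + 1, step)]
    # Make sure the tail of the document is covered when step does not divide evenly.
    if last_start % step != 0:
        passages.append(list(tokens[last_start:]))
    return passages


def term_overlap_score(passage: Sequence[str], query_terms: Set[str]) -> float:
    """Placeholder scorer (assumption): count passage tokens that are query terms."""
    return float(sum(1 for tok in passage if tok in query_terms))


def passage_level_score(
    tokens: Sequence[str],
    query_terms: Set[str],
    window_size: int,
    step: int = 1,
    scorer: Callable[[Sequence[str], Set[str]], float] = term_overlap_score,
) -> float:
    """Return the highest passage score for one window size, as the abstract describes."""
    return max(scorer(p, query_terms) for p in sliding_passages(tokens, window_size, step))


if __name__ == "__main__":
    doc = "a b c d e f".split()
    print(passage_level_score(doc, {"c", "d"}, window_size=3))
```

How the passage-level score is then combined with the whole-document score is not stated in the visible part of the abstract, so the sketch stops at the per-window maximum.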