Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 16 of 16

Full-Text Articles in Physical Sciences and Mathematics

VDL: A Language For Active Mining Variants Of Association Rules, Kok-Leong Ong, Wee-Keong Ng, Ee Peng Lim Dec 2002

Research Collection School Of Computing and Information Systems

The popularity of association rules has resulted in several variations being proposed. In each case, additional attributes in the data are considered so as to produce more informative rules. In the context of active mining, different types of rules may be required over a period of time due to knowledge needs or the availability of new attributes. The present approach is the ad-hoc development of algorithms for each variant of rules. This is time-consuming and costly, and is a stumbling block to the vision of active mining. We argue that knowledge needs and the changing characteristics of the data …


Intergenerational Partnerships In The Design Of A Digital Library Of Geography Examination Resources, Yin-Leng Theng, Dion Hoe-Lian Goh, Ee Peng Lim, Zehua Liu, Natalie Lee-San Pang, Patricia Bao-Bao Wong, Lian-Heong Chua Dec 2002

Research Collection School Of Computing and Information Systems

This paper describes the engagement of intergenerational partners in the design of a digital library of geographical resources (GeogDL) to help prepare Singapore students for a national examination in geography. GeogDL is built on top of G-Portal, a digital library providing services over geospatial and georeferenced Web content. Scenario-based design and claims analysis were employed as a means of refinement to the initial design of the GeogDL prototype.


Resource Annotation Framework In A Georeferenced And Geospatial Digital Library, Zehua Liu, Ee Peng Lim, Dion Hoe-Lian Goh Dec 2002

Research Collection School Of Computing and Information Systems

G-Portal is a georeferenced and geospatial digital library that aims to identify, classify and organize geospatial and georeferenced resources on the web and to provide digital library services for these resources. An annotation service is supported in G-Portal to enable users to contribute content to the digital library. In this paper, we present a resource annotation framework for georeferenced and geospatial digital libraries and discuss its application in G-Portal. The framework is flexible for managing annotations of heterogeneous web resources. It allows users to contribute not only the annotation content but also the schema of the annotations. Meanwhile, other digital library …


A Data Mining Approach To Library New Book Recommendations, San-Yih Hwang, Ee Peng Lim Dec 2002

Research Collection School Of Computing and Information Systems

In this paper, we propose a data mining approach to recommending new library books that have never been rated or borrowed by users. In our problem context, users are characterized by their demographic attributes, and concept hierarchies can be defined for some of these demographic attributes. Books are assigned to the base categories of a taxonomy. Our goal is therefore to identify the type of users interested in some specific type of books. We call such knowledge generalized profile association rules. In this paper, we propose a new definition of rule interestingness to prune away rules that are redundant and …


Mining Of Correlated Rules In Genome Sequences, L. Lin, L. Wong, Tze-Yun Leong, P. S. Lai Nov 2002

Research Collection School Of Computing and Information Systems

With the huge amount of data collected by scientists in the molecular genetics community in recent years, there exists a need to develop some novel algorithms based on existing data mining techniques to discover useful information from genome databases. We propose an algorithm that integrates the statistical method, association rule mining, and classification rule mining in the discovery of allelic combinations of genes that are peculiar to certain phenotypes of diseased patients.


A Visual Tool For Building Logical Data Models Of Websites, Zehua Liu, Wee-Keong Ng, Feifei Li, Ee Peng Lim Nov 2002

Research Collection School Of Computing and Information Systems

Information sources over the WWW contain a large amount of data organized according to different interests and values. Thus, it is important that facilities exist to enable users to extract information of interest in a simple and effective manner. To do this, we propose the Wiccap Data Model, an XML data model that maps Web information sources into commonly perceived logical models, so that information can be extracted automatically according to users' interests. To accelerate the creation of data models, we have implemented a visual tool, called the Mapping Wizard, to facilitate and automate the process of producing Wiccap …


Web Classification Using Support Vector Machine, Aixin Sun, Ee Peng Lim Nov 2002

Research Collection School Of Computing and Information Systems

In web classification, web pages from one or more web sites are assigned to pre-defined categories according to their content. Since web pages are more than just plain text documents, web classification methods have to consider using other context features of web pages, such as hyperlinks and HTML tags. In this paper, we propose the use of Support Vector Machine (SVM) classifiers to classify web pages using both their text and context feature sets. We have evaluated our web classification method on the WebKB data set. Compared with the earlier Foil-Pilfs method on the same data set, our method has been …


Cache Invalidation And Replacement Strategies For Location-Dependent Data In Mobile Environments, Baihua Zheng, Jianliang Xu, Dik Lun Lee Oct 2002

Research Collection School Of Computing and Information Systems

Mobile location-dependent information services (LDISs) have become increasingly popular in recent years. However, data caching strategies for LDISs have thus far received little attention. In this paper, we study the issues of cache invalidation and cache replacement for location-dependent data under a geometric location model. We introduce a new performance criterion, called caching efficiency, and propose a generic method for location-dependent cache invalidation strategies. In addition, two cache replacement policies, PA and PAID, are proposed. Unlike the conventional replacement policies, PA and PAID take into consideration the valid scope area of a data value. We conduct a series of simulation …


Personalized Classification For Keyword-Based Category Profiles, Aixin Sun, Ee Peng Lim, Wee-Keong Ng Sep 2002

Research Collection School Of Computing and Information Systems

Personalized classification refers to allowing users to define their own categories and automating the assignment of documents to these categories. In this paper, we examine the use of keywords to define personalized categories and propose the use of Support Vector Machine (SVM) to perform personalized classification. Two scenarios have been investigated. The first assumes that the personalized categories are defined in a flat category space. The second assumes that each personalized category is defined within a pre-defined general category that provides a more specific context for the personalized category. The training documents for personalized categories are obtained from a training …


HCL: A Specification Language For Hierarchical Text Classification, Aixin Sun, Ee Peng Lim, Wee-Keong Ng Aug 2002

Research Collection School Of Computing and Information Systems

Hierarchical text classification refers to assigning text documents to the categories in a given category tree based on their content. With a large number of categories organized as a tree, hierarchical text classification helps users to find information more quickly and accurately. Nevertheless, hierarchical text classification methods in the past have often been constructed in a proprietary manner. The construction steps often involve human effort and are not completely automated. In this paper, we therefore propose a specification language known as HCL (Hierarchical Classification Language). HCL is designed to describe a hierarchical classification method including the definition of a category …


Digital Libraries To Knowledge Portals: Towards A Global Knowledge Portal For Secondary Schools In Singapore, Yin-Leng Theng, Dion Hoe-Lian Goh, Chu Keong Lee, Ee Peng Lim, Zehua Liu Jul 2002

Research Collection School Of Computing and Information Systems

For digital libraries to remain relevant in the new millennium where the ability to manage knowledge is critical, this paper explores how digital libraries could strategically be evolved into knowledge portals to encapsulate knowledge creation, management, sharing and reusability, features evidently lacking in most conventional digital libraries. Two digital library scenarios of use in education are described and implemented as knowledge portals using G-Portal and the Greenstone software. We hope that the initial work carried out on these two portal-like DLs will eventually form part of a Global Knowledge Portal for Secondary Schools in Singapore. Keywords: Digital libraries, information portals, …


Fast Filter-And-Refine Algorithms For Subsequence Selection, Beng-Chin Ooi, Hwee Hwa Pang, Hao Wang, Limsoon Wong, Cui Yu Jul 2002

Research Collection School Of Computing and Information Systems

Large sequence databases, such as protein, DNA and gene sequences in biology, are becoming increasingly common. An important operation on a sequence database is approximate subsequence matching, where all subsequences that are within some distance from a given query string are retrieved. This paper proposes a filter-and-refine algorithm that enables efficient approximate subsequence matching in large DNA sequence databases. It employs a bitmap indexing structure to condense and encode each data sequence into a shorter index sequence. During query processing, the bitmap index is used to filter out most of the irrelevant subsequences, and false positives are removed in the …


Product Schema Integration For Electronic Commerce: A Synonym Comparison Approach, Guanghao Yan, Wee-Keong Ng, Ee Peng Lim May 2002

Research Collection School Of Computing and Information Systems

In any electronic commerce system, the heterogeneity of product descriptions is a critical impediment to efficient business information exchange. In the ABECOS electronic commerce system, buyer agents, seller agents, and directory agents liaise with one another in e-commerce activities. Only when agents have a common ontology of product descriptions (also called product schemas) are they able to interact seamlessly in e-commerce activities. This gives rise to the product schema integration (PSI) problem: the problem of integrating heterogeneous schemas of a certain product into one globally compatible schema. We adopt an integration approach based on product attribute synonyms. We give a …


A Case For Analytical Customer Relationship Management, Jaideep Srivastava, Jau-Hwang Wang, Ee Peng Lim, San-Yih Hwang May 2002

Research Collection School Of Computing and Information Systems

This paper describes how data analytics can be used to make various CRM functions, such as customer segmentation, communication targeting, retention, and loyalty, much more effective. We also briefly describe the key technologies needed to implement analytical CRM and the organizational issues that must be carefully handled to make CRM a reality.


Mining Relationship Graphs For Effective Business Objectives, Kok-Leong Ong, Ee Peng Lim, Wee-Keong Ng May 2002

Research Collection School Of Computing and Information Systems

Modern organizations have two types of customer profiles: active and passive. Active customers contribute to the business goals of an organization, while passive customers are potential candidates that can be converted to active ones. Existing KDD techniques have focused mainly on past data generated by active customers. The insights discovered apply well to active customers but may transfer poorly to passive customers. This is because there is no attempt to generate know-how to convert passive customers into active ones. We propose an algorithm to discover relationship graphs using both types of profile. Using relationship graphs, an organization can be more effective …


An Intelligent Middleware For Linear Correlation Discovery, Cecil Chua, Roger Hsiang-Li Chiang, Ee Peng Lim Mar 2002

Research Collection School Of Computing and Information Systems

Although it is widely accepted that research from data mining, knowledge discovery, and data warehousing should be synthesized, little research addresses the integration of existing data management and analysis software. We develop an intelligent middleware that facilitates linear correlation discovery, the discovery of associations between attributes and attribute groups. This middleware integrates data management and data analysis tools to improve traditional data analysis in three respects: (1) identifying appropriate linear correlation functions to perform based on the semantics of a data set; (2) executing appropriate functions contained in the data analysis packages; and (3) deriving useful knowledge from data analysis.