Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons

Articles 1 - 24 of 24

Full-Text Articles in Databases and Information Systems

Measuring Qualities Of Articles Contributed By Online Communities, Ee Peng Lim, Ba-Quy Vuong, Hady W. Lauw, Aixin Sun Dec 2006

Research Collection School Of Computing and Information Systems

Using open source Web editing software (e.g., wiki), online community users can now easily edit, review and publish articles collaboratively. While much useful knowledge can be derived from these articles, content users and critics are often concerned about their quality. In this paper, we develop two models, namely a basic model and a peer review model, for measuring the quality of these articles and the authority of their contributors. We represent collaboratively edited articles and their contributors in a bipartite graph. While the basic model measures an article's quality using both the authorities of contributors and the amount of contribution from each …
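The abstract is cut off before the basic model is fully specified. As a rough illustration only, the sketch below assumes a HITS-style mutual reinforcement on the bipartite graph: an article's quality is the contribution-weighted sum of its contributors' authorities, and a contributor's authority is the contribution-weighted sum of the qualities of the articles they edited. The function name `quality_authority`, the word-count weights and the per-round normalisation are illustrative assumptions, not the paper's exact formulation.

```python
from typing import Dict, Tuple

# Hypothetical input: (contributor, article) -> amount contributed (e.g. words added).
Contrib = Dict[Tuple[str, str], float]


def quality_authority(contrib: Contrib, iterations: int = 50):
    """Sketch of mutually reinforcing article quality and contributor authority.

    Assumption (not the paper's exact model): quality(a) = sum of amount * authority(u)
    over contributors u of a; authority(u) = sum of amount * quality(a) over articles
    a that u edited. Both score vectors are normalised to sum to 1 each round.
    """
    users = {u for (u, _) in contrib}
    articles = {a for (_, a) in contrib}
    authority = {u: 1.0 / len(users) for u in users}
    quality = {a: 0.0 for a in articles}
    for _ in range(iterations):
        # Articles gain quality from authoritative, heavy contributors.
        quality = {a: 0.0 for a in articles}
        for (u, a), amount in contrib.items():
            quality[a] += amount * authority[u]
        total = sum(quality.values()) or 1.0
        quality = {a: q / total for a, q in quality.items()}
        # Contributors gain authority from contributing heavily to good articles.
        authority = {u: 0.0 for u in users}
        for (u, a), amount in contrib.items():
            authority[u] += amount * quality[a]
        total = sum(authority.values()) or 1.0
        authority = {u: s / total for u, s in authority.items()}
    return quality, authority


edits = {("alice", "A1"): 100, ("bob", "A1"): 10, ("alice", "A2"): 50, ("carol", "A3"): 5}
quality, authority = quality_authority(edits)
```

Here article A1, written mostly by the heavy contributor alice, ends up with the highest quality score, and alice ends up with more authority than bob.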


Continuous Monitoring Of Knn Queries In Wireless Sensor Networks, Yuxia Yao, Xueyan Tang, Ee Peng Lim Dec 2006

Research Collection School Of Computing and Information Systems

Wireless sensor networks have been widely used for civilian and military applications, such as environmental monitoring and vehicle tracking. In these applications, continuous queries are common, and evaluating them efficiently is a critical requirement. Due to the limited power supply of sensor nodes, energy efficiency is a major performance measure in such query evaluation. In this paper, we focus on continuous kNN query processing. We observe that centralized data storage and monitoring schemes do not favor energy efficiency. We therefore propose a localized scheme to monitor long-running nearest neighbor queries in sensor networks. …


Towards Effective Content-Based Music Retrieval With Multiple Acoustic Feature Composition, Jialie Shen, John Shepherd, Ngu Ahh Dec 2006

Research Collection School Of Computing and Information Systems

In this paper, we present a new approach to constructing music descriptors to support efficient content-based music retrieval and classification. The system applies multiple musical properties combined with a hybrid architecture based on principal component analysis (PCA) and a multilayer perceptron neural network. This architecture enables straightforward incorporation of multiple musical feature vectors, based on properties such as timbral texture, pitch, and rhythm structure, into a single low-dimensional vector that is more effective for classification than the larger individual feature vectors. The use of supervised training enables incorporation of human musical perception that further enhances the classification process. We compare …


Query-Based Watermarking For Xml Data, Xuan Zhou, Hwee Hwa Pang, Kian-Lee Tan Dec 2006

Research Collection School Of Computing and Information Systems

As an increasing amount of XML data is exchanged over the internet, copyright protection of this type of data is becoming an important requirement for many applications. In this paper, we introduce a rights protection scheme for XML data based on digital watermarking. One of the main challenges for watermarking XML data is that the data could easily be reorganized by an adversary in an attempt to destroy any embedded watermark. To overcome this, we propose a query-based watermarking scheme, which creates queries to identify available watermarking capacity, such that watermarks can be recovered from reorganized data through query rewriting. The …


Understanding User Perceptions On Usefulness And Usability Of An Integrated Wiki-G-Portal, Yin-Leng Theng, Yuanyuan Li, Ee Peng Lim, Zhe Wang, Dion Hoe-Lian Goh, Chew-Hung Chang, Kalyani Chatterjea, Jun Zhang Nov 2006

Research Collection School Of Computing and Information Systems

This paper describes a pilot study on Wiki-G-Portal, a project integrating Wikipedia, an online encyclopedia, into G-Portal, a Web-based digital library of geography resources. Initial findings from the pilot study seemed to suggest positive perceptions of the usefulness and usability of Wiki-G-Portal, as well as favorable attitudes and intention to use among the subjects.


A Model For Anticipatory Event Detection, Qi He, Kuiyu Chang, Ee Peng Lim Nov 2006

Research Collection School Of Computing and Information Systems

Event detection is a very important area of research that discovers new events reported in a stream of text documents. Previous research in event detection has largely focused on finding the first story and tracking the events of a specific topic. A topic is simply a set of related events defined by user-supplied keywords with no associated semantics and little domain knowledge. We therefore introduce the Anticipatory Event Detection (AED) problem: given some user-preferred event transition in a topic, detect the occurrence of the transition in the stream of news covering the topic. We confine the events to …


Integration Of Wikipedia And A Geography Digital Library, Ee Peng Lim, Zhe Wang, Darwin Sadeli, Yuanyuan Li, Chew-Hung Chang, Kalyani Chatterjea, Dion Hoe-Lian Goh, Yin-Leng Theng, Jun Zhang, Aixin Sun Nov 2006

Research Collection School Of Computing and Information Systems

In this paper, we address the problem of integrating Wikipedia, an online encyclopedia, and G-Portal, a web-based digital library, in the geography domain. The integration facilitates the sharing of data and services between the two web applications that are of great value in learning. We first present an overall system architecture for supporting such an integration and address the metadata extraction problem associated with it. In metadata extraction, we focus on extracting and constructing metadata for geopolitical regions, namely cities and countries. Some empirical performance results will be presented. The paper will also describe the adaptations of G-Portal and Wikipedia …


Extracting Link Chains Of Relationship Instances From A Website, Myo-Myo Naing, Ee Peng Lim, Roger Hsiang-Li Chiang Oct 2006

Research Collection School Of Computing and Information Systems

Web pages from a Web site can often be associated with concepts in an ontology, and pairs of Web pages also can be associated with relationships between concepts. With such associations, the Web site can be searched, browsed, or even reorganized based on the concept and relationship labels of its Web pages. In this article, we study the link chain extraction problem that is critical to the extraction of Web pages that are related. A link chain is an ordered list of anchor elements linking two Web pages related by some semantic relationship. We propose a link chain extraction method …


Masking Page Reference Patterns In Encryption Databases On Untrusted Storage, Xi Ma, Hwee Hwa Pang, Kian-Lee Tan Sep 2006

Research Collection School Of Computing and Information Systems

To support ubiquitous computing, the underlying data have to be persistent and available anywhere, anytime. The data thus have to migrate from devices that are local to individual computers to shared storage volumes that are accessible over an open network. This potentially exposes the data to heightened security risks. In particular, the activity on a database exhibits regular page reference patterns that could help attackers learn logical links among physical pages and then launch additional attacks. We propose two countermeasures to mitigate the risk of attacks initiated through analyzing the shared storage server’s activity for those page patterns. The first countermeasure relocates …


Continuous Nearest Neighbor Monitoring In Road Networks, Kyriakos Mouratidis, Man Lung Yiu, Dimitris Papadias, Nikos Mamoulis Sep 2006

Research Collection School Of Computing and Information Systems

Recent research has focused on continuous monitoring of nearest neighbors (NN) in highly dynamic scenarios, where the queries and the data objects move frequently and arbitrarily. All existing methods, however, assume the Euclidean distance metric. In this paper we study k-NN monitoring in road networks, where the distance between a query and a data object is determined by the length of the shortest path connecting them. We propose two methods that can handle arbitrary object and query moving patterns, as well as fluctuations of edge weights. The first one maintains the query results by processing only updates that may invalidate …
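The network distance used here can be illustrated with a plain Dijkstra-style network expansion from the query node that stops once k data objects have been settled. This is a one-off snapshot baseline, not either of the paper's monitoring methods; it assumes that queries and objects sit on graph nodes and that each undirected road segment is listed in both directions. The names `network_knn`, `undirected` and `Graph` are hypothetical.

```python
import heapq
from typing import Dict, List, Set, Tuple

# Adjacency list: node -> [(neighbour, edge length)]; undirected edges listed both ways.
Graph = Dict[str, List[Tuple[str, float]]]


def network_knn(graph: Graph, objects: Set[str], query: str, k: int) -> List[Tuple[str, float]]:
    """Return up to k (object node, shortest-path distance) pairs, nearest first."""
    best = {query: 0.0}          # tentative shortest-path distances
    frontier = [(0.0, query)]    # min-heap ordered by distance
    settled: Set[str] = set()
    result: List[Tuple[str, float]] = []
    while frontier and len(result) < k:
        d, node = heapq.heappop(frontier)
        if node in settled:
            continue             # stale heap entry
        settled.add(node)
        if node in objects:
            result.append((node, d))   # nodes settle in network-distance order
        for nbr, length in graph.get(node, []):
            nd = d + length
            if nd < best.get(nbr, float("inf")):
                best[nbr] = nd
                heapq.heappush(frontier, (nd, nbr))
    return result


def undirected(edges: List[Tuple[str, str, float]]) -> Graph:
    """Build an adjacency list with each edge stored in both directions."""
    g: Graph = {}
    for u, v, w in edges:
        g.setdefault(u, []).append((v, w))
        g.setdefault(v, []).append((u, w))
    return g


roads = undirected([("a", "b", 1.0), ("b", "c", 2.0), ("a", "d", 4.0), ("c", "d", 1.0)])
print(network_knn(roads, {"c", "d"}, "a", 2))  # [('c', 3.0), ('d', 4.0)]
```

Rerunning this search after every object move or edge-weight change is exactly the full recomputation that continuous monitoring methods aim to avoid.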


Bias And Controversy: Beyond The Statistical Deviation, Hady W. Lauw, Ee Peng Lim, Ke Wang Aug 2006

Research Collection School Of Computing and Information Systems

In this paper, we investigate how deviation in evaluation activities may reveal bias on the part of reviewers and controversy on the part of evaluated objects. We focus on a 'data-centric approach' where the evaluation data is assumed to represent the 'ground truth'. The standard statistical approaches take evaluation and deviation at face value. We argue that attention should be paid to the subjectivity of evaluation, judging the evaluation score not just on 'what is being said' (deviation), but also on 'who says it' (reviewer) as well as on 'whom it is said about' (object). Furthermore, we observe that bias …


An Energy-Efficient And Access Latency Optimized Indexing Scheme For Wireless Data Broadcast, Yuxia Yao, Xueyan Tang, Ee Peng Lim, Aixin Sun Aug 2006

Research Collection School Of Computing and Information Systems

Data broadcast is an attractive data dissemination method in mobile environments. To improve energy efficiency, existing air indexing schemes for data broadcast have focused on reducing tuning time only, i.e., the duration that a mobile client stays active in data accesses. On the other hand, existing broadcast scheduling schemes have aimed at reducing access latency through nonflat data broadcast to improve responsiveness only. Not much work has addressed the energy efficiency and responsiveness issues concurrently. This paper proposes an energy-efficient indexing scheme called MHash that optimizes tuning time and access latency in an integrated fashion. MHash reduces tuning time by …


Extraction Of Coherent Relevant Passages Using Hidden Markov Models, Jing Jiang, Chengxiang Zhai Jul 2006

Research Collection School Of Computing and Information Systems

In information retrieval, retrieving relevant passages, as opposed to whole documents, not only directly benefits the end user by filtering out the irrelevant information within a long relevant document, but also improves retrieval accuracy in general. A critical problem in passage retrieval is to extract coherent relevant passages accurately from a document, which we refer to as passage extraction. While much work has been done on passage retrieval, the passage extraction problem has not been seriously studied. Most existing work tends to rely on presegmenting documents into fixed-length passages, which are unlikely to be optimal because the length of a relevant passage …
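One way an HMM can enforce a single coherent passage is a three-state left-to-right model (background before, relevant, background after), decoded with the Viterbi algorithm over sentence-level emission scores. The sketch below assumes the per-sentence log-likelihoods under a relevance model and a background model are already computed, and treats the transition probability `p_stay` as a free parameter; the function name and state layout are illustrative assumptions rather than the authors' exact model.

```python
import math
from typing import List, Tuple

NEG_INF = float("-inf")
B1, REL, B2 = 0, 1, 2  # background-before, relevant passage, background-after


def extract_passage(rel_ll: List[float], bg_ll: List[float], p_stay: float = 0.9) -> Tuple[int, int]:
    """Viterbi-decode a B1 -> REL -> B2 HMM; return the inclusive sentence span of REL."""
    n = len(rel_ll)
    stay, move = math.log(p_stay), math.log(1.0 - p_stay)
    # Left-to-right topology: the passage is entered once and left once.
    trans = {(B1, B1): stay, (B1, REL): move, (REL, REL): stay, (REL, B2): move, (B2, B2): 0.0}

    def emit(state: int, t: int) -> float:
        return rel_ll[t] if state == REL else bg_ll[t]

    score = [[NEG_INF] * 3 for _ in range(n)]
    back = [[-1] * 3 for _ in range(n)]
    score[0][B1] = math.log(0.5) + emit(B1, 0)
    score[0][REL] = math.log(0.5) + emit(REL, 0)
    for t in range(1, n):
        for (i, j), logp in trans.items():
            cand = score[t - 1][i] + logp + emit(j, t)
            if cand > score[t][j]:
                score[t][j] = cand
                back[t][j] = i
    # The document must end inside or after the passage, so a passage always exists.
    state = REL if score[n - 1][REL] >= score[n - 1][B2] else B2
    path = [state]
    for t in range(n - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    path.reverse()
    span = [t for t, s in enumerate(path) if s == REL]
    return span[0], span[-1]


# Sentences 2-4 fit the relevance model better than the background model.
print(extract_passage([-5, -5, -1, -1, -1, -5, -5], [-2] * 7))  # (2, 4)
```

Unlike fixed-length presegmentation, the passage boundaries here fall wherever the evidence switches, and `p_stay` trades off passage coherence against boundary sensitivity.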


Authenticating Multi-Dimensional Query Results In Data Publishing, Weiwei Cheng, Hwee Hwa Pang, Kian-Lee Tan Jul 2006

Research Collection School Of Computing and Information Systems

In data publishing, the owner delegates the role of satisfying user queries to a third-party publisher. As the publisher may be untrusted or susceptible to attacks, it could produce incorrect query results. This paper introduces a mechanism for users to verify that their query answers on a multi-dimensional dataset are correct, in the sense of being complete (i.e., no qualifying data points are omitted) and authentic (i.e., all the result values originated from the owner). Our approach is to add authentication information into a spatial data structure, by constructing certified chains on the points within each partition, as well as …
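The idea of a certified chain can be shown in one dimension: the owner sorts the points, adds sentinels at both ends and signs every adjacent pair, so a user who receives a range answer together with one boundary point on each side can check that nothing between the boundaries was dropped or altered. This is a simplified one-dimensional sketch, not the paper's multi-dimensional scheme. HMAC with the hypothetical `OWNER_KEY` stands in for a public-key signature; with HMAC the verifier would need the secret key, which a real deployment avoids. The sketch assumes finite query bounds `lo <= hi`.

```python
import hashlib
import hmac
from typing import List, Tuple

# Hypothetical owner key; HMAC is only a placeholder for a real digital signature.
OWNER_KEY = b"hypothetical-owner-key"


def _sign(a: float, b: float) -> bytes:
    return hmac.new(OWNER_KEY, f"{a!r}|{b!r}".encode(), hashlib.sha256).digest()


def certify(values: List[float]) -> Tuple[List[float], List[bytes]]:
    """Owner: sort the points, add -inf/+inf sentinels, sign every adjacent pair."""
    pts = [float("-inf")] + sorted(float(v) for v in values) + [float("inf")]
    sigs = [_sign(pts[i], pts[i + 1]) for i in range(len(pts) - 1)]
    return pts, sigs


def answer(pts: List[float], sigs: List[bytes], lo: float, hi: float):
    """Publisher: return the in-range points plus one boundary point on each side."""
    left = max(i for i, p in enumerate(pts) if p < lo)
    right = min(i for i, p in enumerate(pts) if p > hi)
    return pts[left:right + 1], sigs[left:right]


def verify(chain: List[float], chain_sigs: List[bytes], lo: float, hi: float) -> bool:
    """User: boundaries must lie outside [lo, hi] and every adjacent link must be signed."""
    if len(chain) < 2 or len(chain_sigs) != len(chain) - 1:
        return False
    if not (chain[0] < lo and chain[-1] > hi):
        return False
    return all(hmac.compare_digest(_sign(a, b), s)
               for a, b, s in zip(chain, chain[1:], chain_sigs))


pts, sigs = certify([5, 1, 9, 3, 7])
chain, chain_sigs = answer(pts, sigs, 2, 8)
print(chain[1:-1], verify(chain, chain_sigs, 2, 8))  # [3.0, 5.0, 7.0] True
```

The query result itself is `chain[1:-1]`; the two end points serve only as proof of completeness, and omitting any in-range point breaks a signed link.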


Exploiting Domain Structure For Named Entity Recognition, Jing Jiang, Chengxiang Zhai Jun 2006

Research Collection School Of Computing and Information Systems

Named Entity Recognition (NER) is a fundamental task in text mining and natural language understanding. Current approaches to NER (mostly based on supervised learning) perform well on domains similar to the training domain, but they tend to adapt poorly to slightly different domains. We present several strategies for exploiting the domain structure in the training data to learn a more robust named entity recognizer that can perform well on a new domain. First, we propose a simple yet effective way to automatically rank features based on their generalizabilities across domains. We then train a classifier with strong emphasis on the …


On In-Network Synopsis Join Processing For Sensor Networks, Hai Yu, Ee Peng Lim, Jun Zhang May 2006

Research Collection School Of Computing and Information Systems

The emergence of sensor networks enables applications that deploy sensors to collaboratively monitor the environment and process the collected data. In some scenarios, we are interested in using join queries to correlate data stored in different regions of a sensor network, where the data volume is large, making it prohibitive to transmit all data to a central server for joining. In this paper, we present an in-network synopsis join strategy for evaluating join queries in sensor networks with communication efficiency. In this strategy, we prune data that do not contribute to the join results in the early stage of the join processing, …
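Pruning non-contributing data before it travels can be illustrated with a simple semijoin-style synopsis: each region summarises its join-attribute values as a set of histogram buckets, the regions exchange these small synopses, and only tuples whose bucket appears in the other side's synopsis are shipped for joining. The bucket synopsis, the equi-join on a single numeric attribute and all function names are assumptions made for illustration; the paper's synopsis and its placement of the join within the network may differ.

```python
from typing import List, Set, Tuple

Row = Tuple[float, str]  # (join attribute value, payload), e.g. (temperature, sensor id)


def synopsis(rows: List[Row], width: float) -> Set[int]:
    """Compact summary of a region's join values: the set of occupied histogram buckets."""
    return {int(v // width) for v, _ in rows}


def prune(rows: List[Row], other: Set[int], width: float) -> List[Row]:
    """Keep only rows whose bucket also occurs on the other side; the rest cannot join."""
    return [r for r in rows if int(r[0] // width) in other]


def synopsis_join(r: List[Row], s: List[Row], width: float = 1.0):
    """Exchange synopses, ship only surviving rows, then evaluate the equi-join."""
    r_kept = prune(r, synopsis(s, width), width)
    s_kept = prune(s, synopsis(r, width), width)
    joined = [(a, b) for a in r_kept for b in s_kept if a[0] == b[0]]
    pruned = (len(r) - len(r_kept)) + (len(s) - len(s_kept))
    return joined, pruned


region_r = [(20.0, "r1"), (25.5, "r2"), (31.0, "r3")]
region_s = [(25.5, "s1"), (40.0, "s2"), (20.0, "s3")]
joined, pruned = synopsis_join(region_r, region_s, width=5.0)
print(len(joined), pruned)  # 2 2
```

Tuples with equal join values always share a bucket, so this pruning never removes a true match; a larger `width` shrinks the synopsis but prunes less.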


Discovering Causal Dependencies In Mobile Context-Aware Recommenders, Ghim-Eng Yap, Ah-Hwee Tan, Hwee Hwa Pang May 2006

Research Collection School Of Computing and Information Systems

Mobile context-aware recommender systems face unique challenges in acquiring context. Resource limitations make minimizing context acquisition a practical need, while the uncertainty inherent to the mobile environment makes missing context values a major concern. This paper introduces a scalable mechanism based on Bayesian network learning in a tiered context model to overcome both of these challenges. Extensive experiments on a restaurant recommender system showed that our mechanism can accurately discover causal dependencies among context, thereby enabling the effective identification of the minimal set of important context for a specific user and task, as well as providing highly accurate recommendations even …


Fisa: Feature-Based Instance Selection For Imbalanced Text Classification, Aixin Sun, Ee Peng Lim, Boualem Benatallah, Mahbub Hassan Apr 2006

Research Collection School Of Computing and Information Systems

Support Vector Machine (SVM) classifiers are widely used in text classification tasks, and these tasks often involve imbalanced training data. In this paper, we specifically address the cases where negative training documents significantly outnumber the positive ones. A generic algorithm known as FISA (Feature-based Instance Selection Algorithm) is proposed to select only a subset of negative training documents for training an SVM classifier. With a smaller, carefully selected training set, an SVM classifier can be trained more efficiently while delivering comparable or better classification accuracy. In our experiments on the 20-Newsgroups dataset, using only 35% negative training examples and 60% learning …


Sgpm: Static Group Pattern Mining Using Apriori-Like Sliding Window, John Goh, David Taniar, Ee Peng Lim Apr 2006

Research Collection School Of Computing and Information Systems

Mobile user data mining is a field that focuses on extracting interesting patterns and knowledge from data generated by mobile users. Group pattern mining is one such mobile user data mining method. In group pattern mining, group patterns are found in a given user movement database based on spatio-temporal distances. In this paper, we propose to improve efficiency by using an area method for locating mobile users and a sliding window for static group pattern mining. This reduces the complexity of the valid group pattern mining problem. We support the use of the static method, which uses areas and sliding windows instead …


In-Network Processing Of Nearest Neighbor Queries For Wireless Sensor Networks, Yuxia Yao, Xueyan Tang, Ee Peng Lim Apr 2006

Research Collection School Of Computing and Information Systems

Wireless sensor networks have been widely used for civilian and military applications, such as environmental monitoring and vehicle tracking. The sensor nodes in the network have the abilities to sense, store, compute and communicate. To enable object tracking applications, spatial queries such as nearest neighbor queries need to be supported in these networks. The queries can be injected by the user at any sensor node. Due to the limited power supply of sensor nodes, energy efficiency is the major concern in query processing. Centralized data storage and query processing schemes do not favor energy efficiency. In this paper, we propose …


Searching Substructures With Superimposed Distance, Xifeng Yan, Feida Zhu, Jiawei Han, Philip S. Yu Apr 2006

Research Collection School Of Computing and Information Systems

Efficient indexing techniques have been developed for exact and approximate substructure search in large-scale graph databases. Unfortunately, the problem of retrieving structures with categorical or geometric distance constraints has not yet been solved. In this paper, we develop a method called PIS (Partition-based Graph Index and Search) to support similarity search on substructures with superimposed distance constraints. PIS selects discriminative fragments in a query graph and uses an index to prune the graphs that violate the distance constraints. We identify a criterion to distinguish the selectivity of fragments in multiple graphs and develop a partition method to obtain a …


Grid-Partition Index: A Hybrid Approach To Nearest-Neighbor Queries In Wireless Location-Based Services, Baihua Zheng, Jianliang Xu, Wang-Chien Lee, Dik Lun Lee Jan 2006

Research Collection School Of Computing and Information Systems

Traditional nearest-neighbor (NN) search is based on two basic indexing approaches: object-based indexing and solution-based indexing. The former is built on the locations of data objects, using distance heuristics on those locations. The latter is built on a precomputed solution space, so NN queries can be reduced to and processed as simple point queries in this solution space. Both approaches have disadvantages, especially when employed for wireless data broadcast in mobile computing environments. In this paper, we introduce a new index method, called the grid-partition index, to support NN search in both on-demand access and periodic broadcast …
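A grid-based hybrid of the two approaches can be sketched as follows: divide the space into cells and, for each cell, precompute only the objects that could be the nearest neighbor of some point inside it, so that a query becomes a point lookup followed by a short scan. An object can be discarded for a cell when its minimum distance to the cell exceeds the smallest maximum distance from any object to that cell. The square service area, the uniform n-by-n grid and the function names are assumptions for illustration; the paper's grid-partition index (for example, how it chooses the partition) may differ.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Cell = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def mindist(p: Point, c: Cell) -> float:
    """Smallest distance from p to any point of cell c."""
    dx = max(c[0] - p[0], 0.0, p[0] - c[2])
    dy = max(c[1] - p[1], 0.0, p[1] - c[3])
    return math.hypot(dx, dy)


def maxdist(p: Point, c: Cell) -> float:
    """Largest distance from p to any point of cell c (attained at a corner)."""
    dx = max(abs(p[0] - c[0]), abs(p[0] - c[2]))
    dy = max(abs(p[1] - c[1]), abs(p[1] - c[3]))
    return math.hypot(dx, dy)


def build_grid(objects: List[Point], size: float, n: int):
    """Split [0, size]^2 into n x n cells and keep, per cell, every possible NN."""
    step = size / n
    index: Dict[Tuple[int, int], List[Point]] = {}
    for i in range(n):
        for j in range(n):
            cell = (i * step, j * step, (i + 1) * step, (j + 1) * step)
            # Every query in the cell has some object within this distance ...
            bound = min(maxdist(o, cell) for o in objects)
            # ... so an object that is always farther away can never be the NN.
            index[(i, j)] = [o for o in objects if mindist(o, cell) <= bound]
    return index, step


def grid_nn(index, step: float, n: int, q: Point) -> Point:
    """Locate q's cell (a point query), then scan only that cell's candidates."""
    i = min(int(q[0] // step), n - 1)
    j = min(int(q[1] // step), n - 1)
    return min(index[(i, j)], key=lambda o: math.dist(o, q))


shops = [(1.0, 1.0), (9.0, 9.0), (1.0, 9.0), (5.0, 5.0)]
index, step = build_grid(shops, size=10.0, n=2)
print(grid_nn(index, step, 2, (8.0, 7.0)))  # (9.0, 9.0)
```

Finer grids give shorter candidate lists per cell at the cost of a larger index, which is the kind of trade-off that matters when the index itself must be broadcast.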


In-Network Join Processing For Sensor Networks, Hai Yu, Ee Peng Lim, Jun Zhang Jan 2006

Research Collection School Of Computing and Information Systems

Recent advances in hardware and wireless technologies have led to sensor networks consisting of a large number of sensors capable of gathering and processing data collectively. Query processing on these sensor networks has to consider various inherent constraints. While simple queries such as select and aggregate queries in wireless sensor networks have been addressed in the literature, the processing of join queries in sensor networks remains to be investigated. In this paper, we present a synopsis join strategy for evaluating join queries in sensor networks with communication efficiency. In this strategy, instead of directly joining two relations distributed in a sensor …


Efficient Mining Of Group Patterns From User Movement Data, Yida Wang, Ee Peng Lim, San-Yih Hwang Jan 2006

Research Collection School Of Computing and Information Systems

In this paper, we present a new approach to derive groupings of mobile users based on their movement data. We assume that the user movement data are collected by logging location data emitted from mobile devices tracking users. We formally define a group pattern as a group of users that are within a distance threshold of one another for at least a minimum duration. To mine group patterns, we first propose two algorithms, namely AGP and VG-growth. In our first set of experiments, it is shown that when both the number of users and the logging duration are large, AGP and VG-growth are …
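The group-pattern definition above can be checked directly. The sketch below assumes locations are logged at common, evenly spaced timestamps and treats a group as valid when every pair of members stays within `max_dis` for at least `min_dur` consecutive timestamps. Enumerating every candidate group, as the hypothetical `valid_groups` does, is only a brute-force reference of the kind AGP and VG-growth are meant to improve on; all names are illustrative.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# user -> logged (x, y) location at timestamps 0, 1, 2, ... (assumed synchronised)
Trace = Dict[str, List[Tuple[float, float]]]


def close_at(trace: Trace, group: Sequence[str], t: int, max_dis: float) -> bool:
    """True if every pair of group members is within max_dis at timestamp t."""
    return all(math.dist(trace[a][t], trace[b][t]) <= max_dis
               for a, b in combinations(group, 2))


def is_group_pattern(trace: Trace, group: Sequence[str], max_dis: float, min_dur: int) -> bool:
    """True if the group stays close for at least min_dur consecutive timestamps."""
    run = 0
    for t in range(min(len(trace[u]) for u in group)):
        run = run + 1 if close_at(trace, group, t, max_dis) else 0
        if run >= min_dur:
            return True
    return False


def valid_groups(trace: Trace, size: int, max_dis: float, min_dur: int) -> List[Tuple[str, ...]]:
    """Brute-force reference: test every candidate group of the given size."""
    return [g for g in combinations(sorted(trace), size)
            if is_group_pattern(trace, g, max_dis, min_dur)]


log = {
    "u1": [(0, 0), (0, 0), (0, 0), (5, 5)],
    "u2": [(0, 1), (0, 1), (0, 1), (9, 9)],
    "u3": [(10, 10), (0, 2), (10, 10), (10, 10)],
}
print(valid_groups(log, 2, max_dis=1.5, min_dur=3))  # [('u1', 'u2')]
```

Under this definition every subgroup of a valid group is itself valid, since its pairs are a subset and stay close over the same timestamps, so candidate groups can be grown level by level instead of being enumerated exhaustively.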