Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons

Articles 1 - 28 of 28

Full-Text Articles in Databases and Information Systems

Towards Effective Content-Based Music Retrieval With Multiple Acoustic Feature Composition, Jialie Shen, John Shepherd, Ngu Ahh Dec 2006

Research Collection School Of Computing and Information Systems

In this paper, we present a new approach to constructing music descriptors to support efficient content-based music retrieval and classification. The system applies multiple musical properties combined with a hybrid architecture based on principal component analysis (PCA) and a multilayer perceptron neural network. This architecture enables straightforward incorporation of multiple musical feature vectors, based on properties such as timbral texture, pitch, and rhythm structure, into a single low-dimensioned vector that is more effective for classification than the larger individual feature vectors. The use of supervised training enables incorporation of human musical perception that further enhances the classification process. We compare …


Query-Based Watermarking For Xml Data, Xuan Zhou, Hwee Hwa Pang, Kian-Lee Tan Dec 2006

Research Collection School Of Computing and Information Systems

As an increasing amount of XML data is exchanged over the internet, copyright protection of this type of data is becoming an important requirement for many applications. In this paper, we introduce a rights protection scheme for XML data based on digital watermarking. One of the main challenges for watermarking XML data is that the data could be easily reorganized by an adversary in an attempt to destroy any embedded watermark. To overcome this, we propose a query-based watermarking scheme, which creates queries to identify available watermarking capacity, such that watermarks could be recovered from reorganized data through query rewriting. The …


Continuous Monitoring Of Knn Queries In Wireless Sensor Networks, Yuxia Yao, Xueyan Tang, Ee Peng Lim Dec 2006

Research Collection School Of Computing and Information Systems

Wireless sensor networks have been widely used for civilian and military applications, such as environmental monitoring and vehicle tracking. In these applications, continuous queries must often be processed, and their efficient evaluation is a critical requirement. Due to the limited power supply for sensor nodes, energy efficiency is a major performance measure in such query evaluation. In this paper, we focus on continuous kNN query processing. We observe that the centralized data storage and monitoring schemes do not favor energy efficiency. We therefore propose a localized scheme to monitor long-running nearest neighbor queries in sensor networks. …


Measuring Qualities Of Articles Contributed By Online Communities, Ee Peng Lim, Ba-Quy Vuong, Hady W. Lauw, Aixin Sun Dec 2006

Research Collection School Of Computing and Information Systems

Using open source Web editing software (e.g., wiki), online community users can now easily edit, review and publish articles collaboratively. While much useful knowledge can be derived from these articles, content users and critics are often concerned about their qualities. In this paper, we develop two models, namely the basic model and the peer review model, for measuring the qualities of these articles and the authorities of their contributors. We represent collaboratively edited articles and their contributors in a bipartite graph. While the basic model measures an article's quality using both the authorities of contributors and the amount of contribution from each …


Modeling Heterogeneous User Churn And Local Resilience Of Unstructured P2p Networks, Zhongmei Yao, Derek Leonard, Dmitri Loguinov, Xiaoming Wang Nov 2006

Computer Science Faculty Publications

Previous analytical results on the resilience of unstructured P2P systems have not explicitly modeled heterogeneity of user churn (i.e., difference in online behavior) or the impact of in-degree on system resilience. To overcome these limitations, we introduce a generic model of heterogeneous user churn, derive the distribution of the various metrics observed in prior experimental studies (e.g., lifetime distribution of joining users, joint distribution of session time of alive peers, and residual lifetime of a randomly selected user), derive several closed-form results on the transient behavior of in-degree, and eventually obtain the joint in/out degree isolation probability as a simple …


Integration Of Wikipedia And A Geography Digital Library, Ee Peng Lim, Zhe Wang, Darwin Sadeli, Yuanyuan Li, Chew-Hung Chang, Kalyani Chatterjea, Dion Hoe-Lian Goh, Yin-Leng Theng, Jun Zhang, Aixin Sun Nov 2006

Research Collection School Of Computing and Information Systems

In this paper, we address the problem of integrating Wikipedia, an online encyclopedia, and G-Portal, a web-based digital library, in the geography domain. The integration facilitates the sharing of data and services between the two web applications that are of great value in learning. We first present an overall system architecture for supporting such an integration and address the metadata extraction problem associated with it. In metadata extraction, we focus on extracting and constructing metadata for geo-political regions, namely cities and countries. Some empirical performance results will be presented. The paper will also describe the adaptations of G-Portal and Wikipedia …


Understanding User Perceptions On Usefulness And Usability Of An Integrated Wiki-G-Portal, Yin-Leng Theng, Yuanyuan Li, Ee Peng Lim, Zhe Wang, Dion Hoe-Lian Goh, Chew-Hung Chang, Kalyani Chatterjea, Jun Zhang Nov 2006

Research Collection School Of Computing and Information Systems

This paper describes a pilot study on Wiki-G-Portal, a project integrating Wikipedia, an online encyclopedia, into G-Portal, a Web-based digital library of geography resources. Initial findings from the pilot study suggest positive perceptions of the usefulness and usability of Wiki-G-Portal, as well as of subjects' attitude and intention to use.


A Model For Anticipatory Event Detection, Qi He, Kuiyu Chang, Ee Peng Lim Nov 2006

Research Collection School Of Computing and Information Systems

Event detection is a very important area of research that discovers new events reported in a stream of text documents. Previous research in event detection has largely focused on finding the first story and tracking the events of a specific topic. A topic is simply a set of related events defined by user-supplied keywords with no associated semantics and little domain knowledge. We therefore introduce the Anticipatory Event Detection (AED) problem: given some user-preferred event transition in a topic, detect the occurrence of the transition for the stream of news covering the topic. We confine the events to …


Extracting Link Chains Of Relationship Instances From A Website, Myo-Myo Naing, Ee Peng Lim, Roger Hsiang-Li Chiang Oct 2006

Research Collection School Of Computing and Information Systems

Web pages from a Web site can often be associated with concepts in an ontology, and pairs of Web pages also can be associated with relationships between concepts. With such associations, the Web site can be searched, browsed, or even reorganized based on the concept and relationship labels of its Web pages. In this article, we study the link chain extraction problem that is critical to the extraction of Web pages that are related. A link chain is an ordered list of anchor elements linking two Web pages related by some semantic relationship. We propose a link chain extraction method …


Continuous Nearest Neighbor Monitoring In Road Networks, Kyriakos Mouratidis, Man Lung Yiu, Dimitris Papadias, Nikos Mamoulis Sep 2006

Research Collection School Of Computing and Information Systems

Recent research has focused on continuous monitoring of nearest neighbors (NN) in highly dynamic scenarios, where the queries and the data objects move frequently and arbitrarily. All existing methods, however, assume the Euclidean distance metric. In this paper we study k-NN monitoring in road networks, where the distance between a query and a data object is determined by the length of the shortest path connecting them. We propose two methods that can handle arbitrary object and query moving patterns, as well as fluctuations of edge weights. The first one maintains the query results by processing only updates that may invalidate …
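
To make the network distance in this abstract concrete, the following is a minimal sketch of a one-off k-NN search in which distance is the length of the shortest path rather than the Euclidean distance. It is not either of the two monitoring methods the authors propose, which the excerpt does not describe. The adjacency-dictionary representation, the function name `network_knn`, the sample graph `roads`, and the assumption that data objects sit exactly on nodes are all hypothetical choices made for illustration.

```python
import heapq


def network_knn(graph, query_node, object_nodes, k):
    """Return up to k (distance, node) pairs for the objects nearest to query_node.

    graph is a hypothetical adjacency dict: node -> list of (neighbor, weight).
    Undirected road segments must be listed in both directions.
    """
    targets = set(object_nodes)
    best = {query_node: 0.0}          # best known distance to each node
    heap = [(0.0, query_node)]        # Dijkstra frontier ordered by distance
    settled = set()
    result = []
    while heap and len(result) < k:
        d, node = heapq.heappop(heap)
        if node in settled:
            continue                  # stale heap entry
        settled.add(node)
        if node in targets:
            result.append((d, node))  # objects are settled in distance order
        for neighbor, weight in graph.get(node, []):
            nd = d + weight
            if nd < best.get(neighbor, float("inf")):
                best[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return result


# Hypothetical road network: the direct edge a-c (5.0) is longer than a-b-c (3.0).
roads = {
    "a": [("b", 2.0), ("c", 5.0)],
    "b": [("a", 2.0), ("c", 1.0), ("d", 4.0)],
    "c": [("a", 5.0), ("b", 1.0), ("d", 1.0)],
    "d": [("b", 4.0), ("c", 1.0)],
}
nearest = network_knn(roads, "a", ["c", "d"], k=2)
```

Because the expansion stops as soon as k objects are settled, only the part of the network around the query is visited. Re-running such a search after every movement or edge-weight change would be the naive baseline that a monitoring method aims to avoid.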


Masking Page Reference Patterns In Encryption Databases On Untrusted Storage, Xi Ma, Hwee Hwa Pang, Kian-Lee Tan Sep 2006

Research Collection School Of Computing and Information Systems

To support ubiquitous computing, the underlying data have to be persistent and available anywhere-anytime. The data thus have to migrate from devices that are local to individual computers, to shared storage volumes that are accessible over an open network. This potentially exposes the data to heightened security risks. In particular, the activity on a database exhibits regular page reference patterns that could help attackers learn logical links among physical pages and then launch additional attacks. We propose two countermeasures to mitigate the risk of attacks initiated through analyzing the shared storage server’s activity for those page patterns. The first countermeasure relocates …


An Energy-Efficient And Access Latency Optimized Indexing Scheme For Wireless Data Broadcast, Yuxia Yao, Xueyan Tang, Ee Peng Lim, Aixin Sun Aug 2006

Research Collection School Of Computing and Information Systems

Data broadcast is an attractive data dissemination method in mobile environments. To improve energy efficiency, existing air indexing schemes for data broadcast have focused on reducing tuning time only, i.e., the duration that a mobile client stays active in data accesses. On the other hand, existing broadcast scheduling schemes have aimed at reducing access latency through nonflat data broadcast to improve responsiveness only. Not much work has addressed the energy efficiency and responsiveness issues concurrently. This paper proposes an energy-efficient indexing scheme called MHash that optimizes tuning time and access latency in an integrated fashion. MHash reduces tuning time by …


Bias And Controversy: Beyond The Statistical Deviation, Hady W. Lauw, Ee Peng Lim, Ke Wang Aug 2006

Research Collection School Of Computing and Information Systems

In this paper, we investigate how deviation in evaluation activities may reveal bias on the part of reviewers and controversy on the part of evaluated objects. We focus on a 'data-centric approach' where the evaluation data is assumed to represent the 'ground truth'. The standard statistical approaches take evaluation and deviation at face value. We argue that attention should be paid to the subjectivity of evaluation, judging the evaluation score not just on 'what is being said' (deviation), but also on 'who says it' (reviewer) as well as on 'whom it is said about' (object). Furthermore, we observe that bias …


Authenticating Multi-Dimensional Query Results In Data Publishing, Weiwei Cheng, Hwee Hwa Pang, Kian-Lee Tan Jul 2006

Research Collection School Of Computing and Information Systems

In data publishing, the owner delegates the role of satisfying user queries to a third-party publisher. As the publisher may be untrusted or susceptible to attacks, it could produce incorrect query results. This paper introduces a mechanism for users to verify that their query answers on a multi-dimensional dataset are correct, in the sense of being complete (i.e., no qualifying data points are omitted) and authentic (i.e., all the result values originated from the owner). Our approach is to add authentication information into a spatial data structure, by constructing certified chains on the points within each partition, as well as …


Extraction Of Coherent Relevant Passages Using Hidden Markov Models, Jing Jiang, Chengxiang Zhai Jul 2006

Research Collection School Of Computing and Information Systems

In information retrieval, retrieving relevant passages, as opposed to whole documents, not only directly benefits the end user by filtering out the irrelevant information within a long relevant document, but also improves retrieval accuracy in general. A critical problem in passage retrieval is to extract coherent relevant passages accurately from a document, which we refer to as passage extraction. While much work has been done on passage retrieval, the passage extraction problem has not been seriously studied. Most existing work tends to rely on presegmenting documents into fixed-length passages, which are unlikely to be optimal because the length of a relevant passage …


Interacting With Web Hierarchies, Saverio Perugini, Naren Ramakrishnan Jun 2006

Computer Science Faculty Publications

Web site interfaces are a particularly good fit for hierarchies in the broadest sense of that idea, i.e., a classification with multiple attributes, not necessarily a tree structure. Several adaptive interface designs are emerging that support flexible navigation orders, exposing and exploring dependencies, and procedural information-seeking tasks. This paper provides a context and vocabulary for thinking about hierarchical Web sites and their design. The paper identifies three features that interface to information hierarchies. These are flexible navigation orders, the ability to expose and explore dependencies, and support for procedural tasks. A few examples of these features are also provided.


Exploiting Domain Structure For Named Entity Recognition, Jing Jiang, Chengxiang Zhai Jun 2006

Research Collection School Of Computing and Information Systems

Named Entity Recognition (NER) is a fundamental task in text mining and natural language understanding. Current approaches to NER (mostly based on supervised learning) perform well on domains similar to the training domain, but they tend to adapt poorly to slightly different domains. We present several strategies for exploiting the domain structure in the training data to learn a more robust named entity recognizer that can perform well on a new domain. First, we propose a simple yet effective way to automatically rank features based on their generalizabilities across domains. We then train a classifier with strong emphasis on the …


On In-Network Synopsis Join Processing For Sensor Networks, Hai Yu, Ee Peng Lim, Jun Zhang May 2006

Research Collection School Of Computing and Information Systems

The emergence of sensor networks enables applications that deploy sensors to collaboratively monitor the environment and process the collected data. In some scenarios, we are interested in using join queries to correlate data stored in different regions of a sensor network, where the data volume is large, making it prohibitive to transmit all data to a central server for joining. In this paper, we present an in-network synopsis join strategy for evaluating join queries in sensor networks with communication efficiency. In this strategy, we prune data that do not contribute to the join results in the early stage of the join processing, …


Discovering Causal Dependencies In Mobile Context-Aware Recommenders, Ghim-Eng Yap, Ah-Hwee Tan, Hwee Hwa Pang May 2006

Research Collection School Of Computing and Information Systems

Mobile context-aware recommender systems face unique challenges in acquiring context. Resource limitations make minimizing context acquisition a practical need, while the uncertainty inherent to the mobile environment makes missing context values a major concern. This paper introduces a scalable mechanism based on Bayesian network learning in a tiered context model to overcome both of these challenges. Extensive experiments on a restaurant recommender system showed that our mechanism can accurately discover causal dependencies among context, thereby enabling the effective identification of the minimal set of important context for a specific user and task, as well as providing highly accurate recommendations even …


Fisa: Feature-Based Instance Selection For Imbalanced Text Classification, Aixin Sun, Ee Peng Lim, Boualem Benatallah, Mahbub Hassan Apr 2006

Research Collection School Of Computing and Information Systems

Support Vector Machine (SVM) classifiers are widely used in text classification tasks, and these tasks often involve imbalanced training. In this paper, we specifically address the cases where negative training documents significantly outnumber the positive ones. A generic algorithm known as FISA (Feature-based Instance Selection Algorithm) is proposed to select only a subset of negative training documents for training an SVM classifier. With a smaller, carefully selected training set, an SVM classifier can be more efficiently trained while delivering comparable or better classification accuracy. In our experiments on the 20-Newsgroups dataset, using only 35% negative training examples and 60% learning …


Searching Substructures With Superimposed Distance, Xifeng Yan, Feida Zhu, Jiawei Han, Philip S. Yu Apr 2006

Research Collection School Of Computing and Information Systems

Efficient indexing techniques have been developed for the exact and approximate substructure search in large scale graph databases. Unfortunately, the retrieval problem of structures with categorical or geometric distance constraints is not solved yet. In this paper, we develop a method called PIS (Partition-based Graph Index and Search) to support similarity search on substructures with superimposed distance constraints. PIS selects discriminative fragments in a query graph and uses an index to prune the graphs that violate the distance constraints. We identify a criterion to distinguish the selectivity of fragments in multiple graphs and develop a partition method to obtain a …


Sgpm: Static Group Pattern Mining Using Apriori-Like Sliding Window, John Goh, David Taniar, Ee Peng Lim Apr 2006

Research Collection School Of Computing and Information Systems

Mobile user data mining is a field that focuses on extracting interesting patterns and knowledge from data generated by mobile users. Group pattern mining is a type of mobile user data mining method. In group pattern mining, group patterns in a given user movement database are found based on spatio-temporal distances. In this paper, we propose an efficiency improvement that uses an area method for locating mobile users and a sliding window for static group pattern mining. This reduces the complexity of the valid group pattern mining problem. We support the use of the static method, which uses areas and sliding windows instead …


In-Network Processing Of Nearest Neighbor Queries For Wireless Sensor Networks, Yuxia Yao, Xueyan Tang, Ee Peng Lim Apr 2006

Research Collection School Of Computing and Information Systems

Wireless sensor networks have been widely used for civilian and military applications, such as environmental monitoring and vehicle tracking. The sensor nodes in the network have the abilities to sense, store, compute and communicate. To enable object tracking applications, spatial queries such as nearest neighbor queries are to be supported in these networks. The queries can be injected by the user at any sensor node. Due to the limited power supply for sensor nodes, energy efficiency is the major concern in query processing. Centralized data storage and query processing schemes do not favor energy efficiency. In this paper, we propose …


Realtime Query Expansion And Procedural Interfaces For Information Hierarchies, Saverio Perugini Jan 2006

Computer Science Faculty Publications

We demonstrate the use of two user interfaces for interacting with web hierarchies. One uses the dependencies underlying a hierarchy to perform real-time query expansion and, in this way, acts as an in situ feedback mechanism. The other enables the user to cascade the output from one interaction to the input of another, and so on, and, in this way, supports procedural information-seeking tasks without disrupting the flow of interaction.


Information Assurance Through Binary Vulnerability Auditing, William B. Kimball, Saverio Perugini Jan 2006

Computer Science Faculty Publications

The goal of this research is to develop improved methods of discovering vulnerabilities in software. A large volume of software, from the most frequently used programs on a desktop computer, such as web browsers, e-mail programs, and word processing applications, to mission-critical services for the space shuttle, is unintentionally vulnerable to attacks and thus insecure. By seeking to improve the identification of vulnerabilities in software, the security community can save the time and money necessary to restore compromised computer systems. In addition, this research is imperative to activities of national security such as counterterrorism. The current approach involves a systematic …


Efficient Mining Of Group Patterns From User Movement Data, Yida Wang, Ee Peng Lim, San-Yih Hwang Jan 2006

Research Collection School Of Computing and Information Systems

In this paper, we present a new approach to derive groupings of mobile users based on their movement data. We assume that the user movement data are collected by logging location data emitted from mobile devices tracking users. We formally define group pattern as a group of users that are within a distance threshold from one another for at least a minimum duration. To mine group patterns, we first propose two algorithms, namely AGP and VG-growth. In our first set of experiments, it is shown that when both the number of users and logging duration are large, AGP and VG-growth are …
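
The group pattern definition in this abstract can be checked directly for one candidate set of users. The sketch below is only that check, not the AGP or VG-growth mining algorithms, whose details the excerpt does not give. It assumes positions are logged at regular time steps and measures duration as a count of consecutive samples; the name `is_group_pattern`, its parameters, and the sample `tracks` are hypothetical.

```python
from itertools import combinations
from math import dist


def is_group_pattern(tracks, users, max_dist, min_duration):
    """Check whether `users` stay mutually within max_dist for min_duration steps.

    tracks is a hypothetical dict: user -> list of (x, y) positions, one per
    regularly spaced time step. Duration is counted in consecutive samples.
    """
    steps = min(len(tracks[u]) for u in users)
    run = 0  # length of the current streak of "everyone is close" samples
    for t in range(steps):
        all_close = all(
            dist(tracks[a][t], tracks[b][t]) <= max_dist
            for a, b in combinations(users, 2)
        )
        run = run + 1 if all_close else 0
        if run >= min_duration:
            return True
    return False


# Hypothetical movement log with four time steps per user.
tracks = {
    "u1": [(0, 0), (1, 0), (2, 0), (9, 9)],
    "u2": [(0, 1), (1, 1), (2, 1), (0, 0)],
    "u3": [(5, 5), (1, 0.5), (2, 0.5), (2, 0.5)],
}
pair_ok = is_group_pattern(tracks, ["u1", "u2"], max_dist=1.5, min_duration=3)
trio_ok = is_group_pattern(tracks, ["u1", "u2", "u3"], max_dist=1.5, min_duration=3)
```

Here `u1` and `u2` stay together for the first three samples, while adding `u3` shortens the longest shared streak to two samples. Testing every subset of users this way grows exponentially with the number of users, which is the cost that dedicated mining algorithms are designed to reduce.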


Grid-Partition Index: A Hybrid Approach To Nearest-Neighbor Queries In Wireless Location-Based Services, Baihua Zheng, Jianliang Xu, Wang-Chien Lee, Dik Lun Lee Jan 2006

Research Collection School Of Computing and Information Systems

Traditional nearest-neighbor (NN) search is based on two basic indexing approaches: object-based indexing and solution-based indexing. The former is constructed based on the locations of data objects, using some distance heuristics on object locations. The latter is built on a precomputed solution space. Thus, NN queries can be reduced to and processed as simple point queries in this solution space. Both approaches exhibit some disadvantages, especially when employed for wireless data broadcast in mobile computing environments. In this paper, we introduce a new index method, called the grid-partition index, to support NN search in both on-demand access and periodic broadcast …


In-Network Join Processing For Sensor Networks, Hai Yu, Ee Peng Lim, Jun Zhang Jan 2006

Research Collection School Of Computing and Information Systems

Recent advances in hardware and wireless technologies have led to sensor networks consisting of a large number of sensors capable of gathering and processing data collectively. Query processing on these sensor networks has to consider various inherent constraints. While simple queries such as select and aggregate queries in wireless sensor networks have been addressed in the literature, the processing of join queries in sensor networks remains to be investigated. In this paper, we present a synopsis join strategy for evaluating join queries in sensor networks with communication efficiency. In this strategy, instead of directly joining two relations distributed in a sensor …