Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons

Articles 1 - 30 of 36

Full-Text Articles in Databases and Information Systems

On Strategies For Imbalanced Text Classification Using SVM: A Comparative Study, Aixin Sun, Ee Peng Lim, Ying Liu Dec 2009

Research Collection School Of Computing and Information Systems

Many real-world text classification tasks involve imbalanced training examples. The strategies proposed to address imbalanced classification (e.g., resampling, instance weighting), however, have not been systematically evaluated in the text domain. In this paper, we conduct a comparative study on the effectiveness of these strategies in the context of imbalanced text classification using the Support Vector Machine (SVM) classifier. SVM is the focus of this study because of the good classification accuracy reported for it in many text classification tasks. We propose a taxonomy to organize all proposed strategies following the training and the test phases in text classification tasks. Based on the taxonomy, …


To Trust Or Not To Trust? Predicting Online Trusts Using Trust Antecedent Framework, Viet-An Nguyen, Ee Peng Lim, Jing Jiang, Aixin Sun Dec 2009

Research Collection School Of Computing and Information Systems

This paper analyzes the trustor and trustee factors that lead to inter-personal trust using a well-studied Trust Antecedent framework in management science. To apply these factors to the trust ranking problem in online rating systems, we derive features that correspond to each factor and develop different trust ranking models. The advantage of this approach is that features relevant to trust can be systematically derived so as to achieve good prediction accuracy. Through a series of experiments on real data from Epinions, we show that even a simple model using the derived features yields good accuracy and outperforms MoleTrust, a trust …


What Makes Categories Difficult To Classify?, Aixin Sun, Ee Peng Lim, Ying Liu Nov 2009

Research Collection School Of Computing and Information Systems

In this paper, we try to predict which category will be less accurately classified compared with other categories in a classification task that involves multiple categories. The categories with poor predicted performance will be identified before any classifiers are trained and additional steps can be taken to address the predicted poor accuracies of these categories. Inspired by the work on query performance prediction in ad-hoc retrieval, we propose to predict classification performance using two measures, namely, category size and category coherence. Our experiments on 20-Newsgroup and Reuters-21578 datasets show that the Spearman rank correlation coefficient between the predicted rank of …


Trust Relationship Prediction Using Online Product Review Data, Nan Ma, Ee Peng Lim, Viet-An Nguyen, Aixin Sun Nov 2009

Research Collection School Of Computing and Information Systems

Trust between users is an important piece of knowledge that can be exploited in search and recommendation. Given that user-supplied trust relationships are usually very sparse, we study the prediction of trust relationships using user interaction features in an online user-generated review application context. We show that trust relationship prediction can achieve better accuracy when one adopts personalized and cluster-based classification methods. The former trains one classifier for each user using user-specific training data. The cluster-based method first constructs user clusters before training one classifier for each user cluster. Our proposed methods have been evaluated in a series of experiments …


Trust-Oriented Composite Services Selection And Discovery, Lei Li, Yan Wang, Ee Peng Lim Nov 2009

Research Collection School Of Computing and Information Systems

In Service-Oriented Computing (SOC) environments, service clients interact with service providers for consuming services. From the viewpoint of service clients, the trust level of a service or a service provider is a critical issue to consider in service selection and discovery, particularly when a client is looking for a service from a large set of services or service providers. However, a service may invoke other services offered by different providers forming composite services. The complex invocations in composite services greatly increase the complexity of trust-oriented service selection and discovery. In this paper, we propose novel approaches for composite service representation, …


UDel/SMU At TREC 2009 Entity Track, Wei Zheng, Swapna Gottipati, Jing Jiang, Hui Fang Nov 2009

Research Collection School Of Computing and Information Systems

We report our methods and experimental results from the collaborative participation of the InfoLab group from the University of Delaware and the School of Information Systems from Singapore Management University in the TREC 2009 Entity track. Our general goal is to study how we may apply language modeling approaches and natural language processing techniques to the task. Specifically, we proposed to find supporting information based on segment retrieval, to extract entities using the Stanford NER tagger, and to rank entities based on a previously proposed probabilistic framework for expert finding.


Continuous Monitoring Of Spatial Queries In Wireless Broadcast Environments, Kyriakos Mouratidis, Spiridon Bakiras, Dimitris Papadias Oct 2009

Research Collection School Of Computing and Information Systems

Wireless data broadcast is a promising technique for information dissemination that leverages the computational capabilities of the mobile devices in order to enhance the scalability of the system. Under this environment, the data are continuously broadcast by the server, interleaved with some indexing information for query processing. Clients may then tune in to the broadcast channel and process their queries locally without contacting the server. Previous work on spatial query processing for wireless broadcast systems has only considered snapshot queries over static data. In this paper, we propose an air indexing framework that 1) outperforms the existing (i.e., snapshot) techniques in …


Parallel Sets In The Real World: Three Case Studies, Robert Kosara, Caroline Ziemkiewicz, F. Joseph Iii Mako, Tin Seong Kam Oct 2009

Research Collection School Of Computing and Information Systems

Parallel Sets are a visualization technique for categorical data. We recently released an implementation to the public in an effort to make our research useful to real users. This paper presents three case studies of Parallel Sets in use with real data.


Visible Reverse K-Nearest Neighbor Query Processing In Spatial Databases, Yunjun Gao, Baihua Zheng, Gencai Chen, Wang-Chien Lee, Ken C. K. Lee, Qing Li Sep 2009

Research Collection School Of Computing and Information Systems

Reverse nearest neighbor (RNN) queries have a broad application base such as decision support, profile-based marketing, resource allocation, etc. Previous work on RNN search does not take obstacles into consideration. In the real world, however, there are many physical obstacles (e.g., buildings) and their presence may affect the visibility between objects. In this paper, we introduce a novel variant of RNN queries, namely, visible reverse nearest neighbor (VRNN) search, which considers the impact of obstacles on the visibility of objects. Given a data set P, an obstacle set O, and a query point q in a 2D space, a VRNN …


Multi-Task Transfer Learning For Weakly-Supervised Relation Extraction, Jing Jiang Aug 2009

Research Collection School Of Computing and Information Systems

Creating labeled training data for relation extraction is expensive. In this paper, we study relation extraction in a special weakly-supervised setting where we have only a few seed instances of the target relation type we want to extract but we also have a large number of labeled instances of other relation types. Observing that different relation types can share certain common structures, we propose to use a multi-task learning method coupled with human guidance to address this weakly-supervised relation extraction problem. The proposed framework models the commonality among different relation types through a shared weight vector, enables knowledge learned from …


Optimal-Location-Selection Query Processing In Spatial Databases, Yunjun Gao, Baihua Zheng, Gencai Chen, Qing Li Aug 2009

Research Collection School Of Computing and Information Systems

This paper introduces and solves a novel type of spatial query, namely, Optimal-Location-Selection (OLS) search, which has many applications in real life. Given a data object set D_A, a target object set D_B, a spatial region R, and a critical distance d_c in a multidimensional space, an OLS query retrieves those target objects in D_B that are outside R but have maximal optimality. Here, the optimality of a target object b ∈ D_B located outside R is defined as the number of the data objects from D_A that are inside R and meanwhile have their distances to b not exceeding …


SSnetViz: A Visualization Engine For Heterogeneous Semantic Social Networks, Ee Peng Lim, Maureen Maureen, Nelman Lubis Ibrahim, Aixin Sun, Anwitaman Datta, Kuiyu Chang Aug 2009

Research Collection School Of Computing and Information Systems

SSnetViz is an ongoing research effort to design and implement a visualization engine for heterogeneous semantic social networks. A semantic social network is a multi-modal network that contains nodes representing different types of people or object entities, and edges representing relationships among them. When multiple heterogeneous semantic social networks are to be visualized together, SSnetViz provides a suite of functions to store heterogeneous semantic social networks and to integrate them for searching and analysis. We will illustrate these functions using social networks related to terrorism research, one crafted by domain experts and another from Wikipedia.


Inferring Player Rating From Performance Data In Massively Multiplayer Online Role-Playing Games (MMORPGs), Kyong Jin Shim, Muhammad Aurangzeb Ahmad, Nishith Pathak, Jaideep Srivastava Aug 2009

Research Collection School Of Computing and Information Systems

This paper examines online player performance in EverQuest II, a popular massively multiplayer online role-playing game (MMORPG) developed by Sony Online Entertainment. The study uses the game's player performance data to devise performance metrics for online players. We report three major findings. First, we show that the game's point-scaling system overestimates performances of lower level players and underestimates performances of higher level players. We present a novel point-scaling system based on the game's player performance data that addresses the underestimation and overestimation problems. Second, we present a highly accurate predictive model for player performance as a function of past behavior. …


A Distributed Spatial Index For Error-Prone Wireless Data Broadcast, Baihua Zheng, Wang-Chien Lee, Ken C. K. Lee, Dik Lun Lee, Min Shao Aug 2009

Research Collection School Of Computing and Information Systems

Information is valuable to users when it is available not only at the right time but also at the right place. To support efficient location-based data access in wireless data broadcast systems, a distributed spatial index (called DSI) is presented in this paper. DSI is highly efficient because it has a linear yet fully distributed structure that naturally shares links in different search paths. DSI is very resilient to the error-prone wireless communication environment because interrupted search operations based on DSI can be resumed easily. It supports search algorithms for classical location-based queries such as window queries and kNN queries …


On Efficient Mutual Nearest Neighbor Query Processing In Spatial Databases, Yunjun Gao, Baihua Zheng, Gencai Chen, Qing Li Aug 2009

Research Collection School Of Computing and Information Systems

This paper studies a new form of nearest neighbor queries in spatial databases, namely, mutual nearest neighbor (MNN) search. Given a set D of objects and a query object q, an MNN query returns from D the set of objects that are among the k1 (≥ 1) nearest neighbors (NNs) of q and, meanwhile, have q as one of their k2 (≥ 1) NNs. Although MNN queries are useful in many applications involving decision making, data mining, and pattern recognition, they cannot be efficiently handled by existing spatial query processing approaches. In this paper, we present …
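
The MNN definition above can be illustrated with a brute-force sketch. This is not the paper's algorithm (the abstract is truncated before describing it); it is a quadratic-time baseline written only to make the definition concrete. The function names, the use of Euclidean distance, and the choice to draw each object's neighbors from the other data objects plus q are assumptions made for illustration.

```python
from math import dist


def k_nearest(point, candidates, k):
    """Return the k candidates closest to point (Euclidean distance)."""
    return sorted(candidates, key=lambda c: dist(point, c))[:k]


def mutual_nearest_neighbors(data, q, k1, k2):
    """Brute-force MNN: objects among q's k1 NNs that have q among their k2 NNs."""
    result = []
    for p in k_nearest(q, data, k1):
        # Assumption: p's neighbors are the other data objects plus q itself.
        others = [o for o in data if o != p] + [q]
        if q in k_nearest(p, others, k2):
            result.append(p)
    return result


points = [(1, 0), (3, 0), (3.5, 0), (10, 0)]
print(mutual_nearest_neighbors(points, (0, 0), k1=2, k2=1))  # [(1, 0)]
print(mutual_nearest_neighbors(points, (0, 0), k1=2, k2=3))  # [(1, 0), (3, 0)]
```

With k2 = 1, the object (3, 0) is dropped because its single nearest neighbor is (3.5, 0) rather than q; raising k2 to 3 admits it.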


Scalable Verification For Outsourced Dynamic Databases, Hwee Hwa Pang, Jilian Zhang, Kyriakos Mouratidis Aug 2009

Research Collection School Of Computing and Information Systems

Query answers from servers operated by third parties need to be verified, as the third parties may not be trusted or their servers may be compromised. Most of the existing authentication methods construct validity proofs based on the Merkle hash tree (MHT). The MHT, however, imposes severe concurrency constraints that slow down data updates. We introduce a protocol, built upon signature aggregation, for checking the authenticity, completeness and freshness of query answers. The protocol offers the important property of allowing new data to be disseminated immediately, while ensuring that outdated values beyond a pre-set age can be detected. We also …
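
For readers unfamiliar with the Merkle hash tree (MHT) that the abstract contrasts against, the following is a minimal sketch of how an MHT root digest is computed. It does not show the signature-aggregation protocol the paper proposes. The choice of SHA-256 and the rule of duplicating the last node on odd-sized levels are assumptions; real MHT-based schemes differ in these details.

```python
import hashlib


def _h(data: bytes) -> bytes:
    """Hash helper (SHA-256 is an assumption, not taken from the paper)."""
    return hashlib.sha256(data).digest()


def merkle_root(records):
    """Compute the root digest of a binary Merkle hash tree over byte records."""
    if not records:
        raise ValueError("at least one record is required")
    level = [_h(r) for r in records]  # leaf digests
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate the last node
        # Each parent is the hash of its two children concatenated.
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


root_before = merkle_root([b"row-1", b"row-2", b"row-3"])
root_after = merkle_root([b"row-1", b"row-2", b"row-3-updated"])
print(root_before != root_after)  # True
```

Because every update changes digests on the path up to the single root, concurrent updates contend for the same nodes, which is one way to read the concurrency constraint the abstract mentions.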


CompositeMap: A Novel Framework For Music Similarity Measure, Bingjun Zhang, Jialie Shen, Qiaoliang Xiang, Ye Wang Jul 2009

Research Collection School Of Computing and Information Systems

With the continuing advances in data storage and communication technology, there has been an explosive growth of music information from different application domains. As an effective technique for organizing, browsing, and searching large data collections, music information retrieval is attracting more and more attention. How to measure and model the similarity between different music items is one of the most fundamental yet challenging research problems. In this paper, we introduce a novel framework based on a multimodal and adaptive similarity measure for various applications. Distinguished from previous approaches, our system can effectively combine music properties from different aspects into a …


Continuous Obstructed Nearest Neighbor Queries In Spatial Databases, Yunjun Gao, Baihua Zheng Jul 2009

Research Collection School Of Computing and Information Systems

In this paper, we study a novel form of continuous nearest neighbor queries in the presence of obstacles, namely continuous obstructed nearest neighbor (CONN) search. It considers the impact of obstacles on the distance between objects, which is ignored by most spatial queries. Given a data set P, an obstacle set O, and a query line segment q in a two-dimensional space, a CONN query retrieves the nearest neighbor of each point on q according to the obstructed distance, i.e., the shortest path between them without crossing any obstacle. We formulate CONN search, analyze its unique properties, and develop …


Spatial Cloaking Revisited: Distinguishing Information Leakage From Anonymity, Kar Way Tan, Yimin Lin, Kyriakos Mouratidis Jul 2009

Research Collection School Of Computing and Information Systems

Location-based services (LBS) are gaining popularity as they provide convenience to mobile users with on-demand information. The use of these services, however, poses privacy issues as the user locations and queries are exposed to untrusted LBSs. Spatial cloaking techniques provide privacy in the form of k-anonymity; i.e., they guarantee that the (location of the) querying user u is indistinguishable from at least k-1 others, where k is a parameter specified by u at query time. To achieve this, they form a group of k users, including u, and forward their minimum bounding rectangle (termed anonymizing spatial region, ASR) to …
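
The cloaking step described above (group k users including u, then take their minimum bounding rectangle) can be sketched as follows. The grouping rule used here, u plus its k-1 nearest users, is a naive assumption chosen only for illustration; the abstract does not state how groups are formed, and the function name `cloak` is hypothetical.

```python
from math import dist


def cloak(users, u, k):
    """Return the ASR (min_x, min_y, max_x, max_y) for u and k-1 other users.

    Assumption: the group is u plus its k-1 nearest users; u must be in users.
    """
    if k > len(users):
        raise ValueError("not enough users to satisfy k-anonymity")
    # u is at distance 0 from itself, so it is always part of the group.
    group = sorted(users, key=lambda p: dist(p, u))[:k]
    xs = [x for x, _ in group]
    ys = [y for _, y in group]
    return (min(xs), min(ys), max(xs), max(ys))


users = [(0, 0), (1, 2), (3, 1), (8, 8)]
print(cloak(users, u=(0, 0), k=3))  # (0, 0, 3, 2)
```

The service then sees only the rectangle, not which of the k users inside it issued the query.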


Predicting Outcome For Collaborative Featured Article Nomination In Wikipedia, Meiqun Hu, Ee Peng Lim, Ramayya Krishnan May 2009

Research Collection School Of Computing and Information Systems

In Wikipedia, good articles are wanted. While Wikipedia relies on collaborative effort from online volunteers for quality checking, the process of selecting top quality articles is time consuming. At present, the duty of decision making is shouldered by only a couple of administrators. Aiming to assist in the quality checking cycles so as to cope with the exponential growth of online contributions to Wikipedia, this work studies the task of predicting the outcome of featured article (FA) nominations. We analyze FA candidate (FAC) sessions collected over a period of 3.5 years, and examine the extent to which consensus has been …


On Mining Rating Dependencies In Online Collaborative Rating Networks, Hady W. Lauw, Ee Peng Lim, Ke Wang May 2009

Research Collection School Of Computing and Information Systems

The trend of social information processing sees e-commerce and social web applications increasingly relying on user-generated content, such as rating, to determine the quality of objects and to generate recommendations for users. In a rating system, a set of reviewers assign to a set of objects different types of scores based on specific evaluation criteria. In this paper, we seek to determine, for each reviewer and for each object, the dependency between scores on any two given criteria. A reviewer is said to have high dependency between a pair of criteria when his or her rating scores on objects based …


A Novel Framework For Efficient Automated Singer Identification In Large Music Databases, Jialie Shen, John Shepherd, Bin Cui, Kian-Lee Tan May 2009

Research Collection School Of Computing and Information Systems

Over the past decade, there has been explosive growth in the availability of multimedia data, particularly image, video, and music. Because of this, content-based music retrieval has attracted attention from the multimedia database and information retrieval communities. Content-based music retrieval requires us to be able to automatically identify particular characteristics of music data. One such characteristic, useful in a range of applications, is the identification of the singer in a musical piece. Unfortunately, existing approaches to this problem suffer from either low accuracy or poor scalability. In this article, we propose a novel scheme, called Hybrid Singer Identifier (HSI), for …


OPAQUE: Protecting Path Privacy In Directions Search, Ken C. K. Lee, Wang-Chien Lee, Hong Va Leong, Baihua Zheng Apr 2009

Research Collection School Of Computing and Information Systems

Directions search returns the shortest path from a source to a destination on a road network. However, the search interests of users may be exposed to the service providers, thus raising privacy concerns. For instance, a path query that finds a path from a residential address to a clinic may lead to a deduction about "who is related to what disease". To protect the privacy of users accessing directions search services, we introduce the OPAQUE system, which consists of two major components: (1) an obfuscator that formulates obfuscated path queries by mixing true and fake sources/destinations; and (2) an obfuscated path …
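
The first component, mixing true and fake sources/destinations, can be pictured with a small sketch. This is only an illustration of the idea as stated in the abstract; the function name `obfuscate`, its parameters, and the way fake locations are supplied are hypothetical, and the actual OPAQUE obfuscator may select and combine locations quite differently.

```python
import random


def obfuscate(source, destination, fake_sources, fake_destinations, rng=random):
    """Hide the true endpoints among fakes; returns (source set, destination set)."""
    sources = [source] + list(fake_sources)
    destinations = [destination] + list(fake_destinations)
    # Shuffle so that list position does not reveal which entry is the true one.
    rng.shuffle(sources)
    rng.shuffle(destinations)
    return sources, destinations


srcs, dsts = obfuscate("home", "clinic", ["mall", "school"], ["park", "library"])
print(len(srcs) * len(dsts))  # 9 candidate source/destination pairs
```

The service provider would receive the two sets and could not tell which of the nine candidate pairs is the real request.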


An Incremental Threshold Method For Continuous Text Search Queries, Kyriakos Mouratidis, Hwee Hwa Pang Apr 2009

Research Collection School Of Computing and Information Systems

A text filtering system monitors a stream of incoming documents, to identify those that match the interest profiles of its users. The user interests are registered at a server as continuous text search queries. The server constantly maintains for each query a ranked result list, comprising the recent documents (drawn from a sliding window) with the highest similarity to the query. Such a system underlies many text monitoring applications that need to cope with heavy document traffic, such as news and email monitoring. In this paper, we propose the first solution for processing continuous text queries efficiently. Our objective is …
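
The setting described above can be made concrete with a naive baseline that re-ranks the whole sliding window on every request. This is explicitly not the incremental threshold method of the paper (the abstract is truncated before describing it); it only shows what result the server must maintain. The class name, the count-based window, and the term-overlap similarity score are assumptions for illustration.

```python
from collections import deque


def similarity(query_terms, doc_terms):
    """Hypothetical score: number of query terms that occur in the document."""
    return len(set(query_terms) & set(doc_terms))


class NaiveTextFilter:
    """Keeps the most recent documents and ranks them per query from scratch."""

    def __init__(self, queries, window_size, k):
        self.queries = queries                    # query name -> list of terms
        self.window = deque(maxlen=window_size)   # oldest documents expire first
        self.k = k

    def add(self, doc_id, terms):
        self.window.append((doc_id, terms))

    def results(self, name):
        q = self.queries[name]
        ranked = sorted(self.window, key=lambda d: similarity(q, d[1]), reverse=True)
        return [doc_id for doc_id, _ in ranked[: self.k]]


f = NaiveTextFilter({"finance": ["stock", "market"]}, window_size=2, k=1)
f.add("d1", ["stock", "market", "rally"])
f.add("d2", ["weather", "report"])
print(f.results("finance"))  # ['d1']
f.add("d3", ["stock", "split"])  # d1 slides out of the window
print(f.results("finance"))  # ['d3']
```

Re-sorting the window for every query on every arrival is what becomes too expensive under heavy document traffic, which is the cost an incremental method aims to avoid.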


Continuous Visible Nearest Neighbour Queries, Yunjun Gao, Baihua Zheng, Wang-Chien Lee, Gencai Chen Mar 2009

Research Collection School Of Computing and Information Systems

In this paper, we identify and solve a new type of spatial query, called continuous visible nearest neighbor (CVNN) search. Given a data set P, an obstacle set O, and a query line segment q, a CVNN query returns a set of (p, R) tuples such that p ∈ P is the nearest neighbor (NN) to every point r along the interval R ⊆ q and p is visible to r. Note that p may be NULL, meaning that all points in P are invisible to all points in R, due to the obstruction of some obstacles in …


Fast Object Search On Road Networks, Ken C. K. Lee, Wang-Chien Lee, Baihua Zheng Mar 2009

Research Collection School Of Computing and Information Systems

In this paper, we present ROAD, a general framework to evaluate Location-Dependent Spatial Queries (LDSQs) that search for spatial objects on road networks. By exploiting a search space pruning technique and providing a dynamic object mapping mechanism, ROAD is very efficient and flexible for various types of queries, namely, range search and nearest neighbor search, on objects over large-scale networks. ROAD is named after its two components, namely, Route Overlay and Association Directory, designed to address the network traversal and object access aspects of the framework. In ROAD, a large road network is organized as a hierarchy of interconnected regional sub-networks …


Stochastic Modeling Western Paintings For Effective Classification, Jialie Shen Feb 2009

Research Collection School Of Computing and Information Systems

As one of the most important forms of cultural heritage, classical western paintings have always played a special role in human life and have been applied for many different purposes. While image classification is the subject of a plethora of related publications, relatively little attention has been paid to automatic categorization of western classical paintings, which could be a key technique for modern digital libraries, museums and art galleries. This paper studies automatic classification on large western painting image collections and proposes a novel framework to support it. With this framework, multiple visual features can be …


Localized Monitoring Of kNN Queries In Wireless Sensor Networks, Yuxia Yao, Xueyan Tang, Ee Peng Lim Jan 2009

Research Collection School Of Computing and Information Systems

Wireless sensor networks have been widely used in civilian and military applications. Primarily designed for monitoring purposes, many sensor applications require continuous collection and processing of sensed data. Due to the limited power supply for sensor nodes, energy efficiency is a major performance concern in query processing. In this paper, we focus on continuous kNN query processing in object tracking sensor networks. We propose a localized scheme to monitor nearest neighbors to a query point. The key idea is to establish a monitoring area for each query so that only the updates relevant to the query are collected. The monitoring …


Partially Materialized Digest Scheme: An Efficient Verification Method For Outsourced Databases, Kyriakos Mouratidis, Dimitris Sacharidis, Hwee Hwa Pang Jan 2009

Research Collection School Of Computing and Information Systems

In the outsourced database model, a data owner publishes her database through a third-party server; i.e., the server hosts the data and answers user queries on behalf of the owner. Since the server may not be trusted, or may be compromised, users need a means to verify that answers received are both authentic and complete, i.e., that the returned data have not been tampered with, and that no qualifying results have been omitted. We propose a result verification approach for one-dimensional queries, called Partially Materialized Digest scheme (PMD), that applies to both static and dynamic databases. PMD uses separate indexes …


QUC-Tree: Integrating Query Context Information For Efficient Music Retrieval, Jialie Shen, Dacheng Tao, Xuelong Li Jan 2009

Research Collection School Of Computing and Information Systems

In this paper, we introduce a novel indexing scheme, the query context tree (QUC-tree), to facilitate efficient query-sensitive music search under different query contexts. Distinguished from previous approaches, QUC-tree is a balanced multiway tree structure, where each level represents the data space at a different dimensionality. Before the tree structure is constructed, principal component analysis (PCA) is applied to analyze the data and transform the raw composite features into a new feature space sorted by the importance of acoustic features. The PCA-transformed data and reduced dimensions in the upper levels can alleviate the curse of dimensionality. To accurately mimic human perception, an …