Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons

Research Collection School Of Computing and Information Systems

2010

Articles 31 - 60 of 106

Full-Text Articles in Databases and Information Systems

Semi-Supervised Distance Metric Learning For Collaborative Image Retrieval And Clustering, Steven C. H. Hoi, Wei Liu, Shih-Fu Chang Aug 2010

Learning a good distance metric plays a vital role in many multimedia retrieval and data mining tasks. For example, a typical content-based image retrieval (CBIR) system often relies on an effective distance metric to measure similarity between any two images. Conventional CBIR systems that simply adopt the Euclidean distance metric often fail to return satisfactory results, mainly due to the well-known semantic gap challenge. In this article, we present a novel framework of Semi-Supervised Distance Metric Learning for learning effective distance metrics by exploring the historical relevance feedback log data of a CBIR system and utilizing unlabeled data when log data are …


Team Performance Prediction In Massively Multiplayer Online Role-Playing Games (MMORPGs), Kyong Jin Shim, Jaideep Srivastava Aug 2010

In this study, we propose a comprehensive performance management tool for measuring and reporting operational activities of teams. This study uses performance data of game players and teams in EverQuest II, a popular MMORPG developed by Sony Online Entertainment, to build performance prediction models for task-performing teams. The prediction models provide a projection of a task-performing team's future performance based on the past performance patterns of participating players on the team as well as team characteristics. While the existing game system lacks the ability to predict team-level performance, the prediction models proposed in this study are expected to be …


A Probabilistic Approach To Personalized Tag Recommendation, Meiqun Hu, Ee Peng Lim, Jing Jiang Aug 2010

In this work, we study the task of personalized tag recommendation in social tagging systems. To reach out to tags beyond the existing vocabularies of the query resource and of the query user, we examine recommendation methods that are based on personomy translation, and propose a probabilistic framework for incorporating translations by similar users (neighbors). We propose to use distributional divergence to measure the similarity between users in the context of personomy translation, and examine two variations of such similarity measures. We evaluate the proposed framework on a benchmark dataset collected from BibSonomy, and compare with personomy translation methods based …


Mining Interaction Behaviors For Email Reply Order Prediction, Byung-Won On, Ee Peng Lim, Jing Jiang, Amruta Purandare, Loo Nin Teow Aug 2010

In email networks, user behaviors affect the way emails are sent and replied to. While knowing these user behaviors can help to create more intelligent email services, there has not been much research into mining these behaviors. In this paper, we investigate user engagingness and responsiveness as two interaction behaviors that give us useful insights into how users email one another. Engaging users are those who can effectively solicit responses from other users. Responsive users are those who are willing to respond to other users. By modeling such behaviors, we are able to mine them and to identify engaging or responsive …


A Heuristic Algorithm For Trust-Oriented Service Provider Selection In Complex Social Networks, Guanfeng Liu, Yan Wang, Mehmet A. Orgun, Ee Peng Lim Jul 2010

In a service-oriented online social network consisting of service providers and consumers, a service consumer can search for trustworthy service providers via the social network. This requires the evaluation of the trustworthiness of a service provider along a certain social trust path from the service consumer to the service provider. However, there are usually many social trust paths between participants in social networks. Thus, a challenging problem is determining which social trust path is the optimal one that can yield the most trustworthy evaluation result. In this paper, we first present a novel complex social network structure and a new concept, Quality …


Evaluation Of Protein Backbone Alphabets: Using Predicted Local Structure For Fold Recognition, Kyong Jin Shim Jul 2010

Optimally combining available information is one of the key challenges in knowledge-driven prediction techniques. In this study, we evaluate six Phi and Psi-based backbone alphabets. We show that the addition of predicted backbone conformations to SVM classifiers can improve fold recognition. Our experimental results show that the inclusion of predicted backbone conformations in our feature representation leads to higher overall accuracy compared to when using amino acid residues alone.


Learning To Rank Only Using Training Data From Related Domain, Wei Gao, Peng Cai, Kam-Fai Wong, Aoying Zhou Jul 2010

Like traditional supervised and semi-supervised algorithms, learning to rank for information retrieval requires document annotations provided by domain experts. It is costly to annotate training data for different search domains and tasks. We propose to exploit training data annotated for a related domain to learn to rank retrieved documents in the target domain, in which no labeled data is available. We present a simple yet effective approach based on an instance-weighting scheme. Our method first estimates the importance of each related-domain document relative to the target domain. Then heuristics are studied to transform the importance of individual documents to the pairwise …


Semantics-Preserving Bag-Of-Words Models And Applications, Lei Wu, Steven C. H. Hoi, Nenghai Yu Jul 2010

The Bag-of-Words (BoW) model is a promising image representation technique for image categorization and annotation tasks. One critical limitation of existing BoW models is that much semantic information is lost during the codebook generation process, an important step of BoW. This is because the codebook is often built simply by clustering visual features in Euclidean space. However, visual features related to the same semantics may not be distributed in clusters in Euclidean space, which is primarily due to the semantic gap between low-level features and high-level semantics. In this paper, we propose a …


Mental Development And Representation Building Through Motivated Learning, Janusz Starzyk, Pawel Raif, Ah-Hwee Tan Jul 2010

Motivated learning is a new machine learning approach that extends the reinforcement learning idea to dynamically changing and highly structured environments. In this approach a machine is capable of defining its own objectives and learns to satisfy them through an internal reward system. The machine is forced to explore the environment in response to externally applied negative (pain) signals that it must minimize. In doing so, it discovers relationships between objects observed through its sensory inputs and actions it performs on the observed objects. Observed concepts are not predefined but emerge as a result of successful operations. For the optimum …


Self-Organizing Agents For Reinforcement Learning In Virtual Worlds, Yilin Kang, Ah-Hwee Tan Jul 2010

We present a self-organizing neural model for creating intelligent learning agents in virtual worlds. As agents in a virtual world roam, interact and socialize with users and other agents as in the real world without explicit goals and teachers, learning in a virtual world presents many challenges not found in typical machine learning benchmarks. In this paper, we highlight the unique issues and challenges of building learning agents in virtual worlds using reinforcement learning. Specifically, a self-organizing neural model, named TD-FALCON (Temporal Difference - Fusion Architecture for Learning and Cognition), is deployed, which enables an autonomous agent to adapt and function in …


Towards Probabilistic Memetic Algorithm: An Initial Study On Capacitated Arc Routing Problem, Liang Feng, Yew-Soon Ong, Quang Huy Nguyen, Ah-Hwee Tan Jul 2010

Capacitated arc routing problem (CARP) has attracted much attention due to its generality to many real world problems. Memetic algorithm (MA), among other metaheuristic search methods, has been shown to achieve competitive performances in solving CARP ranging from small to medium size. In this paper we propose a formal probabilistic memetic algorithm for CARP that is equipped with an adaptation mechanism to control the degree of global exploration against local exploitation while the search progresses. Experimental study on benchmark instances of CARP showed that the proposed probabilistic scheme led to improved search performances when introduced into a recently proposed state-of-the-art …


Faceted Topic Retrieval Of News Video Using Joint Topic Modeling Of Visual Features And Speech Transcripts, Kong-Wah Wan, Ah-Hwee Tan, Joo-Hwee Lim, Liang-Tien Chia Jul 2010

Because of the inherent ambiguity in user queries, an important task of modern retrieval systems is faceted topic retrieval (FTR), which relates to the goal of returning diverse or novel information elucidating the wide range of topics or facets of the query need. We introduce a generative model for hypothesizing facets in the (news) video domain by combining the complementary information in the visual keyframes and the speech transcripts. We evaluate the efficacy of our multimodal model on the standard TRECVID-2005 video corpus annotated with facets. We find that: (1) the joint modeling of the visual and text (speech transcripts) …


Extracting Common Emotions From Blogs Based On Fine-Grained Sentiment Clustering, Shi Feng, Daling Wang, Ge Yu, Wei Gao, Kam-Fai Wong Jul 2010

Recently, blogs have emerged as the major platform for people to express their feelings and sentiments in the age of Web 2.0. The common emotions, which reflect people’s collective and overall sentiments, are becoming a major concern for governments, business companies and individual users. Unlike previous literature on sentiment classification and summarization, the major issue of common emotion extraction is to find out people’s collective sentiments and their corresponding distributions on the Web. Most existing blog clustering methods take into account keywords, stories or timelines but neglect the embedded sentiments, which are considered very important features of blogs. In …


Show Me The Numbers: Visual Analytics For Insights, Tin Seong Kam Jul 2010

In this highly volatile and fast-paced financial market, traders and managers working in banking and financial organizations must struggle to cope with large and complex data from multiple sources that move throughout the market at increasingly high speed. The cost of making poor business and investment decisions is very high. This places great demands on data analysts, who are responsible for providing process information to support the activities of traders and managers. Static reports and traditional business intelligence tools simply cannot keep up with a market that is changing on a second-to-second basis. By the time the traders and bankers have …


Effective Music Tagging Through Advanced Statistical Modeling, Jialie Shen, Meng Wang, Shuicheng Yan, Hwee Hwa Pang, Xian-Sheng Hua Jul 2010

Music information retrieval (MIR) holds great promise as a technology for managing large music archives. One of the key components of MIR that has been actively researched is music tagging. While significant progress has been achieved, most of the existing systems still adopt a simple classification approach, and apply machine learning classifiers directly on low-level acoustic features. Consequently, they suffer from the shortcomings of (1) poor accuracy, (2) lack of comprehensive evaluation results and the associated analysis based on large-scale datasets, and (3) incomplete content representation, arising from the lack of multimodal and temporal information integration. In this …


Generating Templates Of Entity Summaries With An Entity-Aspect Model And Pattern Mining, Peng Li, Jing Jiang, Yinglin Wang Jul 2010

In this paper, we propose a novel approach to automatic generation of summary templates from given collections of summary articles. Such summary templates can be useful in various applications. We first develop an entity-aspect LDA model to simultaneously cluster both sentences and words into aspects. We then apply frequent subtree pattern mining on the dependency parse trees of the clustered and labeled sentences to discover sentence patterns that represent the aspects well. Key features of our method include automatic grouping of semantically related sentence patterns and automatic identification of template slots that need to be filled in. We …


Non-Parametric Kernel Ranking Approach For Social Image Retrieval, Jinfeng Zhuang, Steven C. H. Hoi Jul 2010

Social image retrieval has become an emerging research challenge in web rich media search. In this paper, we address the research problem of text-based social image retrieval, which aims to identify and return a set of relevant social images that are related to a text-based query from a corpus of social images. Regular approaches for social image retrieval simply adopt typical text-based image retrieval techniques to search for the relevant social images based on the associated tags, which may suffer from noisy tags. In this paper, we present a novel framework for social image re-ranking based on a non-parametric kernel …


A Self-Organizing Approach To Episodic Memory Modeling, Wenwen Wang, Budhitama Subagdja, Ah-Hwee Tan Jul 2010

This paper presents a neural model that learns episodic traces in response to a continual stream of sensory input and feedback received from the environment. The proposed model, based on a fusion Adaptive Resonance Theory (fusion ART) network, extracts key events and encodes spatiotemporal relations between events by creating cognitive nodes dynamically. The model further incorporates a novel memory search procedure, which performs parallel search of stored episodic traces continuously. Compared with prior systems, the proposed episodic memory model presents a robust approach to encoding key events and episodes and recalling them using partial and erroneous cues. We present experimental studies, …


Self-Organizing Neural Networks For Behavior Modeling In Games, Shu Feng, Ah-Hwee Tan Jul 2010

This paper proposes self-organizing neural networks for modeling behavior of non-player characters (NPC) in first person shooting games. Specifically, two classes of self-organizing neural models, namely Self-Generating Neural Networks (SGNN) and Fusion Architecture for Learning and Cognition (FALCON) are used to learn non-player characters' behavior rules according to recorded patterns. Behavior learning abilities of these two models are investigated by learning specific sample Bots in the Unreal Tournament game in a supervised manner. Our empirical experiments demonstrate that both SGNN and FALCON are able to recognize important behavior patterns and learn the necessary knowledge to operate in the Unreal environment. …


Visualizing And Exploring Evolving Information Networks In Wikipedia, Ee Peng Lim, Agus Trisnajaya Kwee, Nelman Lubis Ibrahim, Aixin Sun, Anwitaman Datta, Kuiyu Chang, Maureen Maureen Jun 2010

Information networks in Wikipedia evolve as users collaboratively edit articles that embed the networks. These information networks represent both the structure and content of the community’s knowledge, and the networks evolve as the knowledge gets updated. By observing the networks evolve and finding their evolving patterns, one can gain higher order knowledge about the networks and conduct longitudinal network analysis to detect events and summarize trends. In this paper, we present SSNetViz+, a visual analytic tool to support visualization and exploration of Wikipedia’s information networks. SSNetViz+ supports time-based network browsing, content browsing and search. Using a terrorism information network as an …


Weakly-Supervised Hashing In Kernel Space, Yadong Mu, Jialie Shen, Shuicheng Yan Jun 2010

The explosive growth of vision data motivates recent studies on efficient data indexing methods such as locality-sensitive hashing (LSH). Most existing approaches perform hashing in an unsupervised way. In this paper we move one step forward and propose a supervised hashing method, i.e., the LAbel-regularized Max-margin Partition (LAMP) algorithm. The proposed method generates hash functions in a weakly-supervised setting, where a small portion of sample pairs are manually labeled to be “similar” or “dissimilar”. We formulate the task as a Constrained Convex-Concave Procedure (CCCP), which can be relaxed into a series of convex sub-problems solvable with efficient Quadratic-Program (QP). …


Do Wikipedians Follow Domain Experts? A Domain-Specific Study On Wikipedia Contribution, Yi Zhang, Aixin Sun, Anwitaman Datta, Kuiyu Chang, Ee Peng Lim Jun 2010

Wikipedia is one of the most successful online knowledge bases, attracting millions of visits daily. Not surprisingly, its huge success has in turn led to immense research interest for a better understanding of the collaborative knowledge building process. In this paper, we performed a (terrorism) domain-specific case study, comparing and contrasting the knowledge evolution in Wikipedia with a knowledge base created by domain experts. Specifically, we used the Terrorism Knowledge Base (TKB) developed by experts at MIPT. We identified 409 Wikipedia articles matching TKB records, and went ahead to study them from three aspects: creation, revision, and link evolution. We …


Z-Sky: An Efficient Skyline Query Processing Framework Based On Z-Order, Ken C. K. Lee, Wang-Chien Lee, Baihua Zheng, Huajing Li, Yuan Tian Jun 2010

Given a set of data points in a multidimensional space, a skyline query retrieves those data points that are not dominated by any other point in the same dataset. Observing that the properties of Z-order space filling curves (or Z-order curves) perfectly match the dominance relationships among data points in a geometrical data space, in this paper we develop and present a novel and efficient processing framework to evaluate skyline queries and their variants, and to support skyline result updates based on Z-order curves. This framework consists of ZBtree, i.e., an index structure to organize a source dataset and …
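
As an illustration of the skyline definition in the abstract above, here is a minimal brute-force sketch. It is not the Z-Sky framework or its ZBtree index; it only shows the dominance test. The function names and the "smaller is better in every dimension" convention are assumptions made for this example.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a dominates b.

    Assumption for this sketch: smaller values are better. a dominates b
    when a is no worse than b in every dimension and strictly better in
    at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Naive O(n^2) skyline: keep points not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    data = [(1, 4), (2, 2), (4, 1), (3, 3), (5, 5)]
    print(skyline(data))
```

The quadratic scan is only practical for small inputs; index-based frameworks such as the one described in the abstract exist to avoid comparing every pair of points.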


Satrap: Data And Network Heterogeneity Aware P2P Data-Mining, Hock Kee Ang, Vivekanand Gopalkrishnan, Anwitaman Datta, Wee Keong Ng, Steven C. H. Hoi Jun 2010

Distributed classification aims to build an accurate classifier by learning from distributed data while reducing computation and communication cost. A P2P network where numerous users come together to share resources like data content, bandwidth, storage space and CPU resources is an excellent platform for distributed classification. However, two important aspects of the learning environment have often been overlooked by other works, viz., 1) location of the peers, which results in variable communication cost, and 2) heterogeneity of the peers' data, which can help reduce redundant communication. In this paper, we examine the properties of network and data heterogeneity and propose …


Semantic Context Modeling With Maximal Margin Conditional Random Fields For Automatic Image Annotation, Yu Xiang, Xiangdong Zhou, Zuotao Liu, Tat-Seng Chua, Chong-Wah Ngo Jun 2010

Context modeling for Vision Recognition and Automatic Image Annotation (AIA) has attracted increasing attention in recent years. Among the various kinds of contextual information and resources, semantic context has been exploited in AIA and brings promising results. However, previous works either cast the problem into structural classification or adopted multi-layer modeling, which suffer from problems of scalability or model efficiency. In this paper, we propose a novel discriminative Conditional Random Field (CRF) model for semantic context modeling in AIA, which is built over semantic concepts and treats an image as a whole observation without segmentation. Our model captures the interactions between semantic …


Prediction Of Protein Subcellular Localization: A Machine Learning Approach, Kyong Jin Shim Jun 2010

Subcellular localization is a key functional characteristic of proteins. Optimally combining available information is one of the key challenges in today's knowledge-based subcellular localization prediction approaches. This study explores machine learning approaches for the prediction of protein subcellular localization that use resources concerning Gene Ontology and secondary structures. Using the spectrum kernel for feature representation of amino acid sequences and secondary structures, we explore an SVM-based learning method that classifies six subcellular localization sites: endoplasmic reticulum, extracellular, Golgi, membrane, mitochondria, and nucleus.


Player Performance Prediction In Massively Multiplayer Online Role-Playing Games (MMORPGs), Kyong Jin Shim, Richa Sharan, Jaideep Srivastava Jun 2010

In this study, we propose a comprehensive performance management tool for measuring and reporting operational activities of game players. This study uses performance data of game players in EverQuest II, a popular MMORPG developed by Sony Online Entertainment, to build performance prediction models for game players. The prediction models provide a projection of a player’s future performance based on his past performance, which is expected to be a useful addition to existing player performance monitoring tools. First, we show that variations of PECOTA [2] and MARCEL [3], two of the most popular baseball home run prediction methods, can be used for game player performance …


Efficient Mutual Nearest Neighbor Query Processing For Moving Object Trajectories, Yunjun Gao, Baihua Zheng, Gencai Chen, Qing Li, Chun Chen, Gang Chen Jun 2010

Given a set D of trajectories, a query object q, and a query time extent Γ, a mutual (i.e., symmetric) nearest neighbor (MNN) query over trajectories finds from D the set of trajectories that are among the k1 nearest neighbors (NNs) of q within Γ and, meanwhile, have q as one of their k2 NNs. This type of query is useful in many applications such as decision making, data mining, and pattern recognition, as it considers both the proximity of the trajectories to q and the proximity of q to the trajectories. In this paper, we first formalize MNN search …
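
The mutual nearest neighbor condition in the abstract above can be sketched on a simplified setting. This example assumes static points with Euclidean distance in place of trajectories over a time extent Γ, and uses a brute-force scan; it is not the authors' algorithm, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def k_nearest(target: Point, candidates: List[Point], k: int) -> List[Point]:
    """Return the k candidates closest to target by Euclidean distance."""
    return sorted(candidates, key=lambda c: math.dist(target, c))[:k]


def mutual_nearest_neighbors(
    data: List[Point], q: Point, k1: int, k2: int
) -> List[Point]:
    """Points that are among q's k1 NNs and that have q among their k2 NNs.

    Simplifying assumption: each point's neighbors are searched among the
    other data points plus the query object q.
    """
    result = []
    for p in k_nearest(q, data, k1):
        others = [o for o in data if o is not p] + [q]
        if q in k_nearest(p, others, k2):
            result.append(p)
    return result


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 0.0), (1.2, 0.0), (5.0, 0.0)]
    q = (0.4, 0.0)
    print(mutual_nearest_neighbors(data, q, k1=2, k2=1))
    print(mutual_nearest_neighbors(data, q, k1=2, k2=2))
```

In the example, (1.0, 0.0) is one of the two nearest neighbors of q, but its own nearest neighbor is (1.2, 0.0), so it only qualifies once k2 is raised to 2. This asymmetry is what the mutual condition is meant to capture.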


OTL: A Framework Of Online Transfer Learning, Peilin Zhao, Steven C. H. Hoi Jun 2010

In this paper, we investigate a new machine learning framework called Online Transfer Learning (OTL) that aims to transfer knowledge from some source domain to an online learning task on a target domain. We do not assume the target data follows the same class or generative distribution as the source data, and our key motivation is to improve a supervised online learning task in a target domain by exploiting the knowledge that has been learned from a large amount of training data in source domains. OTL is in general challenging since data in both domains not only can be different in …


Using Hadoop And Cassandra For Taxi Data Analytics: A Feasibility Study, Alvin Jun Yong Koh, Xuan Khoa Nguyen, C. Jason Woodard Jun 2010

This paper reports on a preliminary study to assess the feasibility of using the Open Cirrus Cloud Computing Research testbed to provide offline and online analytical support for taxi fleet operations. In the study, we benchmarked the performance gains from distributing the offline analysis of GPS location traces over multiple virtual machines using the Apache Hadoop implementation of the MapReduce paradigm. We also explored the use of the Apache Cassandra distributed database system for online retrieval of vehicle trace data. While configuring the testbed infrastructure was straightforward, we encountered severe I/O bottlenecks in running the benchmarks due to the lack …