Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons

Articles 1 - 15 of 15

Full-Text Articles in Databases and Information Systems

Active Crowdsourcing For Annotation, Shuji Hao, Chunyan Miao, Steven C. H. Hoi, Peilin Zhao Dec 2015

Research Collection School Of Computing and Information Systems

Crowdsourcing has shown great potential for obtaining large-scale, cheap labels for different tasks. However, obtaining reliable labels is challenging for several reasons, such as noisy annotators and limited budgets. State-of-the-art approaches either suffer in some noisy scenarios or rely on unlimited resources to acquire reliable labels. In this article, we adopt the learning with expert advice framework (an expert corresponds to a worker in crowdsourcing) to robustly infer accurate labels by considering the reliability of each worker. However, in order to accurately predict the reliability of each worker, traditional learning with expert advice will consult with external oracles (i.e., domain experts) …
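The abstract is cut off before it describes the authors' method, so the following is only a minimal sketch of the traditional learning-with-expert-advice setting it mentions, using the classic weighted majority update. The names `weighted_vote`, `run_rounds`, and `beta`, the binary labels, and the toy data are all assumptions made for illustration; this is not the algorithm proposed in the article.

```python
from typing import Dict, List, Tuple

Votes = Dict[str, int]  # worker id -> label in {+1, -1}


def weighted_vote(votes: Votes, weights: Dict[str, float]) -> int:
    """Aggregate worker labels, weighting each worker by current reliability."""
    score = sum(weights[w] * label for w, label in votes.items())
    return 1 if score >= 0 else -1


def run_rounds(
    rounds: List[Tuple[Votes, int]], beta: float = 0.5
) -> Tuple[List[int], Dict[str, float]]:
    """Weighted majority: predict, then consult the oracle and shrink the
    weight of every worker who disagreed with the oracle label.

    `beta` (assumed, 0 < beta < 1) is the multiplicative penalty.
    """
    weights: Dict[str, float] = {}
    predictions: List[int] = []
    for votes, oracle_label in rounds:
        for w in votes:
            weights.setdefault(w, 1.0)  # new workers start fully trusted
        predictions.append(weighted_vote(votes, weights))
        for w, label in votes.items():
            if label != oracle_label:
                weights[w] *= beta
    return predictions, weights


# Toy data (hypothetical): worker "a" is reliable, "b" and "c" are noisy.
rounds = [
    ({"a": 1, "b": -1, "c": -1}, 1),
    ({"a": -1, "b": 1, "c": 1}, -1),
    ({"a": 1, "b": -1, "c": -1}, 1),
]
predictions, weights = run_rounds(rounds)
print(predictions, weights)
```

In this toy run the noisy majority wins the first two rounds, but by the third round the reliable worker's weight dominates. Every round here consults the oracle, which is exactly the cost the abstract raises as a concern.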


Using Digital Genomics To Create An Intelligent Enterprise, Mario Domingo Nov 2015

Asian Management Insights

Every business knows that it needs to leverage customer data, but few know the potential it has to transform business processes, decisions and performance.


Choosing Your Weapons: On Sentiment Analysis Tools For Software Engineering Research, Robbert Jongeling, Subhajit Datta, Alexander Serebrenik Oct 2015

Research Collection School Of Computing and Information Systems

Recent years have seen increasing attention to the social aspects of software engineering, including studies of emotions and sentiments experienced and expressed by software developers. Most of these studies reuse existing sentiment analysis tools such as SentiStrength and NLTK. However, these tools have been trained on product reviews and movie reviews and, therefore, their results might not be applicable in the software engineering domain. In this paper we study whether the sentiment analysis tools agree with the sentiment recognized by human evaluators (as reported in an earlier study) as well as with each other. Furthermore, we evaluate the impact …


Scheduled Approximation For Personalized Pagerank With Utility-Based Hub Selection, Fanwei Zhu, Yuan Fang, Kevin Chen-Chuan Chang, Jing Ying Oct 2015

Research Collection School Of Computing and Information Systems

As Personalized PageRank has been widely leveraged for ranking on a graph, the efficient computation of Personalized PageRank Vector (PPV) becomes a prominent issue. In this paper, we propose FastPPV, an approximate PPV computation algorithm that is incremental and accuracy-aware. Our approach hinges on a novel paradigm of scheduled approximation: the computation is partitioned and scheduled for processing in an “organized” way, such that we can gradually improve our PPV estimation in an incremental manner and quantify the accuracy of our approximation at query time. Guided by this principle, we develop an efficient hub-based realization, where we adopt the metric …
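For readers unfamiliar with the quantity being approximated, here is a minimal sketch of a Personalized PageRank Vector computed by plain power iteration. This is the standard baseline, not FastPPV, whose details are cut off in the abstract. The function name `personalized_pagerank`, the teleport probability `alpha = 0.15`, the iteration count, and the toy graph are assumptions chosen for illustration.

```python
from typing import Dict, List


def personalized_pagerank(
    graph: Dict[str, List[str]],
    source: str,
    alpha: float = 0.15,
    iterations: int = 100,
) -> Dict[str, float]:
    """Power iteration for the PPV of `source`.

    At each step a random surfer teleports back to `source` with
    probability `alpha`, otherwise follows a uniformly chosen out-edge.
    Nodes with no out-edges return all their mass to `source`.
    """
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    rank = {n: 0.0 for n in nodes}
    rank[source] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        nxt[source] += alpha  # teleport mass (rank always sums to 1)
        for u in nodes:
            out = graph.get(u, [])
            if out:
                share = (1.0 - alpha) * rank[u] / len(out)
                for v in out:
                    nxt[v] += share
            else:
                nxt[source] += (1.0 - alpha) * rank[u]
        rank = nxt
    return rank


# Toy graph (hypothetical): a -> b, a -> c, b -> c, c -> a
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ppv = personalized_pagerank(toy_graph, "a")
print(sorted(ppv.items(), key=lambda kv: -kv[1]))
```

Every query node needs its own full iteration in this baseline, which is the cost that motivates approximate, incremental schemes such as the one the abstract proposes.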


The Importance Of Being Isolated: An Empirical Study On Chromium Reviews, Subhajit Datta, Devarshi Bhatt, Manish Jain, Proshanta Sarkar, Santonu Sarkar Oct 2015

Research Collection School Of Computing and Information Systems

As large-scale software development has become more collaborative, and software teams more globally distributed, several studies have explored how developer interaction influences software development outcomes. The emphasis so far has been largely on outcomes such as defect count and the time to close modification requests. In this paper, we examine data from the Chromium project to understand how different aspects of developer discussion relate to the closure time of reviews. On the basis of analyzing reviews discussed by 2000+ developers, our results indicate that quicker closure of reviews owned by a developer relates to higher reception of information and insights …


Structured Learning From Heterogeneous Behavior For Social Identity Linkage, Siyuan Liu, Shuhui Wang, Feida Zhu Jul 2015

Research Collection School Of Computing and Information Systems

Social identity linkage across different social media platforms is of critical importance to business intelligence, since it yields a deeper understanding and more accurate profiling of users from social data. In this paper, we propose a solution framework, HYDRA, which consists of three key steps: (I) we model heterogeneous behavior by long-term topical distribution analysis and multi-resolution temporal behavior matching, which are robust to high noise and missing information, and describe the behavior similarity of each user pair by a multi-dimensional similarity vector; (II) we build structure consistency models to maximize the structure and behavior consistency on users' core social structure across different platforms, …


Should We Use The Sample? Analyzing Datasets Sampled From Twitter's Stream API, Yazhe Wang, Jamie Callan, Baihua Zheng Jun 2015

Research Collection School Of Computing and Information Systems

Researchers have begun studying content obtained from microblogging services such as Twitter to address a variety of technological, social, and commercial research questions. The large number of Twitter users and even larger volume of tweets often make it impractical to collect and maintain a complete record of activity; therefore, most research and some commercial software applications rely on samples, often relatively small samples, of Twitter data. For the most part, sample sizes have been based on availability and practical considerations. Relatively little attention has been paid to how well these samples represent the underlying stream of Twitter data. To fill …


Efficient Reverse Top-K Boolean Spatial Keyword Queries On Road Networks, Yunjun Gao, Xu Qin, Baihua Zheng, Gang Chen May 2015

Research Collection School Of Computing and Information Systems

Reverse k nearest neighbor (RkNN) queries have a broad application base, including decision support, profile-based marketing, and resource allocation. Previous work on RkNN search either does not take textual information into consideration or is limited to Euclidean space. In the real world, however, most spatial objects are associated with textual information and lie on road networks. In this paper, we introduce a new type of query, namely, reverse top-k Boolean spatial keyword (RkBSK) retrieval, which assumes objects are on the road network and considers both spatial and textual information. Given a data set P on a road network and a …


Moving Average Reversion Strategy For On-Line Portfolio Selection, Bin Li, Steven C. H. Hoi, Doyen Sahoo, Zhi-Yong Liu May 2015

Research Collection School Of Computing and Information Systems

On-line portfolio selection, a fundamental problem in computational finance, has attracted increasing interest from the artificial intelligence and machine learning communities in recent years. Empirical evidence shows that a stock's high and low prices are temporary and that stock prices are likely to follow the mean reversion phenomenon. While existing mean reversion strategies are shown to achieve good empirical performance on many real datasets, they often make the single-period mean reversion assumption, which is not always satisfied, leading to poor performance in certain real datasets. To overcome this limitation, this article proposes a multiple-period mean reversion, or so-called "Moving Average Reversion" (MAR), and …


Best Upgrade Plans For Single And Multiple Source-Destination Pairs, Yimin Lin, Kyriakos Mouratidis Apr 2015

Research Collection School Of Computing and Information Systems

In this paper, we study Resource Constrained Best Upgrade Plan (BUP) computation in road network databases. Consider a transportation network (weighted graph) G where a subset of the edges are upgradable, i.e., for each such edge there is a cost which, if spent, reduces the weight of the edge to a specific new value. In the single-pair version of BUP, the input includes a source and a destination in G, and a budget B (resource constraint). The goal is to identify which upgradable edges should be upgraded so that the shortest path distance between source and …


Review Selection Using Micro-Reviews, Thanh-Son Nguyen, Hady W. Lauw, Panayiotis Tsaparas Apr 2015

Research Collection School Of Computing and Information Systems

Given the proliferation of review content, and the fact that reviews are highly diverse and often unnecessarily verbose, users frequently face the problem of selecting the appropriate reviews to consume. Micro-reviews are emerging as a new type of online review content in social media. Micro-reviews are posted by users of check-in services such as Foursquare. They are concise (up to 200 characters long) and highly focused, in contrast to comprehensive and verbose full reviews. In this paper, we propose a novel mining problem, which brings together these two disparate sources of review content. Specifically, we use coverage of micro-reviews …


Joint Search By Social And Spatial Proximity, Kyriakos Mouratidis, Jing Li, Yu Tang, Nikos Mamoulis Mar 2015

Research Collection School Of Computing and Information Systems

The diffusion of social networks introduces new challenges and opportunities for advanced services, especially so with their ongoing addition of location-based features. We show how applications like company and friend recommendation could significantly benefit from incorporating social and spatial proximity, and study a query type that captures these two-fold semantics. We develop highly scalable algorithms for its processing, and enhance them with elaborate optimizations. Finally, we use real social network data to empirically verify the efficiency and efficacy of our solutions.


Beyond Support And Confidence: Exploring Interestingness Measures For Rule-Based Specification Mining, Bui Tien Duy Le, David Lo Mar 2015

Research Collection School Of Computing and Information Systems

Numerous rule-based specification mining approaches have been proposed in the literature. Many of these approaches analyze a set of execution traces to discover interesting usage rules, e.g., whenever lock() is invoked, eventually unlock() is invoked. These techniques often generate and enumerate a set of candidate rules and compute an interestingness score for each. Rules whose interestingness scores are above a certain threshold are then output. In past studies, two well-known measures, support and confidence, are often used to compute these scores. However, aside from these two, many other interestingness measures have been proposed. It is thus unclear …
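To make the two baseline measures concrete, the sketch below scores the rule "whenever lock is invoked, eventually unlock is invoked" over a few toy traces. The abstract does not give formal definitions, so this assumes a common trace-level reading: support is the number of traces that contain the premise and satisfy the rule, and confidence is that count divided by the number of traces containing the premise. The function names and traces are hypothetical.

```python
from typing import List, Tuple

Trace = List[str]


def rule_holds(trace: Trace, pre: str, post: str) -> bool:
    """True if every occurrence of `pre` is followed later by some `post`."""
    for i, event in enumerate(trace):
        if event == pre and post not in trace[i + 1:]:
            return False
    return True


def support_confidence(
    traces: List[Trace], pre: str, post: str
) -> Tuple[int, float]:
    """Trace-level support and confidence of the rule pre -> eventually post."""
    relevant = [t for t in traces if pre in t]  # traces where the premise fires
    support = sum(1 for t in relevant if rule_holds(t, pre, post))
    confidence = support / len(relevant) if relevant else 0.0
    return support, confidence


# Toy execution traces (hypothetical)
traces = [
    ["lock", "use", "unlock"],
    ["lock", "use"],                     # lock is never released
    ["open", "close"],                   # premise does not occur
    ["lock", "unlock", "lock", "unlock"],
]
support, confidence = support_confidence(traces, "lock", "unlock")
print(support, confidence)
```

A miner would keep the rule only if both values clear chosen thresholds. Other interestingness measures would replace the final ratio with a different function of the same counts.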


On Efficient K-Optimal-Location-Selection Query Processing In Metric Spaces, Yunjun Gao, Shuyao Qi, Lu Chen, Baihua Zheng, Xinhan Li Mar 2015

Research Collection School Of Computing and Information Systems

This paper studies the problem of k-optimal-location-selection (kOLS) retrieval in metric spaces. Given a set D_A of customers, a set D_B of locations, a constrained region R, and a critical distance d_c, a metric kOLS (MkOLS) query retrieves k locations in D_B that are outside R but have the maximal optimality scores. Here, the optimality score of a location l ∈ D_B located outside R is defined as the number of the customers in D_A that are inside R and meanwhile have their distances to l bounded by …


Bridging The Vocabulary Gap Between Health Seekers And Healthcare Knowledge, Liqiang Nie, Yiliang Zhao, Akbari Mohammad, Jialie Shen, Tat-Seng Chua Feb 2015

Research Collection School Of Computing and Information Systems

The vocabulary gap between health seekers and providers has hindered cross-system operability and inter-user reusability. To bridge this gap, this paper presents a novel scheme to code medical records by jointly utilizing local mining and global learning approaches, which are tightly linked and mutually reinforced. Local mining attempts to code an individual medical record by independently extracting the medical concepts from the record itself and then mapping them to authenticated terminologies. A corpus-aware terminology vocabulary is naturally constructed as a byproduct, which is used as the terminology space for global learning. The local mining approach, however, may …