Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons

Articles 1 - 17 of 17

Full-Text Articles in Databases and Information Systems

On Profiling Bots In Social Media, Richard J. Oentaryo, Arinto Murdopo, Philips K. Prasetyo, Ee Peng Lim Nov 2016

Research Collection School Of Computing and Information Systems

The popularity of social media platforms such as Twitter has led to the proliferation of automated bots, creating both opportunities and challenges in information dissemination, user engagements, and quality of services. Past works on profiling bots had been focused largely on malicious bots, with the assumption that these bots should be removed. In this work, however, we find many bots that are benign, and propose a new, broader categorization of bots based on their behaviors. This includes broadcast, consumption, and spam bots. To facilitate comprehensive analyses of bots and how they compare to human accounts, we develop a systematic profiling …


Aspect-Based Helpfulness Prediction For Online Product Reviews, Yinfei Yang, Cen Chen, Forrest Sheng Bao Nov 2016

Research Collection School Of Computing and Information Systems

Product reviews greatly influence purchase decisions in online shopping. A common burden of online shopping is that consumers have to search for the right answers through massive reviews, especially on popular products. Hence, estimating and predicting the helpfulness of reviews become important tasks to directly improve shopping experience. In this paper, we propose a new approach to helpfulness prediction by leveraging aspect analysis of reviews. Our hypothesis is that a helpful review will cover many aspects of a product at different emphasis levels. The first step to tackle this problem is to extract proper aspects. Because related products share common …


Plackett-Luce Regression Mixture Model For Heterogeneous Rankings, Maksim Tkachenko, Hady W. Lauw Oct 2016

Research Collection School Of Computing and Information Systems

Learning to rank is an important problem in many scenarios, such as information retrieval, natural language processing, recommender systems, etc. The objective is to learn a function that ranks a number of instances based on their features. In the vast majority of the learning to rank literature, there is an implicit assumption that the population of ranking instances are homogeneous, and thus can be modeled by a single central ranking function. In this work, we are concerned with learning to rank for a heterogeneous population, which may consist of a number of sub-populations, each of which may rank objects differently. …


Unsupervised Multi-Graph Cross-Modal Hashing For Large-Scale Multimedia Retrieval, Liang Xie, Lei Zhu, Guoqi Chen Aug 2016

Research Collection School Of Computing and Information Systems

With the advance of internet and multimedia technologies, large-scale multi-modal representation techniques, such as cross-modal hashing, are increasingly demanded for multimedia retrieval. In cross-modal hashing, three essential problems should be seriously considered. The first is that an effective cross-modal relationship should be learned from training data with scarce label information. The second is that appropriate weights should be assigned to different modalities to reflect their importance. The last is the scalability of the training process, which is usually ignored by previous methods. In this paper, we propose Multi-graph Cross-modal Hashing (MGCMH) by comprehensively considering these three points. MGCMH is an unsupervised method which …


Probabilistic Robust Route Recovery With Spatio-Temporal Dynamics, Hao Wu, Jiangyun Mao, Weiwei Sun, Baihua Zheng, Hanyuan Zhang, Ziyang Chen, Wei Wang Aug 2016

Research Collection School Of Computing and Information Systems

Vehicle trajectories are one of the most important data in location-based services. The quality of trajectories directly affects the services. However, in the real applications, trajectory data are not always sampled densely. In this paper, we study the problem of recovering the entire route between two distant consecutive locations in a trajectory. Most existing works solve the problem without using those informative historical data or solve it in an empirical way. We claim that a data-driven and probabilistic approach is actually more suitable as long as data sparsity can be well handled. We propose a novel route recovery system in …


Where Is The Goldmine? Finding Promising Business Locations Through Facebook Data Analytics, Jovian Lin, Richard Oentaryo, Ee-Peng Lim, Casey Vu, Adrian Vu, Agus Kwee Jul 2016

Research Collection School Of Computing and Information Systems

If you were to open your own cafe, would you not want to effortlessly identify the most suitable location to set up your shop? Choosing an optimal physical location is a critical decision for numerous businesses, as many factors contribute to the final choice of the location. In this paper, we seek to address the issue by investigating the use of publicly available Facebook Pages data (which include user "check-ins", types of business, and business locations) to evaluate a user-selected physical location with respect to a type of business. Using a dataset of 20,877 food businesses in Singapore, we conduct analysis of …


Word Clouds With Latent Variable Analysis For Visual Comparison Of Documents, Tuan M. V. Le, Hady W. Lauw Jul 2016

Research Collection School Of Computing and Information Systems

Word cloud is a visualization form for text that is recognized for its aesthetic, social, and analytical values. Here, we are concerned with deepening its analytical value for visual comparison of documents. To aid comparative analysis of two or more documents, users need to be able to perceive similarities and differences among documents through their word clouds. However, as we are dealing with text, approaches that treat words independently may impede accurate discernment of similarities among word clouds containing different words of related meanings. We therefore motivate the principle of displaying related words in a coherent manner, and propose to …


An Experimental Investigation Of Product Competition And Marketing In Social Networks, Cen Chen, Zhiling Guo, Shih-Fen Cheng, Hoong Chuin Lau Jun 2016

Research Collection School Of Computing and Information Systems

We conduct a computational experiment using Facebook data to evaluate competing firms’ initial market seeding and subsequent targeted marketing strategies that influence consumers’ new product adoption decisions. We find that firms generally overspend their advertising budget in the market seeding phase. In the subsequent market advertising phase, a coupon strategy (equivalent to price discount) generally yields higher market share than the strategy of distributing free product samples. The effect is more significant when both price and product quality are low. We offer managerial insights into firms’ effective competition strategies for new product introduction in the presence of consumers’ word of mouth …


Context-Aware Advertisement Recommendation For High-Speed Social News Feeding, Yuchen Li, Dongxiang Zhang, Ziquan Lan, Kian-Lee Tan May 2016

Research Collection School Of Computing and Information Systems

Social media advertising is a multi-billion dollar market and has become the major revenue source for Facebook and Twitter. To deliver ads to potentially interested users, these social network platforms learn a prediction model for each user based on their personal interests. However, as user interests often evolve slowly, the user may end up receiving repetitive ads. In this paper, we propose a context-aware advertising framework that takes into account the relatively static personal interests as well as the dynamic news feed from friends to drive growth in the ad click-through rate. To meet the real-time requirement, we first propose …


Temporal Kernel Descriptors For Learning With Time-Sensitive Patterns, Doyen Sahoo, Abhishek Sharma, Steven C. H. Hoi, Peilin Zhao May 2016

Research Collection School Of Computing and Information Systems

Detecting temporal patterns is one of the most prevalent challenges while mining data. Often, timestamps or information about when certain instances or events occurred can provide us with critical information to recognize temporal patterns. Unfortunately, most existing techniques are not able to fully extract useful temporal information based on the time (especially at different resolutions of time). They miss out on 3 crucial factors: (i) they do not distinguish between timestamp features (which have cyclical or periodic properties) and ordinary features; (ii) they are not able to detect patterns exhibited at different resolutions of time (e.g. different patterns at the …


Semantic Proximity Search On Graphs With Metagraph-Based Learning, Yuan Fang, Wenqing Lin, Vincent W. Zheng, Min Wu, Kevin Chen-Chuan Chang, Xiao-Li Li May 2016

Research Collection School Of Computing and Information Systems

Given ubiquitous graph data such as the Web and social networks, proximity search on graphs has been an active research topic. The task boils down to measuring the proximity between two nodes on a graph. Although most earlier studies deal with homogeneous or bipartite graphs only, many real-world graphs are heterogeneous with objects of various types, giving rise to different semantic classes of proximity. For instance, on a social network two users can be close for different reasons, such as being classmates or family members, which represent two distinct classes of proximity. Thus, it becomes inadequate to only measure a …


Euclidean Co-Embedding Of Ordinal Data For Multi-Type Visualization, Dung D. Le, Hady W. Lauw May 2016

Research Collection School Of Computing and Information Systems

Embedding deals with reducing the high-dimensional representation of data into a low-dimensional representation. Previous work mostly focuses on preserving similarities among objects. Here, not only do we explicitly recognize multiple types of objects, but we also focus on the ordinal relationships across types. Collaborative Ordinal Embedding or COE is based on generative modelling of ordinal triples. Experiments show that COE outperforms the baselines on objective metrics, revealing its capacity for information preservation for ordinal data.


Joint Search By Social And Spatial Proximity [Extended Abstract], Kyriakos Mouratidis, Jing Li, Yu Tang, Nikos Mamoulis May 2016

Research Collection School Of Computing and Information Systems

The diffusion of social networks introduces new challenges and opportunities for advanced services, especially so with their ongoing addition of location-based features. We show how applications like company and friend recommendation could significantly benefit from incorporating social and spatial proximity, and study a query type that captures these twofold semantics. We develop highly scalable algorithms for its processing, and use real social network data to empirically verify their efficiency and efficacy.


#Greysanatomy Vs. #Yankees: Demographics And Hashtag Use On Twitter, Jisun An, Ingmar Weber May 2016

Research Collection School Of Computing and Information Systems

Demographics, in particular, gender, age, and race, are a key predictor of human behavior. Despite the significant effect that demographics plays, most scientific studies using online social media do not consider this factor, mainly due to the lack of such information. In this work, we use state-of-the-art face analysis software to infer gender, age, and race from profile images of 350K Twitter users from New York. For the period from November 1, 2014 to October 31, 2015, we study which hashtags are used by different demographic groups. Though we find considerable overlap for the most popular hashtags, there are also …


Semantic Visualization With Neighborhood Graph Regularization, Tuan Minh Van Le, Hady W. Lauw Apr 2016

Research Collection School Of Computing and Information Systems

Visualization of high-dimensional data, such as text documents, is useful to map out the similarities among various data points. In the high-dimensional space, documents are commonly represented as bags of words, with dimensionality equal to the vocabulary size. Classical approaches to document visualization directly reduce this into visualizable two or three dimensions. Recent approaches consider an intermediate representation in topic space, between word space and visualization space, which preserves the semantics by topic modeling. While aiming for a good fit between the model parameters and the observed data, previous approaches have not considered the local consistency among data instances. We …


Top-K Dominating Queries On Incomplete Data, Xiaoye Miao, Yunjun Gao, Baihua Zheng, Gang Chen, Huiyong Cui Jan 2016

Research Collection School Of Computing and Information Systems

The top-k dominating (TKD) query returns the k objects that dominate the maximum number of objects in a given dataset. It combines the advantages of skyline and top-k queries, and plays an important role in many decision support applications. Incomplete data exists in a wide spectrum of real datasets, due to device failure, privacy preservation, data loss, and so on. In this paper, for the first time, we carry out a systematic study of TKD queries on incomplete data, which involves the data having some missing dimensional value(s). We formalize this problem, and propose a suite of efficient algorithms for …
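The TKD definition in the abstract above can be illustrated with a brute-force sketch. This is not the authors' algorithm (the abstract is truncated before describing it). It assumes, for illustration only, that smaller values are better, that missing values are encoded as `None`, and that one object dominates another when it is no worse on every dimension observed in both and strictly better on at least one of them. The names `dominates` and `top_k_dominating` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Row = Sequence[Optional[float]]  # None marks a missing dimensional value (assumed encoding)


def dominates(a: Row, b: Row) -> bool:
    """Return True if a dominates b, comparing only dimensions observed in both.

    Assumption: smaller values are better.
    """
    strictly_better = False
    for x, y in zip(a, b):
        if x is None or y is None:
            continue  # skip dimensions missing in either object
        if x > y:
            return False  # a is worse on a shared dimension
        if x < y:
            strictly_better = True
    return strictly_better


def top_k_dominating(data: List[Row], k: int) -> List[Tuple[int, int]]:
    """Return (index, score) for the k objects that dominate the most other objects.

    Brute force, O(n^2 * d); ties are broken by the smaller index.
    """
    scores = [
        sum(dominates(a, b) for j, b in enumerate(data) if j != i)
        for i, a in enumerate(data)
    ]
    ranked = sorted(range(len(data)), key=lambda i: (-scores[i], i))
    return [(i, scores[i]) for i in ranked[:k]]


if __name__ == "__main__":
    sample = [(1, None, 2), (2, 3, None), (None, 4, 3), (3, 5, 4)]
    print(top_k_dominating(sample, 2))
```

A quadratic scan like this is only practical for small datasets, which is presumably why the paper proposes a suite of efficient algorithms instead.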


A Study On Singapore Haze, Bingtian Dai, Kasthuri Jayarajah, Ee-Peng Lim, Archan Misra, Shriguru Nayak Jan 2016

Research Collection School Of Computing and Information Systems

In 2015, Singaporeans experienced one of the worst air pollution crises in history. With datasets from a well-known photo sharing social network, we analyze how this haze affects Singaporeans' daily life. We will share our preliminary results in this paper.