Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons

Articles 1 - 28 of 28

Full-Text Articles in Databases and Information Systems

Extracting Interest Tags From Twitter User Biographies, Ying Ding, Jing Jiang Dec 2014

Research Collection School Of Computing and Information Systems

Twitter, one of the most popular social media platforms, has been studied from different angles. One of the important sources of information in Twitter is users’ biographies, which are short self-introductions written by users in free form. Biographies often describe users’ background and interests. However, to the best of our knowledge, there has not been much work trying to extract information from Twitter biographies. In this work, we study how to extract information revealing users’ personal interests from Twitter biographies. A sequential labeling model is trained with automatically constructed labeled data. The popular patterns expressing user interests are extracted and …


Historical Traffic-Tolerant Paths In Road Networks, Pui Hang Li, Man Lung Yiu, Kyriakos Mouratidis Nov 2014

Research Collection School Of Computing and Information Systems

Historical traffic information is valuable for transportation analysis and planning, as well as for route search services. In view of these applications, we propose the k traffic-tolerant paths problem (TTP) on road networks, which takes a source-destination pair and historical traffic information as input, and returns k paths that minimize the aggregate (historical) travel time. Unlike the shortest path problem, the TTP problem has a combinatorial search space that renders the optimal solution expensive to compute. We propose an exact algorithm and a heuristic algorithm for this problem. Experiments on real traffic data demonstrate the effectiveness of TTP paths and …


The Evolution Of Research On Multimedia Travel Guide Search And Recommender Systems, Junge Shen, Zhiyong Cheng, Jialie Shen, Tao Mei, Xinbo Gao Nov 2014

Research Collection School Of Computing and Information Systems

The importance of multimedia travel guide search and recommender systems has led to a substantial amount of research spanning different computer science and information system disciplines in recent years. The five core research streams we identify here incorporate a few multimedia computing and information retrieval problems that relate to the alternative perspectives of algorithm design for optimizing search/recommendation quality and different methodological paradigms to assess system performance at large scale. They include (1) query analysis, (2) diversification based on different criteria, (3) ranking and reranking, (4) personalization and (5) evaluation. Based on a comprehensive discussion and analysis of these streams, …


On Joint Modeling Of Topical Communities And Personal Interest In Microblogs, Tuan-Anh Hoang, Ee Peng Lim Nov 2014

Research Collection School Of Computing and Information Systems

In this paper, we propose the Topical Communities and Personal Interest (TCPI) model for simultaneously modeling topics, topical communities, and users’ topical interests in microblogging data. TCPI considers different topical communities while differentiating users’ personal topical interests from those of topical communities, and learning the dependence of each user on the affiliated communities to generate content. This makes TCPI different from existing models that either do not consider the existence of multiple topical communities, or do not differentiate between personal and community’s topical interests. Our experiments on two Twitter datasets show that TCPI can effectively mine the representative topics for …


Entity Linking On Microblogs With Spatial And Temporal Signals, Yuan Fang, Ming-Wei Chang Oct 2014

Research Collection School Of Computing and Information Systems

Microblogs present an excellent opportunity for monitoring and analyzing world happenings. Given that words are often ambiguous, entity linking becomes a crucial step towards understanding microblogs. In this paper, we re-examine the problem of entity linking on microblogs. We first observe that spatiotemporal (i.e., spatial and temporal) signals play a key role, but they are not utilized in existing approaches. Thus, we propose a novel entity linking framework that incorporates spatiotemporal signals through a weakly supervised process. Using entity annotations on real-world data, our experiments show that the spatiotemporal model improves F1 by more than 10 points over existing systems. …


Time-Series Data Mining In Transportation: A Case Study On Singapore Public Train Commuter Travel Patterns, Roy Ka Wei Lee, Tin Seong Kam Oct 2014

Research Collection School Of Computing and Information Systems

The adoption of smart card technologies and automated data collection systems (ADCS) in the transportation domain has provided public transport planners with opportunities to amass a huge and continuously increasing amount of time-series data about the behaviors and travel patterns of commuters. However, the explosive growth of temporal-related databases has far outpaced the transport planners’ ability to interpret these data using conventional statistical techniques, creating an urgent need for new techniques to support the analyst in transforming the data into actionable information and knowledge. This research study thus explores and discusses the potential use of time-series data mining, a relatively new …


Press: A Novel Framework Of Trajectory Compression In Road Networks, Renchu Song, Weiwei Sun, Baihua Zheng, Yu Zheng Sep 2014

Research Collection School Of Computing and Information Systems

Location data is becoming increasingly important. In this paper, we focus on trajectory data and propose a new framework, namely PRESS (Paralleled Road-Network-Based Trajectory Compression), to effectively compress trajectory data under road network constraints. Different from existing work, PRESS proposes a novel representation for trajectories to separate the spatial representation of a trajectory from the temporal representation, and proposes a Hybrid Spatial Compression (HSC) algorithm and an error Bounded Temporal Compression (BTC) algorithm to compress the spatial and temporal information of trajectories respectively. PRESS also supports common spatial-temporal queries without fully decompressing the data. Through an extensive experimental study …


The Use Of Geospatial Clustering In Analysing Health Risk Profile, Sue-Mae Yeo, Tin Seong Kam, Kai Xin Thia, Dan Wu Sep 2014

Research Collection School Of Computing and Information Systems

Background & Hypothesis: The first law of geography states that “everything is related to everything else, but near things are more related than distant things”. This study aims to demonstrate how local indicator of spatial association (LISA) statistics are used to group patients with similar chronic diseases into natural clusters of hotspots found within northern Singapore by incorporating the proximity of their home locations explicitly. Methods: Anonymised chronic patient data collected from Khoo Teck Puat Hospital in 2013 were used for analyses. The data was mapped based on patients' residential addresses. A layer of hexagonal grid objects, each with a …
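The abstract above names LISA statistics, but the listing truncates it before any computation is described. As a rough illustration only, the sketch below computes local Moran's I, one common LISA statistic, with row-standardized weights. The choice of statistic, the weighting scheme, and the names `local_morans_i`, `values`, and `weights` are assumptions made for this example and are not taken from the study.

```python
def local_morans_i(values, weights):
    """Local Moran's I for each location (illustrative sketch).

    values  : list of numbers, one per location (hypothetical input,
              e.g. a disease rate per grid cell).
    weights : square list-of-lists; weights[i][j] > 0 means j is a
              neighbour of i. Rows are standardized to sum to 1 here,
              which is an assumption, not the study's stated choice.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]        # deviations from the mean
    m2 = sum(d * d for d in z) / n        # second moment (variance)
    if m2 == 0:
        raise ValueError("all values are identical; the statistic is undefined")

    result = []
    for i in range(n):
        row_sum = sum(weights[i])
        if row_sum == 0:
            lag = 0.0                     # location with no neighbours
        else:
            # spatial lag: weighted average of the neighbours' deviations
            lag = sum(w * zj for w, zj in zip(weights[i], z)) / row_sum
        result.append(z[i] * lag / m2)
    return result
```

A positive value marks a location whose neighbours deviate from the mean in the same direction (a candidate hotspot or coldspot), while a negative value marks a spatial outlier. Significance testing, which LISA analyses normally include, is omitted here.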


Clear: A Real-Time Online Observatory For Bursty And Viral Events, Runquan Xie, Feida Zhu, Hui Ma, Wei Xie, Chen Lin Sep 2014

Research Collection School Of Computing and Information Systems

We describe our demonstration of CLEar (Clairaudient Ear), a real-time online platform for detecting, monitoring, summarizing, contextualizing and visualizing bursty and viral events, those triggering a sudden surge of public interest and going viral on micro-blogging platforms. This task is challenging for existing methods as they either use complicated topic models to analyze topics in an offline manner or define a temporal structure of fixed granularity on the data stream for online topic learning, leaving them hardly scalable to real-time streams like that of Twitter. In this demonstration of CLEar, we present a three-stage system: First, we show …


A Study Of Age Gaps Between Online Friends, Lizi Liao, Jing Jiang, Ee Peng Lim, Heyan Huang Sep 2014

Research Collection School Of Computing and Information Systems

User attribute extraction on social media has gained considerable attention, but existing methods are mostly supervised and suffer greatly from insufficient gold-standard data. In this paper, we validate a strong hypothesis based on homophily and adapt it to ensure the certainty of the user attributes we extract via weakly supervised propagation. Homophily, the theory which states that people who are similar tend to become friends, has been well studied in the setting of online social networks. When we focus on the age attribute, based on this theory, online friends tend to have similar ages. In this work, we take …


An Exploratory Study On Software Microblogger Behaviors, Yuan Tian, David Lo Sep 2014

Research Collection School Of Computing and Information Systems

Microblogging services have been growing rapidly in recent years. Twitter, one of the most popular microblogging sites, has gained more than 500 million users. Thousands of developers are also using Twitter to communicate with one another and microblog about software-related topics such as programming languages, code libraries, etc. Understanding the behaviors of software microbloggers is one of the needed first steps toward building automated tools to encourage software microblogging activities and harness software microblogging to improve various software engineering activities. In this paper, we investigate the behaviors of software microbloggers in terms of their microblogging frequency, generated contents, and interactions …


Interestingness-Driven Diffusion Process Summarization In Dynamic Networks, Qiang Qu, Siyuan Liu, Christian Jensen, Feida Zhu, Christos Faloutsos Sep 2014

Research Collection School Of Computing and Information Systems

The widespread use of social networks enables the rapid diffusion of information, e.g., news, among users in very large communities. It is a substantial challenge to be able to observe and understand such diffusion processes, which may be modeled as networks that are both large and dynamic. A key tool in this regard is data summarization. However, few existing studies aim to summarize graphs/networks for dynamics. Dynamic networks raise new challenges not found in static settings, including time sensitivity and the needs for online interestingness evaluation and summary traceability, which render existing techniques inapplicable. We study the topic of dynamic …


Sharing Political News: The Balancing Act Of Intimacy And Socialization In Selective Exposure, Jisun An, Daniele Quercia, Meeyoung Cha, Krishna Gummadi, Jon Crowcroft Sep 2014

Research Collection School Of Computing and Information Systems

One might think that, compared to traditional media, social media sites allow people to choose more freely what to read and what to share, especially for politically oriented news. However, reading and sharing habits originate from deeply ingrained behaviors that might be hard to change. To test the extent to which this is true, we propose a Political News Sharing (PoNS) model that holistically captures four key aspects of social psychology: gratification, selective exposure, socialization, and trust & intimacy. Using real instances of political news sharing in Twitter, we study the predictive power of these features. As one might expect, …


On Macro And Micro Exploration Of Hashtag Diffusion In Twitter, Yazhe Wang, Baihua Zheng Aug 2014

Research Collection School Of Computing and Information Systems

This exploratory work studies hashtag diffusion in Twitter. The analysis is conducted from two aspects. From the macro perspective, we study general properties of hashtag diffusion, and classify hashtags into three main classes based on their temporal dynamics, referred to as 'single spike', 'multi-spikes', and 'fluctuation', and find that each of these classes has some unique characteristics. From the micro perspective, we investigate individual diffusion. We adopt Edelman's 'topology of influence' theory to identify four types of users with different influence levels in diffusion based on their dynamic retweet behaviors. The results of our study are useful for gaining more insights of …


Urban Planning Process: Can Technology Enhance Participatory Communication?, Rojin Vishkaie, Richard Levy, Anthony Tang Aug 2014

Research Collection School Of Computing and Information Systems

Oftentimes, within the urban planning process, urban planners and GIS experts must work together using desktop Computer-Aided Design (CAD) and Geographic Information System (GIS) tools. However, participatory communication and visualization, which are important in the urban planning process, are not a central focus in the design of current computer-aided planning technologies. This study aims to provide an understanding of the technological challenges and complexities urban planners and GIS experts encounter while engaging in a participatory environment during the urban planning process. This study also explores the perceptions of urban planners and GIS experts about the potential impact and usefulness of interactive surfaces …


Diversified Social Influence Maximization, Fangshuang Tang, Qi Liu, Hengshu Zhu, Enhong Chen, Feida Zhu Aug 2014

Research Collection School Of Computing and Information Systems

For better viral marketing, there has been a lot of research on social influence maximization. However, the question of who is influenced and how diverse the influenced population is, which is important in real-world marketing, has largely been neglected. To that end, in this paper, we propose to consider the magnitude of influence and the diversity of the influenced crowd simultaneously. Specifically, we formulate it as an optimization problem, i.e., diversified social influence maximization. First, we present a general framework for this problem, under which we construct a class of diversity measures to quantify the diversity of the influenced crowd. …


Generating Supplementary Travel Guides From Social Media, Liu Yang, Jing Jiang, Lifu Huang, Minghui Qiu, Lizi Liao Aug 2014

Research Collection School Of Computing and Information Systems

In this paper we study how to summarize travel-related information in forum threads to generate supplementary travel guides. Such summaries presumably can provide additional and more up-to-date information to tourists. Existing multi-document summarization methods have limitations for this task because (1) they do not generate structured summaries but travel guides usually follow a certain template, and (2) they do not put emphasis on named entities but travel guides often recommend points of interest to travelers. To overcome these limitations, we propose to use a latent variable model to align forum threads with the section structure of well-written travel guides. The …


Influences Of Influential Users: An Empirical Study Of Music Social Network, Jing Ren, Zhiyong Cheng, Jialie Shen, Feida Zhu Jul 2014

Research Collection School Of Computing and Information Systems

Influential users can play a crucial role in online social networks. This paper documents an empirical study aimed at exploring the effects of influential users in the context of a music social network. To achieve this goal, a music diffusion graph is developed to model how information propagates over the network. We also propose a heuristic method to measure users' influence. Using real data from Last.fm, our empirical test demonstrates key effects of influential users and reveals limitations of existing influence identification/characterization schemes.


Lifetime Lexical Variation In Social Media, Lizi Liao, Jing Jiang, Ying Ding, Heyan Huang, Ee Peng Lim Jul 2014

Research Collection School Of Computing and Information Systems

As the rapid growth of online social media attracts a large number of Internet users, the large volume of content generated by these users also provides us with an opportunity to study the lexical variation of people of different ages. In this paper, we present a latent variable model that jointly models the lexical content of tweets and Twitter users’ ages. Our model inherently assumes that a topic has not only a word distribution but also an age distribution. We propose a Gibbs-EM algorithm to perform inference on our model. Empirical evaluation shows that our model can learn meaningful age-specific …


Predicting The Popularity Of Web 2.0 Items Based On User Comments, Xiangnan He, Ming Gao, Min-Yen Kan, Yiqun Liu, Kazunari Sugiyama Jul 2014

Research Collection School Of Computing and Information Systems

In the current Web 2.0 era, the popularity of Web resources fluctuates ephemerally, based on trends and social interest. As a result, content-based relevance signals are insufficient to meet users' constantly evolving information needs in searching for Web 2.0 items. Incorporating future popularity into ranking is one way to counter this. However, predicting popularity as a third party (as in the case of general search engines) is difficult in practice, due to their limited access to item view histories. To enable popularity prediction externally without excessive crawling, we propose an alternative solution by leveraging user comments, which are more accessible …


On Predicting Religion Labels In Microblogging Networks, Minh Thap Nguyen, Ee Peng Lim Jul 2014

Research Collection School Of Computing and Information Systems

Religious belief plays an important role in how people behave, influencing how they form preferences, interpret events around them, and develop relationships with others. Traditionally, the religion labels of a user population are obtained by conducting a large-scale census study. Such an approach is both costly and time-consuming. In this paper, we study the problem of predicting users' religion labels using their microblogging data. We formulate religion label prediction as a classification task, and identify content, structure and aggregate features considering their self and social variants for representing a user. We introduce the notion of representative user to …


Socio-Physical Analytics: Challenges & Opportunities, Archan Misra, Kasthuri Jayarajah, Shriguru Nayak, Philips Kokoh Prasetyo, Ee-Peng Lim Jun 2014

Research Collection School Of Computing and Information Systems

In this paper, we argue for expanded research into an area called Socio-Physical Analytics, that focuses on combining the behavioral insight gained from mobile-sensing based monitoring of physical behavior with the inter-personal relationships and preferences deduced from online social networks. We highlight some of the research challenges in combining these heterogeneous data sources and then describe some examples of our ongoing work (based on real-world data being collected at SMU) that illustrate two aspects of socio-physical analytics: (a) how additional demographic and online analytics based attributes can potentially provide better insights into the preferences and behaviors of individuals or groups …


Online Community Transition Detection, Biying Tan, Feida Zhu, Qiang Qu, Siyuan Liu Jun 2014

Research Collection School Of Computing and Information Systems

Mining user behavior patterns in social networks is of great importance in user behavior analysis, targeted marketing, churn prediction and other applications. However, less effort has been made to study the evolution of user behavior in social communities. In particular, users join and leave communities over time. How to automatically detect the online community transitions of individual users is a research problem of immense practical value yet with great technical challenges. In this paper, we propose an algorithm based on the Minimum Description Length (MDL) principle to trace the evolution of community transition of individual users, adaptive to the noisy …


An Air Index For Spatial Query Processing In Road Networks, Weiwei Sun, Chunan Chen, Baihua Zheng, Chong Chen, Peng Liu Jun 2014

Research Collection School Of Computing and Information Systems

Spatial queries such as range queries and kNN queries in road networks have received growing attention in real-life applications. Considering the large population of users and the high overhead of network distance computation, it is extremely important to guarantee the efficiency and scalability of query processing. Motivated by the scalable and secure properties of the wireless broadcast model, this paper presents an air index called Network Partition Index (NPI) to support efficient spatial query processing in road networks via wireless broadcast. The main idea is to partition the road network into a number of regions and then …


Do You Know The Speaker?: An Online Experiment With Authority Messages On Event Websites, Kwan-Hui Lim, Binyan Jiang, Ee Peng Lim, Achananuparp Palakorn Apr 2014

Research Collection School Of Computing and Information Systems

With the widespread adoption of the Web, many companies and organizations have established websites that provide information and support online transactions (e.g., buying products or viewing content). Unfortunately, users have limited attention to spare for interacting with online sites. Hence, it is of utmost importance to design sites that attract user attention and effectively guide users to the product or content items they like. Thus, we propose a novel and scalable experimentation approach to evaluate the effectiveness of online site designs. Our case study focuses on the effects of an authority message on visitors' browsing behavior on workshop and seminar …


On Modeling Community Behaviors And Sentiments In Microblogging, Tuan Anh Hoang, William Cohen, Ee Peng Lim Apr 2014

Research Collection School Of Computing and Information Systems

In this paper, we propose the CBS topic model, a probabilistic graphical model, to derive the user communities in microblogging networks based on the sentiments they express on their generated content and behaviors they adopt. As a topic model, CBS can uncover hidden topics and derive user topic distribution. In addition, our model associates topic-specific sentiments and behaviors with each user community. Notably, CBS has a general framework that accommodates multiple types of behaviors simultaneously. Our experiments on two Twitter datasets show that the CBS model can effectively mine the representative behaviors and emotional topics for each community. We also …


Recurrent Chinese Restaurant Process With A Duration-Based Discount For Event Identification From Twitter, Qiming Diao, Jing Jiang Apr 2014

Research Collection School Of Computing and Information Systems

Due to the fast development of social media on the Web, Twitter has become one of the major platforms for people to express themselves. Because of the wide adoption of Twitter, events like breaking news and release of popular videos can easily catch people’s attention and spread rapidly on Twitter, and the number of relevant tweets approximately reflects the impact of an event. Event identification and analysis on Twitter has thus become an important task. Recently the Recurrent Chinese Restaurant Process (RCRP) has been successfully used for event identification from news streams and news-centric social media streams. However, these models …


Time-Series Data Mining In Transportation: A Case Study On Singapore Public Train Commuter Travel Patterns, Tin Seong Kam, Roy Ka Wei Lee Mar 2014

Research Collection School Of Computing and Information Systems

The adoption of smart card technologies and automated data collection systems (ADCS) in the transportation domain has provided public transport planners with opportunities to amass a huge and continuously increasing amount of time-series data about the behaviors and travel patterns of commuters. However, the explosive growth of temporal-related databases has far outpaced the transport planners’ ability to interpret these data using conventional statistical techniques, creating an urgent need for new techniques to support the analyst in transforming the data into actionable information and knowledge. This research study thus explores and discusses the potential use of time-series data mining, a relatively new …