Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons

Articles 1 - 21 of 21

Full-Text Articles in Databases and Information Systems

Dashboard Design Mining And Recommendation, Yanna Lin, Haotian Li, Aoyu Wu, Yong Wang, Huamin Qu Jan 2023

Research Collection School Of Computing and Information Systems

Dashboards, which comprise multiple views on a single display, help analyze and communicate multiple perspectives of data simultaneously. However, creating effective and elegant dashboards is challenging since it requires careful and logical arrangement and coordination of multiple visualizations. To solve the problem, we propose a data-driven approach for mining design rules from dashboards and automating dashboard organization. Specifically, we focus on two prominent aspects of the organization: arrangement, which describes the position, size, and layout of each view in the display space; and coordination, which indicates the interaction between pairwise views. We build a new dataset containing 854 dashboards crawled online, …


Exploiting Reuse For GPU Subgraph Enumeration, Wentiao Guo, Yuchen Li, Kian-Lee Tan Sep 2022

Research Collection School Of Computing and Information Systems

Subgraph enumeration is important for many applications such as network motif discovery, community detection, and frequent subgraph mining. To accelerate execution, recent works use graphics processing units (GPUs) to parallelize subgraph enumeration. The performance of these parallel schemes is dominated by set intersection operations, which account for up to 95% of the total processing time. (Un)surprisingly, a significant portion (as high as 99%) of these operations is actually redundant, i.e., the same set of vertices is repeatedly encountered and evaluated. Therefore, in this paper, we seek to salvage and recycle the results of such operations to avoid repeated …
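
To make the redundancy concrete, here is a small sequential sketch in plain Python (not a GPU implementation, and not the paper's reuse scheme). It assumes a pattern written in matching order, where a hypothetical `back[i]` lists the earlier pattern vertices that pattern vertex `i` must be adjacent to; each candidate set is then an intersection of neighbour sets, and a dictionary cache keyed by the intersected data vertices lets identical intersections be reused. The function name, the `back` encoding, and the cache are illustrative assumptions.

```python
def enumerate_matches(adj, back, use_cache=True):
    """Enumerate injective (non-induced) matches of a small pattern.

    adj:  dict mapping each data vertex to the set of its neighbours.
    back: back[i] lists the earlier pattern positions that pattern
          vertex i must be adjacent to (hypothetical encoding).
    Returns (matches, number_of_intersections_computed).
    """
    cache = {}
    computed = 0
    matches = []

    def candidates(i, partial):
        nonlocal computed
        if not back[i]:
            return set(adj)  # the first pattern vertex may map anywhere
        # The candidate set depends only on WHICH data vertices are
        # intersected, so a sorted tuple of them is a valid reuse key.
        key = tuple(sorted(partial[j] for j in back[i]))
        if len(key) == 1:
            return adj[key[0]]  # a single neighbour set needs no intersection
        if use_cache and key in cache:
            return cache[key]  # reuse an earlier identical intersection
        computed += 1
        result = set.intersection(*(adj[v] for v in key))
        if use_cache:
            cache[key] = result
        return result

    def extend(partial):
        i = len(partial)
        if i == len(back):
            matches.append(tuple(partial))
            return
        for v in sorted(candidates(i, partial)):
            if v not in partial:  # keep the mapping injective
                partial.append(v)
                extend(partial)
                partial.pop()

    extend([])
    return matches, computed


if __name__ == "__main__":
    # Data graph: the complete graph on 4 vertices.
    k4 = {v: {u for u in range(4) if u != v} for v in range(4)}
    # Diamond pattern: vertices 2 and 3 both need N(v0) ∩ N(v1).
    diamond = [[], [0], [0, 1], [0, 1]]
    for flag in (False, True):
        found, n = enumerate_matches(k4, diamond, use_cache=flag)
        print(f"use_cache={flag}: {len(found)} matches, {n} intersections")
```

On this toy input both runs find the same 24 matches, but the uncached run computes 36 intersections while the cached run computes 6, because the intersection needed for pattern vertex 3 is the one already computed for pattern vertex 2 (and for the mirrored ordering of v0 and v1). An unbounded cache like this one trades memory for time; a real system would need to bound it.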


Data Mining Approach To The Detection Of Suicide In Social Media: A Case Study Of Singapore, Jane H. K. Seah, Kyong Jin Shim Dec 2018

Research Collection School Of Computing and Information Systems

In this research, we focus on the social phenomenon of suicide. Specifically, we perform social sensing on digital traces obtained from Reddit. We analyze Reddit posts and comments related to depression and suicide, and we apply natural language processing to better understand the different aspects of human life that relate to suicide.


On Analyzing Job Hop Behavior And Talent Flow Networks, Richard J. Oentaryo, Xavier Jayaraj Siddarth Ashok, Ee-Peng Lim, Philips Kokoh Prasetyo Nov 2017

Research Collection School Of Computing and Information Systems

Analyzing job hopping behavior is important for understanding the job preferences and career progression of working individuals. When analyzed at the workforce population level, job hop analysis helps to gain insights into talent flow and competition among organizations. Traditionally, surveys are conducted on job seekers and employers to study job behavior. While surveys are good at getting direct user input to specially designed questions, they are often not scalable and timely enough to cope with a fast-changing job landscape. In this paper, we present a data science approach to analyze job hops performed by about 490,000 working professionals located in a city using their publicly shared profiles. We develop several …


Euclidean Co-Embedding Of Ordinal Data For Multi-Type Visualization, Dung D. Le, Hady W. Lauw May 2016

Research Collection School Of Computing and Information Systems

Embedding reduces a high-dimensional representation of data to a low-dimensional one. Previous work mostly focuses on preserving similarities among objects. Here, not only do we explicitly recognize multiple types of objects, but we also focus on the ordinal relationships across types. Collaborative Ordinal Embedding (COE) is based on generative modelling of ordinal triples. Experiments show that COE outperforms the baselines on objective metrics, revealing its capacity to preserve information in ordinal data.


The Importance Of Being Isolated: An Empirical Study On Chromium Reviews, Subhajit Datta, Devarshi Bhatt, Manish Jain, Proshanta Sarkar, Santonu Sarkar Oct 2015

Research Collection School Of Computing and Information Systems

As large-scale software development has become more collaborative, and software teams more globally distributed, several studies have explored how developer interaction influences software development outcomes. The emphasis so far has been largely on outcomes like defect count and the time to close modification requests. In this paper, we examine data from the Chromium project to understand how different aspects of developer discussion relate to the closure time of reviews. Based on an analysis of reviews discussed by 2000+ developers, our results indicate that quicker closure of reviews owned by a developer relates to higher reception of information and insights …


Should We Use The Sample? Analyzing Datasets Sampled From Twitter's Stream API, Yazhe Wang, Jamie Callan, Baihua Zheng Jun 2015

Research Collection School Of Computing and Information Systems

Researchers have begun studying content obtained from microblogging services such as Twitter to address a variety of technological, social, and commercial research questions. The large number of Twitter users and even larger volume of tweets often make it impractical to collect and maintain a complete record of activity; therefore, most research and some commercial software applications rely on samples, often relatively small samples, of Twitter data. For the most part, sample sizes have been based on availability and practical considerations. Relatively little attention has been paid to how well these samples represent the underlying stream of Twitter data. To fill …


AR-Miner: Mining Informative Reviews For Developers From Mobile App Marketplace, Ning Chen, Jialiu Lin, Steven C. H. Hoi, Xiaokui Xiao, Boshen Zhang Jun 2014

Research Collection School Of Computing and Information Systems

With the popularity of smartphones and mobile devices, mobile application (a.k.a. “app”) markets have been growing exponentially in terms of number of users and downloads. App developers spend considerable effort on collecting and exploiting user feedback to improve user satisfaction, but suffer from the absence of effective user review analytics tools. To help mobile app developers discover the most “informative” user reviews from a large and rapidly increasing pool of user reviews, we present “AR-Miner”, a novel computational framework for App Review Mining, which performs comprehensive analytics from raw user reviews by (i) first extracting informative user reviews by …


On Finding The Point Where There Is No Return: Turning Point Mining On Game Data, Wei Gong, Ee Peng Lim, Feida Zhu, Achananuparp Palakorn, David Lo Apr 2014

Research Collection School Of Computing and Information Systems

Gaming expertise is usually accumulated through playing or watching many game instances and identifying critical moments in these game instances, called turning points. Turning point rules (abbreviated as TPRs) are game patterns that almost always lead to some irreversible outcome. In this paper, we formulate the notion of an irreversible outcome property, which can be combined with pattern mining to automatically extract TPRs from any given game dataset. We specifically extend the well-known PrefixSpan sequence mining algorithm by incorporating the irreversible outcome property. To show the usefulness of TPRs, we apply them to Tetris, a popular game. We mine …
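
For readers unfamiliar with the base algorithm being extended, here is a minimal sketch of plain PrefixSpan, simplified to sequences whose elements are single items (the full algorithm also handles itemset elements). It does not include the irreversible outcome property, whose definition is not given in this abstract; the function name and the toy database are illustrative assumptions.

```python
def prefixspan(db, min_support):
    """Mine frequent sequential patterns from single-item sequences.

    db: list of sequences, each a list of hashable items.
    Returns a list of (pattern_tuple, support) pairs.
    """
    results = []

    def mine(prefix, projected):
        # projected holds (sequence index, start position) pairs: the
        # suffixes of db that remain after the current prefix.
        counts = {}
        for sid, start in projected:
            seen = set()
            for item in db[sid][start:]:
                if item not in seen:  # count each sequence at most once
                    seen.add(item)
                    counts[item] = counts.get(item, 0) + 1
        for item in sorted(counts):
            support = counts[item]
            if support < min_support:
                continue
            new_prefix = prefix + [item]
            results.append((tuple(new_prefix), support))
            # Project each suffix past the first occurrence of item.
            new_projected = []
            for sid, start in projected:
                seq = db[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        new_projected.append((sid, pos + 1))
                        break
            mine(new_prefix, new_projected)

    mine([], [(i, 0) for i in range(len(db))])
    return results


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]]
    for pattern, support in prefixspan(db, min_support=2):
        print(pattern, support)
```

With a minimum support of 2, this toy database yields the single items a, b, and c (support 3 each) and the two-item patterns (a, b), (a, c), and (b, c) (support 2 each). The projected-database recursion is what makes PrefixSpan a natural host for extra pruning conditions.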


On Predicting User Affiliations Using Social Features In Online Social Networks, Minh Thap Nguyen Mar 2014

Dissertations and Theses Collection (Open Access)

User profiling, such as user affiliation prediction in online social networks, is a challenging task with many important applications in targeted marketing and personalized recommendation. The research task here is to predict user affiliation attributes that suggest user participation in different social groups.


Detecting Click Fraud In Online Advertising: A Data Mining Approach, Richard Oentaryo, Ee Peng Lim, Michael Finegold, David Lo, Feida Zhu, Clifton Phua, Eng-Yeow Cheu, Ghim-Eng Yap, Kelvin Sim, Kasun Perera, Bijay Neupane, Mustafa Faisal, Zeyar Aung, Wei Lee Woon, Wei Chen, Dhaval Patel, Daniel Berrar Jan 2014

Research Collection School Of Computing and Information Systems

Click fraud, the deliberate clicking on advertisements with no real interest in the product or service offered, is one of the most daunting problems in online advertising. Building an effective fraud detection method is thus pivotal for online advertising businesses. We organized the Fraud Detection in Mobile Advertising (FDMA) 2012 Competition, opening the opportunity for participants to work on real-world fraud data from BuzzCity Pte. Ltd., a global mobile advertising company based in Singapore. In particular, the task is to identify fraudulent publishers who generate illegitimate clicks and distinguish them from normal publishers. The competition was held from …


Modeling Interaction Features For Debate Side Clustering, Minghui Qiu, Liu Yang, Jing Jiang Oct 2013

Research Collection School Of Computing and Information Systems

Online discussion forums are popular social media platforms for users to express their opinions and discuss controversial issues with each other. Automatically identifying the sides/stances of posts or users from the textual content in forums is an important task that helps mine online opinions. To tackle this task, it is important to exploit user posts that implicitly contain support and dispute (interaction) information. The challenge we face is how to mine such interaction information from the content of posts and how to use it to help identify stances. This paper proposes a two-stage solution based on latent variable models: an …


Generative Models For Item Adoptions Using Social Correlation, Freddy Chong Tat Chua, Hady Wirawan Lauw, Ee Peng Lim Sep 2013

Research Collection School Of Computing and Information Systems

Users face many choices on the Web when it comes to choosing which product to buy, which video to watch, etc. In making adoption decisions, users rely not only on their own preferences, but also on their friends. We call the latter social correlation, which may be caused by homophily and social influence effects. In this paper, we focus on modeling social correlation in users’ item adoptions. Given a user-user social graph and an item-user adoption graph, our research seeks to answer the following questions: whether the items adopted by a user correlate to items adopted by her friends, and …


STEvent: Spatio-Temporal Event Model For Social Network Discovery, Hady W. Lauw, Ee Peng Lim, Hwee Hwa Pang, Teck-Tim Tan Jun 2010

Research Collection School Of Computing and Information Systems

Spatio-temporal data concerning the movement of individuals over space and time contains latent information on the associations among these individuals. Sources of spatio-temporal data include usage logs of mobile and Internet technologies. This article defines a spatio-temporal event by the co-occurrences among individuals that indicate potential associations among them. Each spatio-temporal event is assigned a weight based on the precision and uniqueness of the event. By aggregating the weights of events relating two individuals, we can determine the strength of association between them. We conduct extensive experimentation to investigate both the efficacy of the proposed model as well as the …
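
The aggregation step described above (weight each event, then sum the weights of events relating two individuals) can be sketched as follows. The abstract does not give the weighting formula, so the weight used here is a hypothetical stand-in: each event carries a caller-supplied precision in (0, 1], and uniqueness is approximated by dividing by the number of other participants, so that events shared by fewer people count more. The function name and event format are also assumptions.

```python
from collections import defaultdict
from itertools import combinations


def association_strength(events):
    """Aggregate per-event weights into pairwise association strengths.

    events: list of (participants, precision) pairs, where participants
            is an iterable of individual ids that co-occurred and
            precision is a hypothetical score in (0, 1].
    Returns {(a, b): strength} with a < b.
    """
    strength = defaultdict(float)
    for participants, precision in events:
        people = sorted(set(participants))
        if len(people) < 2:
            continue  # a lone individual relates no pair
        # Hypothetical weight: precise events and events shared by
        # fewer individuals are treated as stronger evidence.
        weight = precision / (len(people) - 1)
        for a, b in combinations(people, 2):
            strength[(a, b)] += weight
    return dict(strength)


if __name__ == "__main__":
    events = [
        (["alice", "bob"], 1.0),
        (["alice", "bob", "carol"], 0.5),
        (["bob", "carol"], 1.0),
    ]
    for pair, s in sorted(association_strength(events).items()):
        print(pair, s)
```

Here alice and bob, and bob and carol, each accumulate a strength of 1.25, while alice and carol, who only share the imprecise three-person event, get 0.25. Any real model would replace the weight line with its own definition of precision and uniqueness.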


Bias And Controversy: Beyond The Statistical Deviation, Hady W. Lauw, Ee Peng Lim, Ke Wang Aug 2006

Research Collection School Of Computing and Information Systems

In this paper, we investigate how deviation in evaluation activities may reveal bias on the part of reviewers and controversy on the part of evaluated objects. We focus on a 'data-centric approach' where the evaluation data is assumed to represent the 'ground truth'. The standard statistical approaches take evaluation and deviation at face value. We argue that attention should be paid to the subjectivity of evaluation, judging the evaluation score not just on 'what is being said' (deviation), but also on 'who says it' (reviewer) as well as on 'whom it is said about' (object). Furthermore, we observe that bias …
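
To show what taking deviation "at face value" means, here is a sketch of that naive baseline, not the paper's model: each score's deviation is its absolute distance from the object's mean score, a reviewer's bias is the mean deviation of their scores, and an object's controversy is the mean deviation of the scores it receives. The function name and the toy ratings are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean


def naive_bias_and_controversy(ratings):
    """Face-value deviation baseline.

    ratings: list of (reviewer, obj, score) triples.
    Returns (bias per reviewer, controversy per object).
    """
    by_object = defaultdict(list)
    for _, obj, score in ratings:
        by_object[obj].append(score)
    obj_mean = {obj: mean(scores) for obj, scores in by_object.items()}

    reviewer_dev = defaultdict(list)
    object_dev = defaultdict(list)
    for reviewer, obj, score in ratings:
        d = abs(score - obj_mean[obj])  # deviation from the consensus
        reviewer_dev[reviewer].append(d)
        object_dev[obj].append(d)

    bias = {r: mean(ds) for r, ds in reviewer_dev.items()}
    controversy = {o: mean(ds) for o, ds in object_dev.items()}
    return bias, controversy


if __name__ == "__main__":
    ratings = [("r1", "x", 4), ("r2", "x", 4), ("r3", "x", 1),
               ("r1", "y", 2), ("r2", "y", 2)]
    bias, controversy = naive_bias_and_controversy(ratings)
    print(bias, controversy)
```

In the toy data, the baseline attributes r3's low score on x entirely to r3 (bias 2) while also marking x as somewhat controversial, with no way to decide which explanation is more plausible. That ambiguity between 'who says it' and 'whom it is said about' is the gap the abstract argues a subjectivity-aware approach should address.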


FISA: Feature-Based Instance Selection For Imbalanced Text Classification, Aixin Sun, Ee Peng Lim, Boualem Benatallah, Mahbub Hassan Apr 2006

Research Collection School Of Computing and Information Systems

Support Vector Machine (SVM) classifiers are widely used in text classification tasks, and these tasks often involve imbalanced training data. In this paper, we specifically address the cases where negative training documents significantly outnumber the positive ones. A generic algorithm known as FISA (Feature-based Instance Selection Algorithm) is proposed to select only a subset of negative training documents for training an SVM classifier. With a smaller, carefully selected training set, an SVM classifier can be trained more efficiently while delivering comparable or better classification accuracy. In our experiments on the 20-Newsgroups dataset, using only 35% negative training examples and 60% learning …


SGPM: Static Group Pattern Mining Using Apriori-Like Sliding Window, John Goh, David Taniar, Ee Peng Lim Apr 2006

Research Collection School Of Computing and Information Systems

Mobile user data mining is a field that focuses on extracting interesting patterns and knowledge from data generated by mobile users. Group pattern mining is one type of mobile user data mining, in which group patterns are found in a given user movement database based on spatio-temporal distances. In this paper, we propose to improve efficiency by using an area method for locating mobile users and a sliding window for static group pattern mining. This reduces the complexity of the valid group pattern mining problem. We support the use of the static method, which uses areas and sliding windows instead …


Social Network Discovery By Mining Spatio-Temporal Events, Hady Lauw, Ee Peng Lim, Hwee Hwa Pang, Teck-Tim Tan Jul 2005

Research Collection School Of Computing and Information Systems

Knowing patterns of relationship in a social network is very useful for law enforcement agencies to investigate collaborations among criminals, for businesses to exploit relationships to sell products, or for individuals who wish to network with others. After all, it is not just what you know, but also whom you know, that matters. However, finding out who is related to whom on a large scale is a complex problem. Asking every single individual would be impractical, given the huge number of individuals and the changing dynamics of relationships. Recent advances in technology have allowed more data about activities of individuals …


Blocking Reduction Strategies In Hierarchical Text Classification, Ee Peng Lim, Aixin Sun, Wee-Keong Ng, Jaideep Srivastava Oct 2004

Research Collection School Of Computing and Information Systems

One common approach in hierarchical text classification involves associating classifiers with nodes in the category tree and classifying text documents in a top-down manner. Classification methods using this top-down approach can scale well and cope with changes to the category trees. However, all these methods suffer from blocking, which refers to documents that are wrongly rejected by the classifiers at higher levels and therefore cannot be passed to the classifiers at lower levels. We propose a classifier-centric performance measure known as blocking factor to determine the extent of the blocking. Three methods are proposed to address the blocking problem, namely, threshold reduction, restricted voting, and …
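
The blocking phenomenon itself can be illustrated with a tiny top-down classifier. In this sketch, keyword predicates stand in for trained node classifiers (an assumption made purely for illustration), and the category names are hypothetical. It shows only how a rejection at a parent node hides a document from a child classifier that would have accepted it; it does not implement the blocking factor or any of the three proposed remedies.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Node:
    name: str
    accepts: Callable[[str], bool]  # stand-in for a trained classifier
    children: List["Node"] = field(default_factory=list)


def classify_top_down(node: Node, doc: str,
                      path: Tuple[str, ...] = ()) -> List[Tuple[str, ...]]:
    """Return the category paths a document reaches, top-down.

    A child is consulted only if its parent accepted the document,
    which is exactly what makes blocking possible.
    """
    results = []
    for child in node.children:
        if child.accepts(doc):
            child_path = path + (child.name,)
            deeper = classify_top_down(child, doc, child_path)
            results.extend(deeper if deeper else [child_path])
    return results


tennis = Node("Tennis", lambda d: "racket" in d)
football = Node("Football", lambda d: "goal" in d)
sports = Node("Sports", lambda d: "match" in d, [tennis, football])
science = Node("Science", lambda d: "experiment" in d)
root = Node("Root", lambda d: True, [sports, science])

if __name__ == "__main__":
    blocked = "new racket technology for tennis players"
    passed = "the match was decided by a racket malfunction"
    print(classify_top_down(root, blocked))  # Sports rejects it
    print(tennis.accepts(blocked))           # yet Tennis would accept it
    print(classify_top_down(root, passed))
```

The first document never reaches the Tennis classifier because the higher-level Sports classifier rejects it, even though Tennis would have accepted it; the second document passes both levels and is assigned to Sports/Tennis.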


A Support-Ordered Trie For Fast Frequent Itemset Discovery, Ee Peng Lim, Yew-Kwong Woon, Wee-Keong Ng Jul 2004

Research Collection School Of Computing and Information Systems

The importance of data mining is apparent with the advent of powerful data collection and storage tools; raw data is so abundant that manual analysis is no longer possible. Unfortunately, data mining problems are difficult to solve, and this prompted the introduction of several novel data structures to improve mining efficiency. Here, we critically examine existing preprocessing data structures used in association rule mining for enhancing performance in an attempt to understand their strengths and weaknesses. Our analyses culminate in a practical structure called the SOTrieIT (support-ordered trie itemset) and two synergistic algorithms to accompany it for the fast discovery …


Predictive Self-Organizing Networks For Text Categorization, Ah-Hwee Tan Apr 2001

Research Collection School Of Computing and Information Systems

This paper introduces a class of predictive self-organizing neural networks known as Adaptive Resonance Associative Map (ARAM) for classification of free-text documents. Whereas most statistical approaches to text categorization derive classification knowledge based on training examples alone, ARAM performs supervised learning and integrates user-defined classification knowledge in the form of IF-THEN rules. Through our experiments on the Reuters-21578 news database, we showed that ARAM performed reasonably well in mining categorization knowledge from a sparse and high-dimensional document feature space. In addition, ARAM's predictive accuracy and learning efficiency can be improved by incorporating a set of rules derived from …