Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons

Research Collection School Of Computing and Information Systems

2017

Articles 1 - 30 of 176

Full-Text Articles in Databases and Information Systems

On Modeling Sense Relatedness In Multi-Prototype Word Embedding, Yixin Cao, Juanzi Li, Jiaxin Shi, Zhiyuan Liu, Chengjiang Li Dec 2017

To enhance the expression ability of distributional word representation learning model, many researchers tend to induce word senses through clustering, and learn multiple embedding vectors for each word, namely multi-prototype word embedding model. However, most related work ignores the relatedness among word senses which actually plays an important role. In this paper, we propose a novel approach to capture word sense relatedness in multi-prototype word embedding model. Particularly, we differentiate the original sense and extended senses of a word by introducing their global occurrence information and model their relatedness through the local textual context information. Based on the idea of …


Inferring Social Media Users’ Demographics From Profile Pictures: A Face++ Analysis On Twitter Users, Soon-Gyo Jung, Jisun An, Haewoon Kwak, Joni Salminen, Bernard J. Jansen Dec 2017

In this research, we evaluate the applicability of using facial recognition of social media account profile pictures to infer the demographic attributes of gender, race, and age of the account owners leveraging a commercial and well-known image service, specifically Face++. Our goal is to determine the feasibility of this approach for actual system implementation. Using a dataset of approximately 10,000 Twitter profile pictures, we use Face++ to classify this set of images for gender, race, and age. We determine that about 30% of these profile pictures contain identifiable images of people using the current state-of-the-art automated means. We then employ …


Leveraging Auxiliary Tasks For Document-Level Cross-Domain Sentiment Classification, Jianfei Yu, Jing Jiang Dec 2017

In this paper, we study domain adaptation with a state-of-the-art hierarchical neural network for document-level sentiment classification. We first design a new auxiliary task based on sentiment scores of domain-independent words. We then propose two neural network architectures to respectively induce document embeddings and sentence embeddings that work well for different domains. When these document and sentence embeddings are used for sentiment classification, we find that with both pseudo and external sentiment lexicons, our proposed methods can perform similarly to or better than several highly competitive domain adaptation methods on a benchmark dataset of product reviews.


Analyzing The E-Learning Video Environment Requirements Of Generation Z Students Using Echo360 Platform, Swapna Gottipati, Venky Shankararaman Dec 2017

As with any other generational cohort, Generation Z students have their own unique characteristics that influence their approach to the learning process. They are the future workforce and several efforts are undertaken by Government and education institutes to consider the characteristics of Gen-Z in developing the curriculum and teaching environment suitable for these students. E-learning plays a key role in students' learning process and has been widely adopted by many education institutions. In particular, videos play a major role in the learning process of Gen-Z students. The purpose of this paper is to focus on the requirements of Gen-Z students and to provide suggestions for how to create an e-learning video …


Disease Gene Classification With Metagraph Representations, Sezin Kircali Ata, Yuan Fang, Min Wu, Xiao-Li Li, Xiaokui Xiao Dec 2017

Protein-protein interaction (PPI) networks play an important role in studying the functional roles of proteins, including their association with diseases. However, protein interaction networks are not sufficient without the support of additional biological knowledge for proteins such as their molecular functions and biological processes. To complement and enrich PPI networks, we propose to exploit biological properties of individual proteins. More specifically, we integrate keywords describing protein properties into the PPI network, and construct a novel PPI-Keywords (PPIK) network consisting of both proteins and keywords as two different types of nodes. As disease proteins tend to have similar topological characteristics …


Secure Server-Aided Top-K Monitoring, Yujue Wang, Hwee Hwa Pang, Yanjiang Yang, Xuhua Ding Dec 2017

In a data streaming model, a data owner releases records or documents to a set of users with matching interests, in such a way that the match in interest can be calculated from the correlation between each pair of document and user query. For scalability and availability reasons, this calculation is delegated to third-party servers, which gives rise to the need to protect the integrity and privacy of the documents and user queries. In this paper, we propose a server-aided data stream monitoring scheme (DSM) to address the aforementioned integrity and privacy challenges, so that the users are able to …


Using Data Analytics For Discovering Library Resource Insights: Case From Singapore Management University, Ning Lu, Rui Song, Dina Li Gwek Heng, Swapna Gottipati, Aaron Tay Dec 2017

Library resources are critical in supporting teaching, research and learning processes. Several universities have employed online platforms and infrastructure for enabling online services to students, faculty and staff. To provide efficient services by understanding and predicting user needs, libraries are looking into the area of data analytics. Library analytics in Singapore Management University is the project committed to provide an interface for data-intensive project collaboration, while supporting one of the library’s key pillars on its commitment to collaborate on initiatives with SMU Communities and external groups. In this paper, we study the transaction logs for user behavior analysis that …


Who Are Your Users? Comparing Media Professionals' Preconception Of Users To Data-Driven Personas, Lene Nielsen, Soon-Gyu Jung, Jisun An, Joni Salminen, Haewoon Kwak, Bernard J. Jansen Dec 2017

One of the reasons for using personas is to align user understandings across project teams and sites. As part of a larger persona study, at Al Jazeera English (AJE), we conducted 16 qualitative interviews with media producers, the end users of persona descriptions. We asked the participants about their understanding of a typical AJE media consumer, and the variety of answers shows that the understandings are not aligned and are built on a mix of own experiences, own self, assumptions, and data given by the company. The answers are sometimes aligned with the data-driven personas and sometimes not. The end …


A Novel Density Peak Clustering Algorithm Based On Squared Residual Error, Milan Parmar, Di Wang, Ah-Hwee Tan, Chunyan Miao, Jianhua Jiang, You Zhou Dec 2017

The density peak clustering (DPC) algorithm is designed to quickly identify intricate-shaped clusters with high dimensionality by finding high-density peaks in a non-iterative manner and using only one threshold parameter. However, DPC has certain limitations in processing low-density data points because it only takes the global data density distribution into account. As such, DPC may confine in forming low-density data clusters, or in other words, DPC may fail in detecting anomalies and borderline points. In this paper, we analyze the limitations of DPC and propose a novel density peak clustering algorithm to better handle low-density clustering tasks. Specifically, our algorithm …


Robust Human Activity Recognition Using Lesser Number Of Wearable Sensors, Di Wang, Edwin Candinegara, Junhui Hou, Ah-Hwee Tan, Chunyan Miao Dec 2017

In recent years, research on the recognition of human physical activities solely using wearable sensors has received more and more attention. Compared to other types of sensory devices such as surveillance cameras, wearable sensors are preferred in most activity recognition applications mainly due to their non-intrusiveness and pervasiveness. However, many existing activity recognition applications or experiments using wearable sensors were conducted in confined laboratory settings using specifically developed gadgets. These gadgets may be useful for a small group of people in certain specific scenarios, but probably will not gain popularity because they introduce additional costs and they are …


Leveraging The Trade-Off Between Accuracy And Interpretability In A Hybrid Intelligent System, Di Wang, Chai Quek, Ah-Hwee Tan, Chunyan Miao, Geok See Ng, You Zhou Dec 2017

Neural Fuzzy Inference System (NFIS) is a widely adopted paradigm to develop a data-driven learning system. This hybrid system has been widely adopted due to its accurate reasoning procedure and comprehensible inference rules. Although most NFISs primarily focus on accuracy, we have observed an ever increasing demand on improving the interpretability of NFISs and other types of machine learning systems. In this paper, we illustrate how we leverage the trade-off between accuracy and interpretability in an NFIS called Genetic Algorithm and Rough Set Incorporated Neural Fuzzy Inference System (GARSINFIS). In a nutshell, GARSINFIS self-organizes its network structure with a small …


D-Watch: Embracing “Bad” Multipaths For Device-Free Localization With COTS RFID Devices, Ju Wang, Jie Xiong, Hongbo Jiang, Xiaojiang Chen, Dingyi Fang Dec 2017

Device-free localization, which does not require any device attached to the target, is playing a critical role in many applications, such as intrusion detection, elderly monitoring and so on. This paper introduces D-Watch, a device-free system built on top of low-cost commodity-off-the-shelf RFID hardware. Unlike previous works which consider multipaths detrimental, D-Watch leverages the “bad” multipaths to provide a decimeter-level localization accuracy without offline training. D-Watch harnesses the angle-of-arrival information from the RFID tags' backscatter signals. The key intuition is that whenever a target blocks a signal's propagation path, the signal power experiences a drop which can be …


Using Teaching Cases For Achieving Bloom’s High-Order Cognitive Levels: An Application In Technically-Oriented Information Systems Course, Kar Way Tan Dec 2017

Case-teaching has been an attractive pedagogy method for bringing real-world examples into the classroom. However, it is challenging to introduce cases to address high-order cognitive skills such as analyzing and creating new IT solutions in a technically-oriented computing course. In this research, we present our experience in introducing three types of case studies -- Story-Telling case, Design-and-Problem-Solving case, and Create-Design-Implement case to a course in an undergraduate Information Systems programme. For each case study, we plan and map the learning objectives to address various cognitive levels in the revised Bloom’s Taxonomy. Using surveys conducted over two academic years, we show …


BTCI: A New Framework For Identifying Congestion Cascades Using Bus Trajectory Data, Meng-Fen Chiang, Ee Peng Lim, Wang-Chien Lee, Agus Trisnajaya Kwee Dec 2017

The knowledge of traffic health status is essential to the general public and urban traffic management. To identify congestion cascades, an important phenomenon of traffic health, we propose a Bus Trajectory based Congestion Identification (BTCI) framework that explores the anomalous traffic health status and structure properties of congestion cascades using bus trajectory data. BTCI consists of two main steps, congested segment extraction and congestion cascades identification. The former constructs path speed models from historical vehicle transitions and designs a non-parametric Kernel Density Estimation (KDE) function to derive a measure of congestion score. The latter aggregates congested segments (i.e., those with …


Predicting Indoor Crowd Density Using Column-Structured Deep Neural Network, Akihito Sudo, Teck Hou (Deng Dehao) Teng, Hoong Chuin Lau, Yoshihide Sekimoto Nov 2017

This work proposes a deep neural network approach known as the column-structured deep neural network (COL-DNN-R) for predicting crowd density in an indoor environment using historical Wi-Fi traces of individual visitors. With a structure designed to minimize feature engineering, COL-DNN accepts raw features such as crowd density, opening and closing hours and peak visitor counts for extracting features. The extracted features are used by a regression model R for predicting the crowd densities. Standard regression models such as MLP, RF and SVM can be used as R. Experiments are performed to investigate the effect of feature representation and model structure …


Second-Order Online Active Learning And Its Applications, Shuji Hao, Jing Lu, Peilin Zhao, Chi Zhang, Steven C. H. Hoi, Chunyan Miao Nov 2017

The goal of online active learning is to learn predictive models from a sequence of unlabeled data given limited label query budget. Unlike conventional online learning tasks, online active learning is considerably more challenging because of two reasons. Firstly, it is difficult to design an effective query strategy to decide when it is appropriate to query the label of an incoming instance given limited query budget. Secondly, it is also challenging to decide how to update the predictive models effectively whenever the true label of an instance is queried. Most existing approaches for online active learning are often based on a family of first-order online …


Color-Sketch Simulator: A Guide For Color-Based Visual Known-Item Search, Jakub Lokoč, Anh Nguyen Phuong, Marta Vomlelová, Chong-Wah Ngo Nov 2017

In order to evaluate the effectiveness of a color-sketch retrieval system for a given multimedia database, tedious evaluations involving real users are required as users are at the center of query sketch formulation. However, without any prior knowledge about the bottlenecks of the underlying sketch-based retrieval model, the evaluations may focus on wrong settings and thus miss the desired effect. Furthermore, users usually have no clues or recommendations to draw color-sketches effectively. In this paper, we aim at a preliminary analysis to identify potential bottlenecks of a flexible color-sketch retrieval model. We present a formal framework based on position-color feature …


SemVis: Semantic Visualization For Interactive Topical Analysis, Le Van Minh Tuan, Hady Wirawan Lauw Nov 2017

Exploratory analysis of a text corpus is an important task that can be aided by informative visualization. One spatially-oriented form of document visualization is a scatterplot, whereby every document is associated with a coordinate, and relationships among documents can be perceived through their spatial distances. Semantic visualization further infuses the visualization space with latent semantics, by incorporating a topic model that has a representation in the visualization space, allowing users to also perceive relationships between documents and topics spatially. We illustrate how a semantic visualization system called SemVis could be used to navigate a text corpus interactively and topically via …


Collaborative Topic Regression With Denoising Autoencoder For Content And Community Co-Representation, Trong T. Nguyen, Hady W. Lauw Nov 2017

Personalized recommendation of items frequently faces scenarios where we have sparse observations on users' adoption of items. In the literature, there are two promising directions. One is to connect sparse items through similarity in content. The other is to connect sparse users through similarity in social relations. We seek to integrate both types of information, in addition to the adoption information, within a single integrated model. Our proposed method models item content via a topic model, and user communities via an autoencoder model, while bridging a user's community-based preference to her topic-based preference. Experiments on public real-life data showcase the …


SourceVote: Fusing Multi-Valued Data Via Inter-Source Agreements, Xiu Susie Fang, Quan Z. Sheng, Xianzhi Wang, Mahmoud Barhamgi, Lina Yao, Anne H.H. Ngu Nov 2017

Data fusion is a fundamental research problem of identifying true values of data items of interest from conflicting multi-sourced data. Although considerable research efforts have been conducted on this topic, existing approaches generally assume every data item has exactly one true value, which fails to reflect the real world where data items with multiple true values widely exist. In this paper, we propose a novel approach, SourceVote, to estimate value veracity for multi-valued data items. SourceVote models the endorsement relations among sources by quantifying their two-sided inter-source agreements. In particular, two graphs are constructed to model inter-source relations. Then two aspects …


Tweet Geolocation: Leveraging Location, User And Peer Signals, Wen-Haw Chong, Ee Peng Lim Nov 2017

Which venue is a tweet posted from? We refer to this as fine-grained geolocation. To solve this problem effectively, we develop novel techniques to exploit each posting user's content history. This is motivated by our finding that most users do not share their visitation history, but have ample content history from tweet posts. We formulate fine-grained geolocation as a ranking problem whereby given a test tweet, we rank candidate venues. We propose several models that leverage three types of signals from locations, users and peers. Firstly, the location signals are words that are indicative of venues. We propose a location-indicative …


Modeling Check-In Behavior With Geographical Neighborhood Influence Of Venues, Thanh Nam Doan, Ee Peng Lim Nov 2017

With many users adopting location-based social networks (LBSNs) to share their daily activities, LBSNs become a gold mine for researchers to study human check-in behavior. Modeling such behavior can benefit many useful applications such as urban planning and location-aware recommender systems. Unlike previous studies [4,6,12,17] that focus on the effect of distance on users checking in venues, we consider two venue-specific effects of geographical neighborhood influence, namely, spatial homophily and neighborhood competition. The former refers to the fact that venues share more common features with their spatial neighbors, while the latter captures the rivalry of a venue and its nearby …


Large Scale Kernel Methods For Online AUC Maximization, Yi Ding, Chenghao Liu, Peilin Zhao, Steven C. H. Hoi Nov 2017

Learning to optimize AUC performance for classifying label imbalanced data in online scenarios has been extensively studied in recent years. Most of the existing work has attempted to address the problem directly in the original feature space, which may not be suitable for non-linearly separable datasets. To solve this issue, some kernel-based learning methods are proposed for non-linearly separable datasets. However, such kernel approaches have been shown to be inefficient and to fail to scale well on large scale datasets in practice. Taking this cue, in this work, we explore the use of scalable kernel-based learning techniques as surrogates to existing approaches: …


Indexable Bayesian Personalized Ranking For Efficient Top-K Recommendation, Dung D. Le, Hady W. Lauw Nov 2017

Top-k recommendation seeks to deliver a personalized recommendation list of k items to a user. The dual objectives are (1) accuracy in identifying the items a user is likely to prefer, and (2) efficiency in constructing the recommendation list in real time. One direction towards retrieval efficiency is to formulate retrieval as approximate k nearest neighbor (kNN) search aided by indexing schemes, such as locality-sensitive hashing, spatial trees, and inverted index. These schemes, applied on the output representations of recommendation algorithms, speed up the retrieval process by automatically discarding a large number of potentially irrelevant items when given a user …


SourceVote: Fusing Multi-Valued Data Via Inter-Source Agreements, Xiu Susie Fang, Quan Z. Sheng, Xianzhi Wang, Mahmoud Barhamgi, Lina Yao, Anne H.H. Ngu Nov 2017

Data fusion is a fundamental research problem of identifying true values of data items of interest from conflicting multi-sourced data. Although considerable research efforts have been conducted on this topic, existing approaches generally assume every data item has exactly one true value, which fails to reflect the real world where data items with multiple true values widely exist. In this paper, we propose a novel approach, SourceVote, to estimate value veracity for multi-valued data items. SourceVote models the endorsement relations among sources by quantifying their two-sided inter-source agreements. In particular, two graphs are constructed to model inter-source relations. Then two aspects of source reliability are derived from these graphs and …


A Fast Trajectory Outlier Detection Approach Via Driving Behavior Modeling, Hao Wu, Weiwei Sun, Baihua Zheng Nov 2017

Trajectory outlier detection is a fundamental building block for many location-based service (LBS) applications, with a large application base. We dedicate this paper to detecting the outliers from vehicle trajectories efficiently and effectively. In addition, we want our solution to be able to issue an alarm early when an outlier trajectory is only partially observed (i.e., the trajectory has not yet reached the destination). Most existing works study the problem on general Euclidean trajectories and require accesses to the historical trajectory database or computations on the distance metric that are very expensive. Furthermore, few existing works consider some specific …


AnswerBot: Automated Generation Of Answer Summary To Developers’ Technical Questions, Bowen Xu, Zhenchang Xing, Xin Xia, David Lo Nov 2017

The prevalence of questions and answers on domain-specific Q&A sites like Stack Overflow constitutes a core knowledge asset for the software engineering domain. Although search engines can return a list of questions relevant to a user query of some technical question, the abundance of relevant posts and the sheer amount of information in them make it difficult for developers to digest them and find the most needed answers to their questions. In this work, we aim to help developers who want to quickly capture the key points of several answer posts relevant to a technical question before they read the details …


Guest Editor's Introduction To The Special Issue On Source Code Analysis And Manipulation (SCAM 2015), Foutse Khomh, David Lo, Michael W. Godfrey Nov 2017

We are happy to introduce you to this special issue that presents selected papers from the 15th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2015). SCAM is a leading conference that brings together researchers and practitioners working on theory, techniques, and applications that concern analysis and/or manipulation of the source code of computer systems. While much attention in the wider software engineering community is properly directed towards other aspects of systems development and evolution, such as specification, design, and requirements engineering, it is the source code that contains the only precise description of the behavior of …


Highly Efficient Mining Of Overlapping Clusters In Signed Weighted Networks, Tuan-Anh Hoang, Ee-Peng Lim Nov 2017

In many practical contexts, networks are weighted as their links are assigned numerical weights representing relationship strengths or intensities of inter-node interaction. Moreover, the links' weight can be positive or negative, depending on the relationship or interaction between the connected nodes. The existing methods for network clustering however are not ideal for handling very large signed weighted networks. In this paper, we present a novel method called LPOCSIN (short for "Linear Programming based Overlapping Clustering on Signed Weighted Networks") for efficient mining of overlapping clusters in signed weighted networks. Different from existing methods that rely on computationally expensive cluster cohesiveness …


Selective Value Coupling Learning For Detecting Outliers In High-Dimensional Categorical Data, Guansong Pang, Hongzuo Xu, Cao Longbing, Wentao Zhao Nov 2017

This paper introduces a novel framework, namely SelectVC and its instance POP, for learning selective value couplings (i.e., interactions between the full value set and a set of outlying values) to identify outliers in high-dimensional categorical data. Existing outlier detection methods work on a full data space or feature subspaces that are identified independently from subsequent outlier scoring. As a result, they are significantly challenged by overwhelming irrelevant features in high-dimensional data due to the noise brought by the irrelevant features and its huge search space. In contrast, SelectVC works on a clean and condensed data space spanned by selective …