Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons

Research Collection School Of Computing and Information Systems

2021

Articles 1 - 30 of 228

Full-Text Articles in Databases and Information Systems

A Fine-Grained Attribute Based Data Retrieval With Proxy Re-Encryption Scheme For Data Outsourcing Systems, Hanshu Hong, Ximeng Liu, Zhixin Sun Dec 2021

Attribute based encryption is suitable for data protection in data outsourcing systems such as cloud computing. However, the use of encryption techniques may restrain some routine operations over the encrypted data, particularly in the field of data retrieval. This paper presents an attribute based data retrieval with proxy re-encryption (ABDR-PRE) scheme to provide both fine-grained access control and retrieval over the ciphertexts. The proposed scheme achieves fine-grained data access management by adopting the KP-ABE mechanism: a delegator can generate the re-encryption key and search indexes for the ciphertexts to be shared over the target delegatee’s attributes. Throughout the process of data sharing, …


On Analysing Student Resilience In Higher Education Programs Using A Data-Driven Approach, Audrey Tedja Widjaja, Ee Peng Lim, Aldy Gunawan Dec 2021

Analysing student resilience is important as research has shown that resilience is related to students’ academic performance and their persistence through academic setbacks. While questionnaires can be conducted to assess student resilience directly, they suffer from human recall errors and deliberate suppression of true responses. In this paper, we propose ACREA, ACademic REsilience Analytics framework which adopts a data-driven approach to analyse student resilient behavior with the use of student-course data. ACREA defines academic setbacks experienced by students and measures how well students overcome such setbacks using a quasi-experimental design. By applying ACREA on a real world student-course dataset, we …


Microservices Orchestration Vs. Choreography: A Decision Framework, Alan @ Ali Madjelisi Megargel, Christopher M. Poskitt, Venky Shankararaman Dec 2021

Microservices-based applications consist of loosely coupled, independently deployable services that encapsulate units of functionality. To implement larger application processes, these microservices must communicate and collaborate. Typically, this follows one of two patterns: (1) choreography, in which communication is done via asynchronous message-passing; or (2) orchestration, in which a controller is used to synchronously manage the process flow. Choosing the right pattern requires the resolution of some trade-offs concerning coupling, chattiness, visibility, and design. To address this problem, we propose a decision framework for microservices collaboration patterns that helps solution architects to crystallize their goals, compare the key factors, and then …


Robust Bipoly-Matching For Multi-Granular Entities, Ween Jiann Lee, Maksim Tkachenko, Hady W. Lauw Dec 2021

Entity matching across two data sources is a prevalent need in many domains, including e-commerce. Of interest is the scenario where entities have varying granularity, e.g., a coarse product category may match multiple finer categories. Previous work in one-to-many matching generally presumes the 'one' necessarily comes from a designated source and the 'many' from the other source. In contrast, we propose a novel formulation that allows concurrent one-to-many bidirectional matching in any direction. Beyond flexibility, we also seek matching that is more robust to noisy similarity values arising from diverse entity descriptions, by introducing receptivity and reclusivity notions. In addition …


Learning Large Neighborhood Search Policy For Integer Programming, Yaoxin Wu, Wen Song, Zhiguang Cao, Jie Zhang Dec 2021

We propose a deep reinforcement learning (RL) method to learn large neighborhood search (LNS) policy for integer programming (IP). The RL policy is trained as the destroy operator to select a subset of variables at each step, which is reoptimized by an IP solver as the repair operator. However, the combinatorial number of variable subsets prevents direct application of typical RL algorithms. To tackle this challenge, we represent all subsets by factorizing them into binary decisions on each variable. We then design a neural network to learn policies for each variable in parallel, trained by a customized actor-critic algorithm. We …


Spurring Digital Transformation In Singapore's Legal Industry, Xin Juan Chua, Steven M. Miller Dec 2021

COVID-19 has transformed the way we live and work. It has caused the processes and operations of businesses and organisations to be restructured, as well as transformed business models. A 2020 McKinsey Global survey reported that companies all over the world claim they have accelerated the digitalisation of their customer and supply-chain interactions, as well as their internal operations, by three to four years. They also said they thought the share of digital or digitally enabled products in their portfolios has advanced by seven years. While technology transformation is not new to the legal profession, COVID-19 has cemented the importance …


Towards Non-Intrusive Camera-Based Heart Rate Variability Estimation In The Car Under Naturalistic Condition, Shu Liu, Kevin Koch, Zimu Zhou, Martin Maritsch, Xiaoxi He, Elgar Fleisch, Felix Wortmann Dec 2021

Driver status monitoring systems are a vital component of smart cars in the future, especially in the era when an increasing amount of time is spent in the vehicle. The heart rate (HR) is one of the most important physiological signals of driver status. To infer HR of drivers, the mainstream of existing research focused on capturing subtle heartbeat-induced vibration of the torso or leveraged photoplethysmography (PPG) that detects cardiac cycle-related blood volume changes in the microvascular. However, existing approaches rely on dedicated sensors that are expensive and cumbersome to be integrated or are vulnerable to ambient noise. Moreover, their …


Imon: Appearance-Based Gaze Tracking System On Mobile Devices, Sinh Huynh, Rajesh Krishna Balan, Jeonggil Ko Dec 2021

Gaze tracking is a key building block used in many mobile applications including entertainment, personal productivity, accessibility, medical diagnosis, and visual attention monitoring. In this paper, we present iMon, an appearance-based gaze tracking system that is both designed for use on mobile phones and has significantly greater accuracy compared to prior state-of-the-art solutions. iMon achieves this by comprehensively considering the gaze estimation pipeline and then overcoming three different sources of errors. First, instead of assuming that the user's gaze is fixed to a single 2D coordinate, we construct each gaze label using a probabilistic 2D heatmap gaze representation input to …


Strategic Behavior And Market Inefficiency In Blockchain-Based Auctions, Ping Fan Ke, Jianqing Chen, Zhiling Guo Dec 2021

Blockchain-based auctions play a key role in decentralized finance, such as liquidation of collaterals in crypto-lending. In this research, we show that a Blockchain-based auction is subject to the threat to availability because of the characteristics of the Blockchain platform, which could lead to auction inefficiency or even market failure. Specifically, an adversary could occupy all of the transaction capacity of an auction by sending transactions with sufficiently high transaction fees, and then win the item in an auction with a nearly zero bid price as there are no competitors available. We discuss how to prevent this kind of strategic …


Vireo @ Trecvid 2021 Ad-Hoc Video Search, Jiaxin Wu, Phuong Anh Nguyen, Chong-Wah Ngo Dec 2021

In this paper, we summarize our submitted runs and results for the Ad-hoc Video Search (AVS) task at TRECVid 2020.


Self-Supervised Learning Disentangled Group Representation As Feature, Tan Wang, Zhongqi Yue, Jianqiang Huang, Qianru Sun, Hanwang Zhang Dec 2021

A good visual representation is an inference map from observations (images) to features (vectors) that faithfully reflects the hidden modularized generative factors (semantics). In this paper, we formulate the notion of “good” representation from a group-theoretic view using Higgins’ definition of disentangled representation [38], and show that existing Self-Supervised Learning (SSL) only disentangles simple augmentation features such as rotation and colorization, thus unable to modularize the remaining semantics. To break the limitation, we propose an iterative SSL algorithm: Iterative Partition-based Invariant Risk Minimization (IP-IRM), which successfully grounds the abstract semantics and the group acting on them into concrete contrastive learning. …


Automated Doubt Identification From Informal Reflections Through Hybrid Sentic Patterns And Machine Learning Approach, Siaw Ling Lo, Kar Way Tan, Eng Lieh Ouh Dec 2021

Do my students understand? The question that lingers in every instructor’s mind after each lesson. With the focus on learner-centered pedagogy, is it feasible to provide timely and relevant guidance to individual learners according to their levels of understanding? One of the options available is to collect reflections from learners after each lesson to extract relevant feedback so that doubts or questions can be addressed in a timely manner. In this paper, we derived a hybrid approach that leverages a novel Doubt Sentic Pattern Detection (SPD) algorithm and a machine learning model to automate the identification of doubts from students’ …


Neurolkh: Combining Deep Learning Model With Lin-Kernighan-Helsgaun Heuristic For Solving The Traveling Salesman Problem, Liang Xin, Wen Song, Zhiguang Cao, Jie Zhang Dec 2021

We present NeuroLKH, a novel algorithm that combines deep learning with the strong traditional heuristic Lin-Kernighan-Helsgaun (LKH) for solving Traveling Salesman Problem. Specifically, we train a Sparse Graph Network (SGN) with supervised learning for edge scores and unsupervised learning for node penalties, both of which are critical for improving the performance of LKH. Based on the output of SGN, NeuroLKH creates the edge candidate set and transforms edge distances to guide the searching process of LKH. Extensive experiments firmly demonstrate that, by training one model on a wide range of problem sizes, NeuroLKH significantly outperforms LKH and generalizes well to …


Learning To Iteratively Solve Routing Problems With Dual-Aspect Collaborative Transformer, Yining Ma, Jingwen Li, Zhiguang Cao, Wen Song, Le Zhang, Zhenghua Chen, Jing Tang Dec 2021

Recently, Transformer has become a prevailing deep architecture for solving vehicle routing problems (VRPs). However, it is less effective in learning improvement models for VRP because its positional encoding (PE) method is not suitable in representing VRP solutions. This paper presents a novel Dual-Aspect Collaborative Transformer (DACT) to learn embeddings for the node and positional features separately, instead of fusing them together as done in existing ones, so as to avoid potential noises and incompatible correlations. Moreover, the positional features are embedded through a novel cyclic positional encoding (CPE) method to allow Transformer to effectively capture the circularity and symmetry …


Canita: Faster Rates For Distributed Convex Optimization With Communication Compression, Zhize Li, Peter Richtarik Dec 2021

Due to the high communication cost in distributed and federated learning, methods relying on compressed communication are becoming increasingly popular. Besides, the best theoretically and practically performing gradient-type methods invariably rely on some form of acceleration/momentum to reduce the number of communications (faster convergence), e.g., Nesterov's accelerated gradient descent (Nesterov, 1983, 2004) and Adam (Kingma and Ba, 2014). In order to combine the benefits of communication compression and convergence acceleration, we propose a compressed and accelerated gradient method based on ANITA (Li, 2021) for distributed optimization, which we call CANITA. Our CANITA achieves the first accelerated rate $O\bigg(\sqrt{\Big(1+\sqrt{\frac{\omega^3}{n}}\Big)\frac{L}{\epsilon}} + \omega\big(\frac{1}{\epsilon}\big)^{\frac{1}{3}}\bigg)$, …


Channel Integration Services In Online Healthcare Communities, Anqi Zhao, Qian Tang Dec 2021

In online healthcare communities, channel integration services have become the bridge between online and offline channels, enabling patients to easily migrate across channels. Different from pure online services, online-to-offline (On2Off) and offline-to-online (Off2On) channel integration services involve both channels. This study examines the interrelationships between pure online services and channel integration services. Using a panel dataset composed of data from an online healthcare community, we find that pure online services decrease patients’ demand for On2Off integration services but increase their use of Off2On integration services. Our findings suggest that providing healthcare services online can reduce online patients’ needs to visit …


On Analysing Student Resilience In Higher Education Programs Using A Data-Driven Approach, Audrey Tedja Widjaja, Ee-Peng Lim, Aldy Gunawan Dec 2021

Analysing student resilience is important as research has shown that resilience is related to students’ academic performance and their persistence through academic setbacks. While questionnaires can be conducted to assess student resilience directly, they suffer from human recall errors and deliberate suppression of true responses. In this paper, we propose ACREA, ACademic REsilience Analytics framework which adopts a data-driven approach to analyse student resilient behavior with the use of student-course data. ACREA defines academic setbacks experienced by students and measures how well students overcome such setbacks using a quasi-experimental design. By applying ACREA on a real world student-course dataset, we …


Rmm: Reinforced Memory Management For Class-Incremental Learning, Yaoyao Liu, Qianru Sun, Qianru Sun Dec 2021

Class-Incremental Learning (CIL) [38] trains classifiers under a strict memory budget: in each incremental phase, learning is done for new data, most of which is abandoned to free space for the next phase. The preserved data are exemplars used for replaying. However, existing methods use a static and ad hoc strategy for memory allocation, which is often sub-optimal. In this work, we propose a dynamic memory management strategy that is optimized for the incremental phases and different object classes. We call our method reinforced memory management (RMM), leveraging reinforcement learning. RMM training is not naturally compatible with CIL as the …


Empirical Evaluation Of Minority Oversampling Techniques In The Context Of Android Malware Detection, Lwin Khin Shar, Nguyen Binh Duong Ta, David Lo Dec 2021

In Android malware classification, the distribution of training data among classes is often imbalanced. This causes the learning algorithm to bias towards the dominant classes, resulting in mis-classification of minority classes. One effective way to improve the performance of classifiers is the synthetic generation of minority instances. One pioneer technique in this area is Synthetic Minority Oversampling Technique (SMOTE) and since its publication in 2002, several variants of SMOTE have been proposed and evaluated on various imbalanced datasets. However, these techniques have not been evaluated in the context of Android malware detection. Studies have shown that the performance of SMOTE …
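
The synthetic generation of minority instances described above can be illustrated with a minimal SMOTE-style sketch. This is not the evaluation setup of the paper: the function name `smote_sketch`, its parameters, and the toy points are assumptions made for illustration only. Each synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbours.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority points by interpolation (hypothetical helper)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        x = rng.choice(minority)
        # Find its k nearest minority neighbours (squared Euclidean distance).
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        # Interpolate: the new point lies between x and the chosen neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Toy minority class with two features per sample (illustrative values).
minority = [(1.0, 1.0), (1.5, 1.2), (2.0, 1.0), (1.2, 1.8)]
new_points = smote_sketch(minority, n_new=5)
```

Because every synthetic point is a convex combination of two existing minority samples, it always stays inside the bounding box of the minority class, which is why oversampling of this kind densifies the minority region rather than extending it.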


Context-Aware Graph Convolutional Network For Dynamic Origin-Destination Prediction, Juan Nathaniel, Baihua Zheng Dec 2021

A robust Origin-Destination (OD) prediction is key to urban mobility. A good forecasting model can reduce operational risks and improve service availability, among many other upsides. Here, we examine the use of Graph Convolutional Network (GCN) and its hybrid Markov-Chain (GCN-MC) variant to perform a context-aware OD prediction based on a large-scale public transportation dataset in Singapore. Compared with the baseline Markov-Chain algorithm and GCN, the proposed hybrid GCN-MC model improves the prediction accuracy by 37% and 12% respectively. Lastly, the addition of temporal and historical contextual information further improves the performance of the proposed hybrid model by 4–12%.


Fine-Grained Generalization Analysis Of Inductive Matrix Completion, Antoine Ledent, Rodrigo Alves, Yunwen Lei, Marius Kloft Dec 2021

In this paper, we bridge the gap between the state-of-the-art theoretical results for matrix completion with the nuclear norm and their equivalent in inductive matrix completion: (1) In the distribution-free setting, we prove bounds improving the previously best scaling of $\widetilde{O}(rd^2)$ to $\widetilde{O}(d^{3/2}\sqrt{r})$, where $d$ is the dimension of the side information and $r$ is the rank. (2) We introduce the (smoothed) adjusted trace-norm minimization strategy, an inductive analogue of the weighted trace norm, for which we show guarantees of the order $\widetilde{O}(dr)$ under arbitrary sampling. In the inductive case, a similar rate was previously achieved only under uniform sampling …


A Bert-Based Two-Stage Model For Chinese Chengyu Recommendation, Minghuan Tan, Jing Jiang, Bingtian Dai Nov 2021

In Chinese, Chengyu are fixed phrases consisting of four characters. As a type of idiom, their meanings usually cannot be derived from their component characters. In this paper, we study the task of recommending a Chengyu given a textual context. Observing some of the limitations of existing work, we propose a two-stage model, where during the first stage we re-train a Chinese BERT model by masking out Chengyu from a large Chinese corpus with a wide coverage of Chengyu. During the second stage, we fine-tune the retrained, Chengyu-oriented BERT on a specific Chengyu recommendation dataset. We evaluate this method on …


Investigating The Effects Of Dimension-Specific Sentiments On Product Sales: The Perspective Of Sentiment Preferences, Cuiqing Jiang, Jianfei Wang, Qian Tang, Xiaozhong Lyu Nov 2021

While literature has reached a consensus on the awareness effect of online word-of-mouth (eWOM), this paper studies its persuasive effect, specifically, the dimension-specific sentiment effects on product sales. We allow the sentiment information in eWOM along different product dimensions to have different persuasive effects on consumers’ purchase decisions. This occurs because of consumers’ sentiment preference, which is defined as the relative importance consumers place on various dimension-specific sentiments. We use an aspect-level sentiment analysis to derive the dimension-specific sentiments and PVAR (panel vector auto-regression) models to estimate their effects on product sales using a movie panel dataset. The findings show …


Learning To Teach And Learn For Semi-Supervised Few-Shot Image Classification, Xinzhe Li, Jianqiang Huang, Yaoyao Liu, Qin Zhou, Shibao Zheng, Bernt Schiele, Qianru Sun Nov 2021

This paper presents a novel semi-supervised few-shot image classification method named Learning to Teach and Learn (LTTL) to effectively leverage unlabeled samples in small-data regimes. Our method is based on self-training, which assigns pseudo labels to unlabeled data. However, the conventional pseudo-labeling operation heavily relies on the initial model trained by using a handful of labeled data and may produce many noisy labeled samples. We propose to solve the problem with three steps: firstly, cherry-picking searches valuable samples from pseudo-labeled data by using a soft weighting network; and then, cross-teaching allows the classifiers to teach mutually for rejecting more noisy …


Is Multi-Hop Reasoning Really Explainable? Towards Benchmarking Reasoning Interpretability, Xin Lv, Yixin Cao, Lei Hou, Juanzi Li, Zhiyuan Liu, Yichi Zhang, Zelin Dai Nov 2021

Multi-hop reasoning has been widely studied in recent years to obtain more interpretable link prediction. However, we find in experiments that many paths given by these models are actually unreasonable, while little work has been done on interpretability evaluation for them. In this paper, we propose a unified framework to quantitatively evaluate the interpretability of multi-hop reasoning models so as to advance their development. Specifically, we define three metrics for evaluation (path recall, local interpretability, and global interpretability) and design an approximate strategy to calculate these metrics using the interpretability scores of rules. We manually annotate all possible …


Representation Learning On Multi-Layered Heterogeneous Network, Delvin Ce Zhang, Hady W. Lauw Nov 2021

Network data can often be represented in a multi-layered structure with rich semantics. One example is e-commerce data, containing user-user social network layer and item-item context layer, with cross-layer user-item interactions. Given the dual characters of homogeneity within each layer and heterogeneity across layers, we seek to learn node representations from such a multi-layered heterogeneous network while jointly preserving structural information and network semantics. In contrast, previous works on network embedding mainly focus on single-layered or homogeneous networks with one type of nodes and links. In this paper we propose intra- and cross-layer proximity concepts. Intra-layer proximity simulates propagation along …


Flip & Slack – Active Flipped Classroom Learning With Collaborative Slack Interactions, Kyong Jin Shim, Gottipati Swapna, Yi Meng Lau Nov 2021

Active flipped classroom learning is stipulated with faculty structuring the activities involving constructive interactions, either formal or informal. Sharing ideas and responding to ideas improve the cognitive skills of the students. Encouraging peers to contribute to class activities and respecting peers contribute to the development of affective skills. We present an integrated platform for cognitive and affective skills development. A flipped classroom arrangement allows the faculty to focus more on in-class activities such as programming and lab exercises to support active learning in computing courses. We share the design of an innovative flipped classroom model integrated with Slack and present …


Automating Developer Chat Mining, Shengyi Pan, Lingfeng Bao, Xiaoxue Ren, Xin Xia, David Lo, Shanping Li Nov 2021

Online chatrooms are gaining popularity as a communication channel between widely distributed developers of Open Source Software (OSS) projects. Most discussion threads in chatrooms follow a Q&A format, with some developers (askers) raising an initial question and others (respondents) joining in to provide answers. These discussion threads are embedded with rich information that can satisfy the diverse needs of various OSS stakeholders. However, retrieving information from threads is challenging as it requires a thread-level analysis to understand the context. Moreover, the chat data is transient and unstructured, consisting of entangled informal conversations. In this paper, we address this challenge by …


Contrastive Pre-Training Of Gnns On Heterogeneous Graphs, Xunqiang Jiang, Yuanfu Lu, Yuan Fang, Chuan Shi Nov 2021

While graph neural networks (GNNs) emerge as the state-of-the-art representation learning methods on graphs, they often require a large amount of labeled data to achieve satisfactory performance, which is often expensive or unavailable. To relieve the label scarcity issue, some pre-training strategies have been devised for GNNs, to learn transferable knowledge from the universal structural properties of the graph. However, existing pre-training strategies are only designed for homogeneous graphs, in which each node and edge belongs to the same type. In contrast, a heterogeneous graph embodies rich semantics, as multiple types of nodes interact with each other via different kinds …


Topic Modeling For Multi-Aspect Listwise Comparison, Delvin Ce Zhang, Hady W. Lauw Nov 2021

As a well-established probabilistic method, topic models seek to uncover latent semantics from plain text. In addition to having textual content, we observe that documents are usually compared in listwise rankings based on their content. For instance, countries worldwide are compared in an international ranking in terms of electricity production based on their national reports. Such document comparisons constitute additional information that reveals documents' relative similarities. Incorporating them into topic modeling could yield comparative topics that help to differentiate and rank documents. Furthermore, based on different comparison criteria, the observed document comparisons usually cover multiple aspects, each expressing a distinct …