Open Access. Powered by Scholars. Published by Universities.®

Engineering Commons

Articles 1 - 19 of 19

Full-Text Articles in Engineering

Mitigating Popularity Bias In Recommendation With Unbalanced Interactions: A Gradient Perspective, Weijieying Ren, Lei Wang, Kunpeng Liu, Ruocheng Guo, Ee-Peng Lim, Yanjie Fu Dec 2022

Research Collection School Of Computing and Information Systems

Recommender systems learn from historical user-item interactions to identify preferred items for target users. These observed interactions are usually unbalanced, following a long-tailed distribution. Such long-tailed data lead to popularity bias, in which popular but not personalized items are recommended to users. We present a gradient perspective to understand two negative impacts of popularity bias in recommendation model optimization: (i) the gradient direction of popular item embeddings is closer to that of positive interactions, and (ii) the magnitude of the positive gradient for popular items is much greater than that of unpopular items. To address these issues, we propose a simple yet efficient …


Fastklee: Faster Symbolic Execution Via Reducing Redundant Bound Checking Of Type-Safe Pointers, Haoxin Tu, Lingxiao Jiang, Xuhua Ding, He Jiang Nov 2022

Research Collection School Of Computing and Information Systems

Symbolic execution (SE) has been widely adopted for automatic program analysis and software testing. Many SE engines (e.g., KLEE or Angr) need to interpret certain Intermediate Representations (IR) of code during execution, which may be slow and costly. Although a plurality of studies proposed to accelerate SE, few of them consider optimizing the internal interpretation operations. In this paper, we propose FastKLEE, a faster SE engine that aims to speed up execution via reducing redundant bound checking of type-safe pointers during IR code interpretation. Specifically, in FastKLEE, a type inference system is first leveraged to classify pointer types (i.e., safe …


A Quality Metric For K-Means Clustering Based On Centroid Locations, Manoj Thulasidas Nov 2022

Research Collection School Of Computing and Information Systems

The K-Means clustering algorithm does not offer a clear methodology to determine the appropriate number of clusters; it does not have a built-in mechanism for K selection. In this paper, we present a new metric for clustering quality and describe its use for K selection. The proposed metric, based on the locations of the centroids, as well as the desired properties of the clusters, is developed in two stages. In the initial stage, we take into account the full covariance matrix of the clustering variables, thereby making it mathematically similar to a reduced chi-squared. We then extend it to account for …


Remgen: Remanufacturing A Random Program Generator For Compiler Testing, Haoxin Tu, He Jiang, Xiaochen Li, Zhide Zhou, Lingxiao Jiang, Lingxiao Jiang Oct 2022

Research Collection School Of Computing and Information Systems

Program generators play a critical role in generating bug-revealing test programs for compiler testing. However, existing program generators have been tamed nowadays (i.e., compilers have been hardened against test programs generated by them), thus calling for new solutions to improve their capability in generating bug-revealing test programs. In this study, we propose a framework named Remgen, aiming to Remanufacture a random program Generator for this purpose. Remgen addresses the challenges of the synthesis of diverse code snippets at a low cost and the selection of the bug-revealing code snippets for constructing new test programs. More specifically, Remgen first designs a grammar-aided synthesis …


Softskip: Empowering Multi-Modal Dynamic Pruning For Single-Stage Referring Comprehension, Dulanga Weerakoon, Vigneshwaran Subbaraju, Tuan Tran, Archan Misra Oct 2022

Research Collection School Of Computing and Information Systems

Supporting real-time referring expression comprehension (REC) on pervasive devices is an important capability for human-AI collaborative tasks. Model pruning techniques, applied to DNN models, can enable real-time execution even on resource-constrained devices. However, existing pruning strategies are designed principally for uni-modal applications, and suffer a significant loss of accuracy when applied to REC tasks that require fusion of textual and visual inputs. We thus present a multi-modal pruning model, LGMDP, which uses language as a pivot to dynamically and judiciously select the relevant computational blocks that need to be executed. LGMDP also introduces a new SoftSkip mechanism, whereby 'skipped' visual …


Reflecting On Experiences For Response Generation, Chenchen Ye, Lizi Liao, Suyu Liu, Tat-Seng Chua Oct 2022

Research Collection School Of Computing and Information Systems

Multimodal dialogue systems have attracted much attention recently, but they are far from having skills such as: 1) automatically generating context-specific responses instead of safe but general responses; 2) naturally coordinating between the different information modalities (e.g. text and image) in responses; 3) intuitively explaining the reasons for generated responses and improving a specific response without re-training the whole model. To approach these goals, we propose a different angle for the task - Reflecting Experiences for Response Generation (RERG). This is supported by the fact that generating a response from scratch can be hard, but much easier if we can access other …


An Attribute-Aware Attentive Gcn Model For Attribute Missing In Recommendation, Fan Liu, Zhiyong Cheng, Lei Zhu, Chenghao Liu, Liqiang Nie Sep 2022

Research Collection School Of Computing and Information Systems

As important side information, attributes have been widely exploited in existing recommender systems for better performance. However, in real-world scenarios, it is common that some attributes of items/users are missing (e.g., some movies miss the genre data). Prior studies usually use a default value (i.e., "other") to represent the missing attribute, resulting in sub-optimal performance. To address this problem, in this paper, we present an attribute-aware attentive graph convolution network (A²-GCN). In particular, we first construct a graph, where users, items, and attributes are three types of nodes and their associations are edges. Thereafter, we leverage the graph …


Taxi Travel Time Based Geographically Weighted Regression Model (Gwr) For Modeling Public Housing Prices In Singapore, Yi’An Wang, Fangyi Cai, Shih-Fen Cheng, Bo Wu, Kai Cao Jun 2022

Research Collection School Of Computing and Information Systems

In this research, a taxi travel time based Geographically Weighted Regression model (GWR) is proposed and utilized to model the public housing price in the case study of Singapore. In addition, a comparison between the proposed taxi data driven GWR and other models, such as the ordinary least squares model (OLS), the GWR model based on Euclidean distance and the GWR model based on public transport travel time, has also been carried out. Results indicate that the taxi travel time based GWR model has better fitting performance than the OLS model, and slightly better than the Euclidean distance-based GWR model, however, it is not …
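
For readers unfamiliar with GWR, the idea of swapping Euclidean distance for travel time can be sketched as a weighted least-squares fit at a single target location. This is only an illustrative sketch, not the authors' model: the Gaussian kernel, the single-covariate form y = b0 + b1 * x, and the names `gaussian_weights`, `local_fit`, `travel_times`, and `bandwidth` are all assumptions introduced here.

```python
import math


def gaussian_weights(travel_times, bandwidth):
    """Gaussian kernel: observations with shorter travel time get larger weight."""
    return [math.exp(-0.5 * (t / bandwidth) ** 2) for t in travel_times]


def local_fit(x, y, travel_times, bandwidth):
    """Weighted least squares for y = b0 + b1 * x at one target location.

    travel_times[j] is the (hypothetical) travel time from the target
    location to observation j; it takes the place of Euclidean distance
    inside the kernel.
    """
    w = gaussian_weights(travel_times, bandwidth)
    total = sum(w)
    # Weighted means of the covariate and the response.
    x_bar = sum(wj * xj for wj, xj in zip(w, x)) / total
    y_bar = sum(wj * yj for wj, yj in zip(w, y)) / total
    # Weighted covariance and variance give the local slope.
    sxy = sum(wj * (xj - x_bar) * (yj - y_bar) for wj, xj, yj in zip(w, x, y))
    sxx = sum(wj * (xj - x_bar) ** 2 for wj, xj in zip(w, x))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1
```

A GWR model repeats a fit of this kind at every location, so the estimated coefficients are allowed to vary over space; the choice of distance measure only changes the weights.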


Who Is Missing? Characterizing The Participation Of Different Demographic Groups In A Korean Nationwide Daily Conversation Corpus, Haewoon Kwak, Jisun An, Kunwoo Park Jun 2022

Research Collection School Of Computing and Information Systems

A conversation corpus is essential to build interactive AI applications. However, the demographic information of the participants in such corpora is largely underexplored mainly due to the lack of individual data in many corpora. In this work, we analyze a Korean nationwide daily conversation corpus constructed by the National Institute of Korean Language (NIKL) to characterize the participation of different demographic (age and sex) groups in the corpus.


Shellfusion: Answer Generation For Shell Programming Tasks Via Knowledge Fusion, Neng Zhang, Chao Liu, Xin Xia, Christoph Treude, Ying Zou, David Lo, Zibin Zheng May 2022

Research Collection School Of Computing and Information Systems

Shell commands are widely used for accomplishing tasks, such as network management and file manipulation, in Unix and Linux platforms. There are a large number of shell commands available. For example, 50,000+ commands are documented in the Ubuntu Manual Pages (MPs). Quite often, programmers feel frustrated when searching and orchestrating appropriate shell commands to accomplish specific tasks. To address the challenge, the shell programming community calls for easy-to-use tutorials for shell commands. However, existing tutorials (e.g., TLDR) only cover a limited number of frequently used commands for shell beginners and provide limited support for users to search for commands by …


Message-Locked Searchable Encryption: A New Versatile Tool For Secure Cloud Storage, Xueqiao Liu, Guomin Yang, Willy Susilo, Joseph Tonien, Rongmao Chen, Xixiang Lv May 2022

Research Collection School Of Computing and Information Systems

Message-Locked Encryption (MLE) is a useful tool to enable deduplication over encrypted data in cloud storage. It can significantly improve the cloud service quality by eliminating redundancy to save storage resources, and hence user cost, and also providing defense against different types of attacks, such as duplicate faking attack and brute-force attack. A typical MLE scheme only focuses on deduplication. On the other hand, supporting search operations on stored content is another essential requirement for cloud storage. In this article, we present a message-locked searchable encryption (MLSE) scheme in a dual-server setting, which achieves simultaneously the desirable features of supporting …
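
The deduplication property of MLE mentioned above can be illustrated with a toy convergent-encryption sketch, in which the key is derived from the message itself so that identical plaintexts yield identical ciphertexts. This is not the MLSE scheme of the article and is not secure for real use: the SHA-256 counter-mode keystream stands in for a real cipher, and the names `derive_key`, `mle_encrypt`, `mle_decrypt`, and `DedupStore` are hypothetical.

```python
import hashlib


def derive_key(message: bytes) -> bytes:
    # Message-locked: the key is a hash of the message itself.
    return hashlib.sha256(b"key|" + message).digest()


def keystream(key: bytes, length: int) -> bytes:
    # Toy deterministic keystream (SHA-256 over key + counter); illustration only.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(out[:length])


def mle_encrypt(message: bytes):
    key = derive_key(message)
    stream = keystream(key, len(message))
    ciphertext = bytes(m ^ k for m, k in zip(message, stream))
    # The tag is computed from the ciphertext, so the server can compare
    # uploads without learning the key.
    tag = hashlib.sha256(b"tag|" + ciphertext).hexdigest()
    return key, ciphertext, tag


def mle_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    stream = keystream(key, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, stream))


class DedupStore:
    """Hypothetical server-side store that keeps one blob per tag."""

    def __init__(self):
        self.blobs = {}

    def upload(self, ciphertext: bytes, tag: str) -> bool:
        """Return True if a new blob was stored, False if it was deduplicated."""
        if tag in self.blobs:
            return False
        self.blobs[tag] = ciphertext
        return True
```

Two users who encrypt the same file independently produce the same ciphertext and tag, so the second upload is recognized as a duplicate even though the server never sees the plaintext.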


Benchmarking Library Recognition In Tweets, Ting Zhang, Divya Prabha Chandrasekaran, Ferdian Thung, David Lo May 2022

Research Collection School Of Computing and Information Systems

Software developers often use social media (such as Twitter) to share programming knowledge such as new tools, sample code snippets, and tips on programming. One of the topics they talk about is the software library. The tweets may contain useful information about a library. A good understanding of this information, e.g., on the developer’s views regarding a library can be beneficial to weigh the pros and cons of using the library as well as the general sentiments towards the library. However, it is not trivial to recognize whether a word actually refers to a library or other meanings. For example, a tweet mentioning the word “pandas” may refer to …


An Empirical Study Of Memorization In Nlp, Xiaosen Zheng, Jing Jiang May 2022

Research Collection School Of Computing and Information Systems

A recent study by Feldman (2020) proposed a long-tail theory to explain the memorization behavior of deep learning models. However, memorization has not been empirically verified in the context of NLP, a gap addressed by this work. In this paper, we use three different NLP tasks to check if the long-tail theory holds. Our experiments demonstrate that top-ranked memorized training instances are likely atypical, and removing the top-memorized training instances leads to a more serious drop in test accuracy compared with removing training instances randomly. Furthermore, we develop an attribution method to better understand why a training instance is memorized. …


Verifiable Searchable Encryption Framework Against Insider Keyword-Guessing Attack In Cloud Storage, Yinbin Miao, Robert H. Deng, Kim-Kwang Raymond Choo, Ximeng Liu, Hongwei Li Apr 2022

Research Collection School Of Computing and Information Systems

Searchable encryption (SE) allows cloud tenants to retrieve encrypted data while preserving data confidentiality securely. Many SE solutions have been designed to improve efficiency and security, but most of them are still susceptible to insider Keyword-Guessing Attacks (KGA), which implies that the internal attackers can guess the candidate keywords successfully in an off-line manner. Also in existing SE solutions, a semi-honest-but-curious cloud server may deliver incorrect search results by performing only a fraction of retrieval operations honestly (e.g., to save storage space). To address these two challenging issues, we first construct the basic Verifiable SE Framework (VSEF), which can withstand …


Improving Feature Generalizability With Multitask Learning In Class Incremental Learning, Dong Ma, Chi Ian Tang, Cecilia Mascolo Apr 2022

Research Collection School Of Computing and Information Systems

Many deep learning applications, like keyword spotting [1], [2], require the incorporation of new concepts (classes) over time, referred to as Class Incremental Learning (CIL). The major challenge in CIL is catastrophic forgetting, i.e., preserving as much of the old knowledge as possible while learning new tasks. Various techniques, such as regularization, knowledge distillation, and the use of exemplars, have been proposed to resolve this issue. However, prior works primarily focus on the incremental learning step, while ignoring the optimization during the base model training. We hypothesise that a more transferable and generalizable feature representation from the base model would …


Pre-Training Graph Neural Networks For Link Prediction In Biomedical Networks, Yahui Long, Min Wu, Yong Liu, Yuan Fang, Chee Kong Kwoh, Jiawei Luo, Xiaoli Li Apr 2022

Research Collection School Of Computing and Information Systems

Motivation: Graphs or networks are widely utilized to model the interactions between different entities (e.g., proteins, drugs, etc) for biomedical applications. Predicting potential links in biomedical networks is important for understanding the pathological mechanisms of various complex human diseases, as well as screening compound targets for drug discovery. Graph neural networks (GNNs) have been designed for link prediction in various biomedical networks, which rely on the node features extracted from different data sources, e.g., sequence, structure and network data. However, it is challenging to effectively integrate these data sources and automatically extract features for different link prediction tasks. Results: In …


Analyzing Offline Social Engagements: An Empirical Study Of Meetup Events Related To Software Development, Abhishek Sharma, Gede Artha Azriadi Prana, Anamika Sawhney, Nachiappan Nagappan, David Lo Mar 2022

Research Collection School Of Computing and Information Systems

Software developers use a variety of social media channels and tools in order to keep themselves up to date, collaborate with other developers, and find projects to contribute to. Meetup is one such social media platform used by software developers to organize community gatherings. In this work, we investigate the dynamics of Meetup groups and events related to software development. Our work is different from previous work as we focus on the actual event and group data that was collected using the Meetup API. In this work, we performed an empirical study of events and groups present on Meetup which are related to software development. First, we identified 6,327 Meetup groups related …


Batchlens: A Visualization Approach For Analyzing Batch Jobs In Cloud Systems, Shaolun Ruan, Yong Wang, Hailong Jiang, Weijia Xu, Qiang Guan Mar 2022

Research Collection School Of Computing and Information Systems

Cloud systems are becoming increasingly powerful and complex. It is highly challenging to identify anomalous execution behaviors and pinpoint problems by examining the overwhelming intermediate results/states in complex application workflows. Domain scientists urgently need a friendly and functional interface to understand the quality of the computing services and the performance of their applications in real time. To meet these needs, we explore data generated by job schedulers and investigate general performance metrics (e.g., utilization of CPU, memory and disk I/O). Specifically, we propose an interactive visual analytics approach, BatchLens, to provide both providers and users of cloud service with an …


Efficient Certificateless Multi-Copy Integrity Auditing Scheme Supporting Data Dynamics, Lei Zhou, Anmin Fu, Guomin Yang, Huaqun Wang, Yuqing Zhang Mar 2022

Research Collection School Of Computing and Information Systems

To improve data availability and durability, cloud users would like to store multiple copies of their original files at servers. The multi-copy auditing technique is proposed to provide users with the assurance that multiple copies are actually stored in the cloud. However, most multi-replica solutions rely on Public Key Infrastructure (PKI), which entails massive overhead of certificate computation and management. In this article, we propose an efficient multi-copy dynamic integrity auditing scheme by employing certificateless signatures (named MDSS), which gets rid of expensive certificate management overhead and avoids the key escrow problem in identity-based signatures. Specifically, we improve the classic …