Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons

Articles 1 - 30 of 147

Full-Text Articles in Databases and Information Systems

Data Provenance Via Differential Auditing, Xin Mu, Ming Pang, Feida Zhu Nov 2023

Research Collection School Of Computing and Information Systems

With the rising awareness of data assets, data governance, which is to understand where data comes from, how it is collected, and how it is used, has been assuming ever-growing importance. One critical component of data governance gaining increasing attention is auditing machine learning models to determine if specific data has been used for training. Existing auditing techniques, like shadow auditing methods, have shown feasibility under specific conditions such as having access to label information and knowledge of training protocols. However, these conditions are often not met in most real-world applications. In this paper, we introduce a practical framework for …


When Routing Meets Recommendation: Solving Dynamic Order Recommendations Problem In Peer-To-Peer Logistics Platforms, Zhiqin Zhang, Waldy Joe, Yuyang Er, Hoong Chuin Lau Sep 2023

Research Collection School Of Computing and Information Systems

Peer-to-Peer (P2P) logistics platforms, unlike traditional last-mile logistics providers, do not have dedicated delivery resources (both vehicles and drivers). Thus, the efficiency of such an operating model lies in the successful matching of demand and supply, i.e., how to match the delivery tasks with suitable drivers that will result in successful assignment and completion of the tasks. We consider a Same-Day Delivery Problem (SDDP) involving a P2P logistics platform where new orders arrive dynamically and the platform operator needs to generate a list of recommended orders to the crowdsourced drivers. We formulate this problem as a Dynamic Order Recommendations Problem (DORP). …


NICHE: A Curated Dataset Of Engineered Machine Learning Projects In Python, Ratnadira Widyasari, Zhou Yang, Ferdian Thung, Sheng Qin Sim, Fiona Wee, Camellia Lok, Jack Phan, Haodi Qi, Constance Tan, David Lo May 2023

Research Collection School Of Computing and Information Systems

Machine learning (ML) has gained much attention and has been incorporated into our daily lives. While there are numerous publicly available ML projects on open source platforms such as GitHub, there have been limited attempts in filtering those projects to curate ML projects of high quality. The limited availability of such a high-quality dataset poses an obstacle to understanding ML projects. To help clear this obstacle, we present NICHE, a manually labelled dataset consisting of 572 ML projects. Based on the evidence of good software engineering practices, we label 441 of these projects as engineered and 131 as non-engineered. This …


NFTDisk: Visual Detection Of Wash Trading In NFT Markets, Xiaolin Wen, Yong Wang, Xuanwu Yue, Feida Zhu, Min Zhu Apr 2023

Research Collection School Of Computing and Information Systems

With the growing popularity of Non-Fungible Tokens (NFT), a new type of digital assets, various fraudulent activities have appeared in NFT markets. Among them, wash trading has become one of the most common frauds in NFT markets, which attempts to mislead investors by creating fake trading volumes. Due to the sophisticated patterns of wash trading, only a subset of them can be detected by automatic algorithms, and manual inspection is usually required. We propose NFTDisk, a novel visualization for investors to identify wash trading activities in NFT markets, where two linked visualization modules are presented: a radial visualization module with …


Learning Relation Prototype From Unlabeled Texts For Long-Tail Relation Extraction, Yixin Cao, Jun Kuang, Ming Gao, Aoying Zhou, Yonggang Wen, Tat-Seng Chua Feb 2023

Research Collection School Of Computing and Information Systems

Relation Extraction (RE) is a vital step to complete Knowledge Graph (KG) by extracting entity relations from texts. However, it usually suffers from the long-tail issue. The training data mainly concentrates on a few types of relations, leading to the lack of sufficient annotations for the remaining types of relations. In this paper, we propose a general approach to learn relation prototypes from unlabeled texts, to facilitate the long-tail relation extraction by transferring knowledge from the relation types with sufficient training data. We learn relation prototypes as an implicit factor between entities, which reflects the meanings of relations as well …


Mitigating Popularity Bias In Recommendation With Unbalanced Interactions: A Gradient Perspective, Weijieying Ren, Lei Wang, Kunpeng Liu, Ruocheng Guo, Ee-Peng Lim, Yanjie Fu Dec 2022

Research Collection School Of Computing and Information Systems

Recommender systems learn from historical user-item interactions to identify preferred items for target users. These observed interactions are usually unbalanced, following a long-tailed distribution. Such long-tailed data lead to popularity bias, that is, recommending popular but not personalized items to users. We present a gradient perspective to understand two negative impacts of popularity bias in recommendation model optimization: (i) the gradient direction of popular item embeddings is closer to that of positive interactions, and (ii) the magnitude of the positive gradient for popular items is much greater than that for unpopular items. To address these issues, we propose a simple yet efficient …


An Attribute-Aware Attentive GCN Model For Attribute Missing In Recommendation, Fan Liu, Zhiyong Cheng, Lei Zhu, Chenghao Liu, Liqiang Nie Sep 2022

Research Collection School Of Computing and Information Systems

As important side information, attributes have been widely exploited in the existing recommender system for better performance. However, in the real-world scenarios, it is common that some attributes of items/users are missing (e.g., some movies miss the genre data). Prior studies usually use a default value (i.e., "other") to represent the missing attribute, resulting in sub-optimal performance. To address this problem, in this paper, we present an attribute-aware attentive graph convolution network (A(2)-GCN). In particular, we first construct a graph, where users, items, and attributes are three types of nodes and their associations are edges. Thereafter, we leverage the graph …


Analyzing Offline Social Engagements: An Empirical Study Of Meetup Events Related To Software Development, Abhishek Sharma, Gede Artha Azriadi Prana, Anamika Sawhney, Nachiappan Nagappan, David Lo Mar 2022

Research Collection School Of Computing and Information Systems

Software developers use a variety of social media channels and tools in order to keep themselves up to date, collaborate with other developers, and find projects to contribute to. Meetup is one such social media channel used by software developers to organize community gatherings. In this work, we investigate the dynamics of Meetup groups and events related to software development. Our work is different from previous work as we focus on the actual event and group data that was collected using the Meetup API. In this work, we performed an empirical study of events and groups present on Meetup which are related to software development. First, we identified 6,327 Meetup groups related …


Context-Aware Outstanding Fact Mining From Knowledge Graphs, Yueji Yang, Yuchen Li, Panagiotis Karras, Anthony Tung Aug 2021

Research Collection School Of Computing and Information Systems

An Outstanding Fact (OF) is an attribute that makes a target entity stand out from its peers. The mining of OFs has important applications, especially in Computational Journalism, such as news promotion, fact-checking, and news story finding. However, existing approaches to OF mining: (i) disregard the context in which the target entity appears, hence may report facts irrelevant to that context; and (ii) require relational data, which are often unavailable or incomplete in many application domains. In this paper, we introduce the novel problem of mining Context-aware Outstanding Facts (COFs) for a target entity under a given context specified by …


ThunderRW: An In-Memory Graph Random Walk Engine, Shixuan Sun, Yuhang Chen, Shengliang Lu, Bingsheng He, Yuchen Li Aug 2021

Research Collection School Of Computing and Information Systems

As random walk is a powerful tool in many graph processing, mining and learning applications, this paper proposes an efficient in-memory random walk engine named ThunderRW. Whereas existing parallel systems focus on improving the performance of a single graph operation, ThunderRW supports massive parallel random walks. The core design of ThunderRW is motivated by our profiling results: common RW algorithms have as high as 73.1% CPU pipeline slots stalled due to irregular memory access, suffering significantly more memory stalls than conventional graph workloads such as BFS and SSSP. To improve the memory efficiency, we first design a generic …


Hierarchical Reinforcement Learning: A Comprehensive Survey, Shubham Pateria, Budhitama Subagdja, Ah-Hwee Tan, Chai Quek Jun 2021

Research Collection School Of Computing and Information Systems

Hierarchical Reinforcement Learning (HRL) enables autonomous decomposition of challenging long-horizon decision-making tasks into simpler subtasks. During the past years, the landscape of HRL research has grown profoundly, resulting in copious approaches. A comprehensive overview of this vast landscape is necessary to study HRL in an organized manner. We provide a survey of the diverse HRL approaches concerning the challenges of learning hierarchical policies, subtask discovery, transfer learning, and multi-agent learning using HRL. The survey is presented according to a novel taxonomy of the approaches. Based on the survey, a set of important open problems is proposed to motivate the future …


Minimum Coresets For Maxima Representation Of Multidimensional Data, Yanhao Wang, Michael Mathioudakis, Yuchen Li, Kian-Lee Tan Jun 2021

Research Collection School Of Computing and Information Systems

Coresets are succinct summaries of large datasets such that, for a given problem, the solution obtained from a coreset is provably competitive with the solution obtained from the full dataset. As such, coreset-based data summarization techniques have been successfully applied to various problems, e.g., geometric optimization, clustering, and approximate query processing, for scaling them up to massive data. In this paper, we study coresets for the maxima representation of multidimensional data: Given a set P of points in R^d, where d is a small constant, and an error parameter ε ∈ (0, 1), a subset Q ⊆ P …


On M-Impact Regions And Standing Top-K Influence Problems, Bo Tang, Kyriakos Mouratidis, Mingji Han Jun 2021

Research Collection School Of Computing and Information Systems

In this paper, we study the m-impact region problem (mIR). In a context where users look for available products with top-k queries, mIR identifies the part of the product space that attracts the most user attention. Specifically, mIR determines the kind of attribute values that lead a (new or existing) product to the top-k result for at least a fraction of the user population. mIR has several applications, ranging from effective marketing to product improvement. Importantly, it also leads to (exact and efficient) solutions for standing top-k impact problems, which were previously solved heuristically only, or whose current solutions face …


Towards Efficient Motif-Based Graph Partitioning: An Adaptive Sampling Approach, Shixun Huang, Yuchen Li, Zhifeng Bao, Zhao Li Apr 2021

Research Collection School Of Computing and Information Systems

In this paper, we study the problem of efficient motif-based graph partitioning (MGP). We observe that existing methods require to enumerate all motif instances to compute the exact edge weights for partitioning. However, the enumeration is prohibitively expensive against large graphs. We thus propose a sampling-based MGP (SMGP) framework that employs an unbiased sampling mechanism to efficiently estimate the edge weights while trying to preserve the partitioning quality. To further improve the effectiveness, we propose a novel adaptive sampling framework called SMGP+. SMGP+ iteratively partitions the input graph based on up-to-date estimated edge weights, and adaptively adjusts the sampling distribution …


Boundary Precedence Image Inpainting Method Based On Self-Organizing Maps, Haibo Pen, Quan Wang, Zhaoxia Wang Apr 2021

Research Collection School Of Computing and Information Systems

In addition to text data analysis, image analysis is an area that has increasingly gained importance in recent years because more and more image data have spread throughout the internet and real life. As an important segment of image analysis techniques, image restoration has been attracting a lot of researchers’ attention. As one of AI methodologies, Self-organizing Maps (SOMs) have been applied to a great number of useful applications. However, it has rarely been applied to the domain of image restoration. In this paper, we propose a novel image restoration method by leveraging the capability of SOMs, and we name …


DyCuckoo: Dynamic Hash Tables On GPUs, Yuchen Li, Qiwei Zhu, Zheng Lyu, Zhongdong Huang, Jianling Sun Apr 2021

Research Collection School Of Computing and Information Systems

The hash table is a fundamental structure that has been implemented on graphics processing units (GPUs) to accelerate a wide range of analytics workloads. Most existing works have focused on static scenarios and occupy large GPU memory to maximize the insertion efficiency. In many cases, data stored in hash tables get updated dynamically, and existing approaches use unnecessarily large memory resources. One naïve solution is to rebuild a hash table (known as rehashing) whenever it is either filled or mostly empty. However, this approach renders significant overheads for rehashing. In this paper, we propose a novel dynamic cuckoo hash table …
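
As a rough illustration of the naïve rehashing baseline mentioned in this abstract (not the authors' dynamic cuckoo design, and a CPU sketch rather than GPU code), the following Python example rebuilds a chained hash table whenever it becomes too full or mostly empty. The class name RehashingTable and the load-factor thresholds 0.75 and 0.25 are assumptions chosen only for the example.

```python
class RehashingTable:
    """Chained hash table that rebuilds itself on high or low load (illustrative only)."""

    def __init__(self, capacity=8, grow_at=0.75, shrink_at=0.25):
        self.min_capacity = capacity
        self.grow_at = grow_at        # assumed upper load-factor threshold
        self.shrink_at = shrink_at    # assumed lower load-factor threshold
        self.buckets = [[] for _ in range(capacity)]
        self.size = 0
        self.rehash_count = 0         # number of full rebuilds so far

    def _rehash(self, new_capacity):
        # A rebuild touches every stored entry; this is the rehashing
        # overhead that the abstract points to.
        old_items = [kv for bucket in self.buckets for kv in bucket]
        self.buckets = [[] for _ in range(new_capacity)]
        for key, value in old_items:
            self.buckets[hash(key) % new_capacity].append((key, value))
        self.rehash_count += 1

    def insert(self, key, value):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # overwrite an existing key
                return
        bucket.append((key, value))
        self.size += 1
        if self.size / len(self.buckets) > self.grow_at:
            self._rehash(2 * len(self.buckets))

    def get(self, key, default=None):
        for k, v in self.buckets[hash(key) % len(self.buckets)]:
            if k == key:
                return v
        return default

    def delete(self, key):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                self.size -= 1
                capacity = len(self.buckets)
                if capacity > self.min_capacity and self.size / capacity < self.shrink_at:
                    self._rehash(capacity // 2)
                return True
        return False
```

Inserting many keys and then deleting most of them makes rehash_count climb in both directions, which gives a feel for why a workload with frequent updates pays repeatedly for full rebuilds under this scheme.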


Newslink: Empowering Intuitive News Search With Knowledge Graphs, Yueji Yang, Yuchen Li, Anthony Tung Apr 2021

Research Collection School Of Computing and Information Systems

News search tools help end users to identify relevant news stories. However, existing search approaches often operate as a "black-box" process. There is little intuition that helps users understand how the results are related to the query. In this paper, we propose a novel news search framework, called NEWSLINK, to empower intuitive news search by using relationship paths discovered from open Knowledge Graphs (KGs). Specifically, NEWSLINK embeds both a query and news documents to subgraphs, called subgraph embeddings, in the KG. Their embeddings' overlap induces relationship paths between the entities involved. Two major advantages are obtained by incorporating subgraph …


DRAM Failure Prediction In AIOps: Empirical Evaluation, Challenges And Opportunities, Zhiyue Wu, Hongzuo Xu, Guansong Pang, Fengyuan Yu, Yijie Wang, Songlei Jian, Yongjun Wang Apr 2021

Research Collection School Of Computing and Information Systems

DRAM failure prediction is a vital task in AIOps, which is crucial to maintain the reliability and sustainable service of large-scale data centers. However, limited work has been done on DRAM failure prediction, mainly due to the lack of publicly available datasets. This paper presents a comprehensive empirical evaluation of diverse machine learning techniques for DRAM failure prediction using a large-scale multisource dataset, including more than three million records of kernel, address, and mcelog data, provided by Alibaba Cloud through the PAKDD 2021 competition. Particularly, we first formulate the problem as a multiclass classification task and exhaustively evaluate seven popular/state-of-the-art …


Efficient Retrieval Of Matrix Factorization-Based Top-K Recommendations: A Survey Of Recent Approaches, Duy Dung Le, Hady W. Lauw Apr 2021

Research Collection School Of Computing and Information Systems

Top-k recommendation seeks to deliver a personalized list of k items to each individual user. An established methodology in the literature based on matrix factorization (MF), which usually represents users and items as vectors in low-dimensional space, is an effective approach to recommender systems, thanks to its superior performance in terms of recommendation quality and scalability. A typical matrix factorization recommender system has two main phases: preference elicitation and recommendation retrieval. The former analyzes user-generated data to learn user preferences and item characteristics in the form of latent feature vectors, whereas the latter ranks the candidate items based on the …
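
The recommendation retrieval phase described in this abstract can be sketched as an exhaustive scan over item vectors. The example below assumes that items are scored by the inner product between a user's latent vector and each item's latent vector, which is a common choice for matrix factorization but is not stated in the truncated abstract; the function name top_k_items and the toy vectors are hypothetical.

```python
import heapq


def top_k_items(user_vector, item_vectors, k):
    """Return the ids of the k items whose latent vectors score highest for the user.

    item_vectors maps item id -> latent vector; scoring by inner product is an
    assumption made for this sketch.
    """
    def score(item_id):
        return sum(u * v for u, v in zip(user_vector, item_vectors[item_id]))

    # nlargest keeps only k candidates at a time while scanning every item.
    return heapq.nlargest(k, item_vectors, key=score)


# Toy latent vectors, invented purely for illustration.
items = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [1.0, 1.0],
    "d": [-1.0, 0.0],
}
recommended = top_k_items([1.0, 0.5], items, k=2)
```

This brute-force scan costs time linear in the number of items for every user, which is the kind of cost that more efficient retrieval approaches aim to reduce.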


DBL: Efficient Reachability Queries On Dynamic Graphs, Qiuyi Lyu, Yuchen Li, Bingsheng He, Bin Gong Apr 2021

Research Collection School Of Computing and Information Systems

Reachability query is a fundamental problem on graphs, which has been extensively studied in academia and industry. Since graphs are subject to frequent updates in many applications, it is essential to support efficient graph updates while offering good performance in reachability queries. Existing solutions compress the original graph with the Directed Acyclic Graph (DAG) and propose efficient query processing and index update techniques. However, they focus on optimizing the scenarios where the Strongly Connected Components (SCCs) remain unchanged and have overlooked the prohibitively high cost of the DAG maintenance when SCCs are updated. In this paper, we propose DBL, an …
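
For readers unfamiliar with the problem, a reachability query asks whether a directed path exists from one vertex to another. The sketch below is an index-free breadth-first search over an adjacency-list dictionary; it is a plain baseline and not the DBL method, and the function name is_reachable and the toy graph are assumptions for the example.

```python
from collections import deque


def is_reachable(graph, source, target):
    """Return True if target can be reached from source by following directed edges.

    graph maps each vertex to a list of its out-neighbors.
    """
    if source == target:
        return True
    visited = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return True
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return False


# Toy graph: vertices 1, 2, 3 form a cycle (one SCC); vertex 4 points into it.
graph = {1: [2], 2: [3], 3: [1], 4: [1]}
before_update = is_reachable(graph, 3, 4)
graph[3].append(4)   # an edge insertion that merges vertex 4 into the SCC
after_update = is_reachable(graph, 3, 4)
```

Updates are trivial here because there is no index to maintain, but each query may traverse the whole graph; index-based approaches make the opposite trade-off.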


Learning ADL Daily Routines With Spatiotemporal Neural Networks, Shan Gao, Ah-Hwee Tan, Rossi Setchi Jan 2021

Research Collection School Of Computing and Information Systems

The activities of daily living (ADLs) refer to the activities performed by individuals on a daily basis and are the indicators of a person’s habits, lifestyle, and wellbeing. Learning an individual’s ADL daily routines has significant value in the healthcare domain. Specifically, ADL recognition and inter-ADL pattern learning problems have been studied extensively in the past couple of decades. However, discovering the patterns performed in a day and clustering them into ADL daily routines has been a relatively unexplored research area. In this paper, a self-organizing neural network model, called the Spatiotemporal ADL Adaptive Resonance Theory (STADLART), is proposed for …


Temporal Heterogeneous Interaction Graph Embedding For Next-Item Recommendation, Yugang Ji, Mingyang Yin, Yuan Fang, Hongxia Yang, Xiangwei Wang, Tianrui Jia, Chuan Shi Sep 2020

Research Collection School Of Computing and Information Systems

In the scenario of next-item recommendation, previous methods attempt to model user preferences by capturing the evolution of sequential interactions. However, their sequential expression is often limited, without modeling complex dynamics that short-term demands can often be influenced by long-term habits. Moreover, few of them take into account the heterogeneous types of interaction between users and items. In this paper, we model such complex data as a Temporal Heterogeneous Interaction Graph (THIG) and learn both user and item embeddings on THIGs to address next-item recommendation. The main challenges involve two aspects: the complex dynamics and rich heterogeneity of interactions. We …


Querying Recurrent Convoys Over Trajectory Data, Munkh-Erdene Yadamjav, Zhifeng Bao, Baihua Zheng, Farhana M. Choudhury, Hanan Samet Sep 2020

Research Collection School Of Computing and Information Systems

Moving objects equipped with location-positioning devices continuously generate a large amount of spatio-temporal trajectory data. An interesting finding over a trajectory stream is a group of objects that are travelling together for a certain period of time. Existing studies on mining co-moving objects do not consider an important correlation between co-moving objects, which is the reoccurrence of the movement pattern. In this study, we define a problem of finding recurrent pattern of co-moving objects from streaming trajectories and propose an efficient solution that enables us to discover recent co-moving object patterns repeated within a given time period. Experimental results on …


Social Participation Performance Of Wheelchair Users Using Clustering And Geolocational Sensor's Data, Yukun Yin, Kar Way Tan Aug 2020

Research Collection School Of Computing and Information Systems

For wheelchair users, social participation and physical mobility play a significant part in determining their mental health and quality of life outcomes. However, little is known about how wheelchair users move about and engage in social interactions within their life-spaces. In this project, we investigate the social participation performance of the wheelchair users based on a combination of geolocational and lifestyle survey data collected over a period of three months. This paper adopts a multi-variate approach combining geolocational travel patterns and various factors such as independence, willingness and self-perception to provide multi-faceted analysis to their lifestyles. We provide profiles of …


Learning Transferrable Parameters For Long-Tailed Sequential User Behavior Modeling, Jianwen Yin, Chenghao Liu, Weiqing Wang, Jianling Sun, Steven C. H. Hoi Aug 2020

Research Collection School Of Computing and Information Systems

Sequential user behavior modeling plays a crucial role in online user-oriented services, such as product purchasing, news feed consumption, and online advertising. The performance of sequential modeling heavily depends on the scale and quality of historical behaviors. However, the number of user behaviors inherently follows a long-tailed distribution, which has been seldom explored. In this work, we argue that focusing on tail users could bring more benefits and address the long tails issue by learning transferrable parameters from both optimization and feature perspectives. Specifically, we propose a gradient alignment optimizer and adopt an adversarial training scheme to facilitate knowledge transfer …


Feature Agglomeration Networks For Single Stage Face Detection, Jialiang Zhang, Xiongwei Wu, Steven C. H. Hoi, Jianke Zhu Mar 2020

Research Collection School Of Computing and Information Systems

Recent years have witnessed promising results of exploring deep convolutional neural network for face detection. Despite making remarkable progress, face detection in the wild remains challenging especially when detecting faces at vastly different scales and characteristics. In this paper, we propose a novel simple yet effective framework of “Feature Agglomeration Networks” (FANet) to build a new single-stage face detector, which not only achieves state-of-the-art performance but also runs efficiently. As inspired by Feature Pyramid Networks (FPN) (Lin et al., 2017), the key idea of our framework is to exploit inherent multi-scale features of a single convolutional neural network by aggregating …


Identifying Regional Trends In Avatar Customization, Peter Mawhorter, Sercan Sengun, Haewoon Kwak, D. Fox Harrell Dec 2019

Research Collection School Of Computing and Information Systems

Since virtual identities such as social media profiles and avatars have become a common venue for self-expression, it has become important to consider the ways in which existing systems embed the values of their designers. In order to design virtual identity systems that reflect the needs and preferences of diverse users, understanding how the virtual identity construction differs between groups is important. This paper presents a new methodology that leverages deep learning and differential clustering for comparative analysis of profile images, with a case study of almost 100 000 avatars from a large online community using a popular avatar creation …


Ridesourcing Systems: A Framework And Review, Hai Wang, Hai Yang Nov 2019

Research Collection School Of Computing and Information Systems

With the rapid development and popularization of mobile and wireless communication technologies, ridesourcing companies have been able to leverage internet-based platforms to operate e-hailing services in many cities around the world. These companies connect passengers and drivers in real time and are disruptively changing the transportation industry. As pioneers in a general sharing economy context, ridesourcing shared transportation platforms consist of a typical two-sided market. On the demand side, passengers are sensitive to the price and quality of the service. On the supply side, drivers, as freelancers, make working decisions flexibly based on their income from the platform …


Deep Hashing By Discriminating Hard Examples, Cheng Yan, Guansong Pang, Xiao Bai, Chunhua Shen, Jun Zhou, Edwin Hancock Oct 2019

Research Collection School Of Computing and Information Systems

This paper tackles a rarely explored but critical problem within learning to hash, i.e., to learn hash codes that effectively discriminate hard similar and dissimilar examples, to empower large-scale image retrieval. Hard similar examples refer to image pairs from the same semantic class that demonstrate some shared appearance but have different fine-grained appearance. Hard dissimilar examples are image pairs that come from different semantic classes but exhibit similar appearance. These hard examples generally have a small distance due to the shared appearance. Therefore, effective encoding of the hard examples can well discriminate the relevant images within a small Hamming distance, …


Why Reinventing The Wheels? An Empirical Study On Library Reuse And Re-Implementation, Bowen Xu, Le An, Ferdian Thung, Foutse Khomh, David Lo Sep 2019

Research Collection School Of Computing and Information Systems

Nowadays, with the rapid growth of open source software (OSS), library reuse becomes more and more popular since a large number of third-party libraries are available to download and reuse. A deeper understanding of why developers reuse a library (i.e., replacing self-implemented code with an external library) or re-implement a library (i.e., replacing an imported external library with self-implemented code) could help researchers better understand the factors that developers are concerned with when reusing code. This understanding can then be used to improve existing libraries and API recommendation tools for researchers and practitioners by using the developers' concerns identified …