Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons

Articles 1 - 30 of 158

Full-Text Articles in Databases and Information Systems

Quantum Machine Learning For Credit Scoring, Nikolaos Schetakis, Davit Aghamalyan, Micheael Boguslavsky, Agnieszka Rees, Marc Rakotomalala, Paul Robert Griffin May 2024

Research Collection School Of Computing and Information Systems

This study investigates the integration of quantum circuits with classical neural networks for enhancing credit scoring for small- and medium-sized enterprises (SMEs). We introduce a hybrid quantum–classical model, focusing on the synergy between quantum and classical components rather than comparing the performance of separate quantum and classical models. Our model incorporates a quantum layer into a traditional neural network, achieving notable reductions in training time. We apply this framework to a binary classification task with a proprietary, real-world classical credit default dataset for SMEs in Singapore. The results indicate that our hybrid model achieves efficient training, requiring significantly fewer epochs …


Screening Through A Broad Pool: Towards Better Diversity For Lexically Constrained Text Generation, Changsen Yuan, Heyan Huang, Yixin Cao, Qianwen Cao Mar 2024

Research Collection School Of Computing and Information Systems

Lexically constrained text generation (CTG) aims to generate text that contains the given constraint keywords. However, the diversity of the text generated by existing models is still unsatisfactory. In this paper, we propose a lightweight dynamic refinement strategy that aims at increasing the randomness of inference to improve generation richness and diversity while maintaining a high level of fluency and integrity. Our basic idea is to enlarge the number and length of candidate sentences in each iteration, and choose the best for subsequent refinement. On the one hand, different from previous works, which carefully insert one token between two words per action, we insert …


Imitate The Good And Avoid The Bad: An Incremental Approach To Safe Reinforcement Learning, Minh Huy Hoang, Mai Anh Tien, Pradeep Varakantham Feb 2024

Research Collection School Of Computing and Information Systems

A popular framework for enforcing safe actions in Reinforcement Learning (RL) is Constrained RL, where trajectory-based constraints on expected cost (or other cost measures) are employed to enforce safety and, more importantly, these constraints are enforced while maximizing expected reward. Most recent approaches for solving Constrained RL convert the trajectory-based cost constraint into a surrogate problem that can be solved using minor modifications to RL methods. A key drawback with such approaches is an over- or underestimation of the cost constraint at each state. Therefore, we provide an approach that does not modify the trajectory-based cost constraint …


Recommendations With Minimum Exposure Guarantees: A Post-Processing Framework, Ramon Lopes, Rodrigo Alves, Antoine Ledent, Rodrygo L. T. Santos, Marius Kloft Feb 2024

Research Collection School Of Computing and Information Systems

Relevance-based ranking is a popular ingredient in recommenders, but it frequently struggles to meet fairness criteria because social and cultural norms may favor some item groups over others. For instance, some items might receive lower ratings due to some sort of bias (e.g. gender bias). A fair ranking should balance the exposure of items from advantaged and disadvantaged groups. To this end, we propose a novel post-processing framework to produce fair, exposure-aware recommendations. Our approach is based on an integer linear programming model maximizing the expected utility while satisfying a minimum exposure constraint. The model has fewer variables than previous …
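
To make the exposure constraint concrete, here is a minimal Python sketch that measures how much exposure each item group receives in a single ranked list. It is not the authors' integer linear programming model: the logarithmic position discount in `position_weight` is an assumption (a common DCG-style choice), and `group_exposure` and `meets_min_exposure` are hypothetical helper names.

```python
import math


def position_weight(rank: int) -> float:
    """Assumed DCG-style position discount: rank 1 -> 1.0, rank 3 -> 0.5."""
    return 1.0 / math.log2(rank + 1)


def group_exposure(ranking: list[str], groups: dict[str, str]) -> dict[str, float]:
    """Sum the position weights of each group's items in one ranked list."""
    exposure: dict[str, float] = {}
    for rank, item in enumerate(ranking, start=1):
        group = groups[item]
        exposure[group] = exposure.get(group, 0.0) + position_weight(rank)
    return exposure


def meets_min_exposure(ranking: list[str], groups: dict[str, str],
                       group: str, min_exposure: float) -> bool:
    """Check a minimum-exposure constraint for one (e.g. disadvantaged) group."""
    return group_exposure(ranking, groups).get(group, 0.0) >= min_exposure
```

Moving a disadvantaged item from rank 2 to rank 1 raises its group's exposure from about 0.63 to 1.0 under this discount, which is the kind of re-ranking a post-processing step can use to satisfy such a constraint.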


C³: Code Clone-Based Identification Of Duplicated Components, Yanming Yang, Ying Zou, Xing Hu, David Lo, Chao Ni, John C. Grundy, Xin Xia Dec 2023

Research Collection School Of Computing and Information Systems

Reinventing the wheel is a detrimental programming practice in software development that frequently results in the introduction of duplicated components. This practice not only leads to increased maintenance and labor costs but also poses a higher risk of propagating bugs throughout the system. Despite numerous issues introduced by duplicated components in software, the identification of component-level clones remains a significant challenge that existing studies struggle to effectively tackle. Specifically, existing methods face two primary limitations that are challenging to overcome: 1) Measuring the similarity between different components presents a challenge due to the significant size differences among them; 2) Identifying …


Toward Intention Discovery For Early Malice Detection In Cryptocurrency, Ling Cheng, Feida Zhu, Yong Wang, Ruicheng Liang, Huiwen Liu Oct 2023

Research Collection School Of Computing and Information Systems

Cryptocurrency’s pseudo-anonymous nature makes it vulnerable to malicious activities. However, existing deep learning solutions lack interpretability and only support retrospective analysis of specific malice types. To address these challenges, we propose Intention-Monitor for early malice detection in Bitcoin. Our model, utilizing Decision-Tree based feature Selection and Complement (DT-SC), builds different feature sets for different malice types. The Status Proposal Module (SPM) and hierarchical self-attention predictor provide real-time global status and address label predictions. A survival module determines the stopping point and proposes the status sequence (intention). Our model detects various malicious activities with strong interpretability, outperforming state-of-the-art methods in extensive …


Threshold Attribute-Based Credentials With Redactable Signature, Rui Shi, Huamin Feng, Yang Yang, Feng Yuan, Yingjiu Li, Hwee Hwa Pang, Robert H. Deng Sep 2023

Research Collection School Of Computing and Information Systems

Threshold attribute-based credentials are suitable for decentralized systems such as blockchains as such systems generally assume that authenticity, confidentiality, and availability can still be guaranteed in the presence of a threshold number of dishonest or faulty nodes. Coconut (NDSS'19) was the first selective disclosure attribute-based credentials scheme supporting threshold issuance. However, it does not support threshold tracing of user identities and threshold revocation of user credentials, which is desired for internal governance such as identity management, data auditing, and accountability. The communication and computation complexities of Coconut for verifying credentials are linear in the number of each user's attributes and …


Document-Level Relation Extraction Via Separate Relation Representation And Logical Reasoning, Heyan Huang, Changsen Yuan, Qian Liu, Yixin Cao Aug 2023

Research Collection School Of Computing and Information Systems

Document-level relation extraction (RE) extends the identification of relations between entities/mentions from a single sentence to a long document. It is more realistic and poses new challenges to relation representation and reasoning skills. In this article, we propose a novel model, SRLR, using Separate Relation Representation and Logical Reasoning to address the problems of indirect relation representation and complex reasoning over evidence sentences. Specifically, we first expand the judgment of relational facts from the entity level to the mention level, highlighting fine-grained information to capture the relation representation for the entity pair. Second, we propose a logical reasoning module to identify evidence sentences and conduct …


Glocal Energy-Based Learning For Few-Shot Open-Set Recognition, Haoyu Wang, Guansong Pang, Peng Wang, Lei Zhang, Wei Wei, Yanning Zhang Jun 2023

Research Collection School Of Computing and Information Systems

Few-shot open-set recognition (FSOR) is a challenging task of great practical value. It aims to categorize a sample into one of the pre-defined, closed-set classes illustrated by a few examples while being able to reject samples from unknown classes. In this work, we approach the FSOR task by proposing a novel energy-based hybrid model. The model is composed of two branches, where a classification branch learns a metric to classify a sample into one of the closed-set classes and the energy branch explicitly estimates the open-set probability. To achieve holistic detection of open-set samples, our model leverages both class-wise and pixel-wise …


Deep Isolation Forest For Anomaly Detection, Hongzuo Xu, Guansong Pang, Yijie Wang, Yongjun Wang Apr 2023

Research Collection School Of Computing and Information Systems

Isolation forest (iForest) has emerged as arguably the most popular anomaly detector in recent years due to its general effectiveness across different benchmarks and strong scalability. Nevertheless, its linear axis-parallel isolation method often leads to (i) failure in detecting hard anomalies that are difficult to isolate in high-dimensional/non-linearly separable data space, and (ii) notorious algorithmic bias that assigns unexpectedly lower anomaly scores to artefact regions. These issues contribute to high false negative errors. Several iForest extensions have been introduced, but they essentially still employ shallow, linear data partition, restricting their power in isolating true anomalies. Therefore, this paper proposes deep isolation …
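
For reference, the sketch below shows the baseline the paper builds on: standard iForest scoring with random axis-parallel splits, restricted to one-dimensional data for brevity. The score s(x, n) = 2^(-E[h(x)] / c(n)) and the normalizer c(n) follow the original iForest formulation, with c(2) = 1 as a common convention. It is not deep isolation forest, and the function names are illustrative.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n: int) -> float:
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def path_length(x: float, data: list[float], depth: int = 0, max_depth: int = 10) -> float:
    """Isolate x with random axis-parallel splits; 1-D data keeps the sketch short."""
    if len(data) <= 1 or depth >= max_depth:
        return depth + c(len(data))
    lo, hi = min(data), max(data)
    if lo == hi:
        return depth + c(len(data))
    split = random.uniform(lo, hi)
    # Keep only the side of the split that contains x.
    side = [v for v in data if v < split] if x < split else [v for v in data if v >= split]
    return path_length(x, side, depth + 1, max_depth)


def anomaly_score(x: float, data: list[float], n_trees: int = 100) -> float:
    """Scores near 1 indicate anomalies; requires at least two data points."""
    if len(data) < 2:
        raise ValueError("need at least two data points")
    mean_h = sum(path_length(x, data) for _ in range(n_trees)) / n_trees
    return 2.0 ** (-mean_h / c(len(data)))
```

Because every split is a single threshold on one axis, points that are only separable along oblique or non-linear boundaries take many splits to isolate, which is the limitation the abstract describes.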


Morphologically-Aware Vocabulary Reduction Of Word Embeddings, Chong Cher Chia, Maksim Tkachenko, Hady Wirawan Lauw Apr 2023

Research Collection School Of Computing and Information Systems

We propose SubText, a compression mechanism via vocabulary reduction. The crux is to judiciously select a subset of word embeddings which support the reconstruction of the remaining word embeddings based on their form alone. The proposed algorithm considers the preservation of the original embeddings, as well as a word’s relationship to other words that are morphologically or semantically similar. Comprehensive evaluation of the compressed vocabulary reveals SubText’s efficacy on diverse tasks over traditional vocabulary reduction techniques, as validated on English, as well as a collection of inflected languages.


NFTDisk: Visual Detection Of Wash Trading In NFT Markets, Xiaolin Wen, Yong Wang, Xuanwu Yue, Feida Zhu, Min Zhu Apr 2023

Research Collection School Of Computing and Information Systems

With the growing popularity of Non-Fungible Tokens (NFTs), a new type of digital asset, various fraudulent activities have appeared in NFT markets. Among them, wash trading, which attempts to mislead investors by creating fake trading volumes, has become one of the most common frauds in NFT markets. Due to the sophisticated patterns of wash trading, only a subset of such activities can be detected by automatic algorithms, and manual inspection is usually required. We propose NFTDisk, a novel visualization for investors to identify wash trading activities in NFT markets, where two linked visualization modules are presented: a radial visualization module with …


Mirror: Mining Implicit Relationships Via Structure-Enhanced Graph Convolutional Networks, Jiaying Liu, Feng Xia, Jing Ren, Bo Xu, Guansong Pang, Lianhua Chi Feb 2023

Research Collection School Of Computing and Information Systems

Data explosion in the information society drives people to develop more effective ways to extract meaningful information. Extracting semantic information and relational information has emerged as a key mining primitive in a wide variety of practical applications. Existing research on relation mining has primarily focused on explicit connections and ignored underlying information, e.g., the latent entity relations. Exploring such information (defined as implicit relationships in this article) provides an opportunity to reveal connotative knowledge and potential rules. In this article, we propose a novel research topic, i.e., how to identify implicit relationships across heterogeneous networks. Specifically, we first give a …


Scalable And Globally Optimal Generalized L1 K-Center Clustering Via Constraint Generation In Mixed Integer Linear Programming, Aravinth Chembu, Scott Sanner, Hassan Khurram, Akshat Kumar Feb 2023

Research Collection School Of Computing and Information Systems

The k-center clustering algorithm, introduced over 35 years ago, is known to be robust to class imbalance prevalent in many clustering problems and has various applications such as data summarization, document clustering, and facility location determination. Unfortunately, existing k-center algorithms provide highly suboptimal solutions that can limit their practical application, reproducibility, and clustering quality. In this paper, we provide a novel scalable and globally optimal solution to a popular variant of the k-center problem known as generalized L1 k-center clustering that uses L1 distance and allows the selection of arbitrary vectors as cluster centers. We show that this clustering objective …
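
A small sketch of the objective itself may help: the k-center cost is the largest L1 distance from any point to its nearest center. The brute-force search below considers only data points as centers, so it illustrates why allowing arbitrary vectors (the generalized variant) can lower the cost. It is not the paper's constraint-generation MILP, and the helper names are hypothetical.

```python
from itertools import combinations

Point = tuple[float, ...]


def l1(a: Point, b: Point) -> float:
    """L1 (Manhattan) distance between two vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def k_center_cost(points: list[Point], centers: list[Point]) -> float:
    """k-center objective: the largest L1 distance from any point to its nearest center."""
    return max(min(l1(p, c) for c in centers) for p in points)


def brute_force_data_centers(points: list[Point], k: int) -> tuple[Point, ...]:
    """Exhaustive search with centers restricted to data points (exponential in k)."""
    return min(combinations(points, k), key=lambda cs: k_center_cost(points, list(cs)))
```

For the points (0, 0), (1, 0), (10, 10) and (11, 10), the best pair of data-point centers has cost 1.0, while the arbitrary centers (0.5, 0) and (10.5, 10) reach 0.5.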


Survey On Sentiment Analysis: Evolution Of Research Methods And Topics, Jingfeng Cui, Zhaoxia Wang, Seng-Beng Ho, Erik Cambria Jan 2023

Research Collection School Of Computing and Information Systems

Sentiment analysis, one of the research hotspots in the natural language processing field, has attracted the attention of researchers, and research papers on the field are increasingly published. Many literature reviews on sentiment analysis involving techniques, methods, and applications have been produced using different survey methodologies and tools, but there has not been a survey dedicated to the evolution of research methods and topics of sentiment analysis. There have also been few survey works leveraging keyword co-occurrence on sentiment analysis. Therefore, this study presents a survey of sentiment analysis focusing on the evolution of research methods and topics. It incorporates …
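
Keyword co-occurrence analysis starts from a simple count of how often two keywords appear on the same paper. A minimal sketch, assuming each paper is represented by its list of author keywords (the input format and the function name are assumptions, not the survey's pipeline):

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(papers: list[list[str]]) -> Counter:
    """Count how many papers list each unordered pair of keywords together."""
    counts: Counter = Counter()
    for keywords in papers:
        # Normalize case, drop duplicates, and sort so each pair has one canonical order.
        unique = sorted({k.lower() for k in keywords})
        counts.update(combinations(unique, 2))
    return counts
```

Tracking these pair counts per publication year is one way to see how method and topic clusters shift over time.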


Meta-Complementing The Semantics Of Short Texts In Neural Topic Models, Ce Zhang, Hady Wirawan Lauw Nov 2022

Research Collection School Of Computing and Information Systems

Topic models infer latent topic distributions based on observed word co-occurrences in a text corpus. While typically a corpus contains documents of variable lengths, most previous topic models treat documents of different lengths uniformly, assuming that each document is sufficiently informative. However, shorter documents may have only a few word co-occurrences, resulting in inferior topic quality. Some other previous works assume that all documents are short, and leverage external auxiliary data, e.g., pretrained word embeddings and document connectivity. Orthogonal to existing works, we remedy this problem within the corpus itself by proposing a Meta-Complement Topic Model, which improves topic quality …


Towards An Optimal Bus Frequency Scheduling: When The Waiting Time Matters, Songsong Mo, Zhifeng Bao, Baihua Zheng, Zhiyong Peng Sep 2022

Research Collection School Of Computing and Information Systems

Reorganizing bus frequencies to cater for actual travel demands can significantly reduce the cost of the public transport system. This paper studies the bus frequency optimization problem considering user satisfaction. Specifically, for the first time to the best of our knowledge, we study how to schedule the buses such that the total number of passengers who could receive their bus services within the waiting time threshold is maximized. We propose two variants of the problem, FAST and FASTCO, to cater for different application needs and prove that both are NP-hard. To solve FAST effectively and efficiently, we first present an …
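
The objective counts passengers whose wait stays within a threshold. The sketch below evaluates that count for a single stop, assuming unlimited bus capacity and known passenger arrival times; these simplifications and the function name are assumptions, and the paper's problem (and its NP-hard optimization over schedules) is richer.

```python
from bisect import bisect_left


def passengers_served(arrivals: list[float], departures: list[float], max_wait: float) -> int:
    """Count passengers whose next bus leaves within max_wait of their arrival."""
    departures = sorted(departures)
    served = 0
    for t in arrivals:
        i = bisect_left(departures, t)  # index of the first bus departing at or after t
        if i < len(departures) and departures[i] - t <= max_wait:
            served += 1
    return served
```

With arrivals at 0, 5, 12, 19 and 25, departures at 10 and 20, and a threshold of 6 time units, two passengers are served; adding a departure at 5 raises the count to three.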


Variational Graph Author Topic Modeling, Ce Zhang, Hady Wirawan Lauw Aug 2022

Research Collection School Of Computing and Information Systems

While Variational Graph Auto-Encoder (VGAE) has shown a promising ability to learn representations for documents, most existing VGAE methods do not model a latent topic structure and therefore lack semantic interpretability. Exploring hidden topics within documents and discovering key words associated with each topic allow us to develop a semantic interpretation of the corpus. Moreover, documents are usually associated with authors. For example, news reports have journalists specializing in writing certain types of events, academic papers have authors with expertise in certain research topics, etc. Modeling authorship information could benefit topic modeling, since documents by the same authors tend to reveal …


Learning Transferable Perturbations For Image Captioning, Hanjie Wu, Yongtuo Liu, Hongmin Cai, Shengfeng He May 2022

Research Collection School Of Computing and Information Systems

Recent studies have shown that state-of-the-art deep learning models can be attacked by small but well-designed perturbations. Existing attack algorithms for the image captioning task are time-consuming, and the adversarial examples they generate do not transfer well to other models. To generate stronger adversarial examples faster, we propose to learn the perturbations by a generative model that is governed by three novel loss functions. Image feature distortion loss is designed to maximize the encoded image feature distance between original images and the corresponding adversarial examples at the image domain, and local-global mismatching loss is introduced to separate the mapping encoding representation …


Context-Aware Graph Convolutional Network For Dynamic Origin-Destination Prediction, Juan Nathaniel, Baihua Zheng Dec 2021

Research Collection School Of Computing and Information Systems

A robust Origin-Destination (OD) prediction is key to urban mobility. A good forecasting model can reduce operational risks and improve service availability, among many other upsides. Here, we examine the use of a Graph Convolutional Network (GCN) and its hybrid Markov-Chain (GCN-MC) variant to perform a context-aware OD prediction based on a large-scale public transportation dataset in Singapore. Compared with the baseline Markov-Chain algorithm and GCN, the proposed hybrid GCN-MC model improves the prediction accuracy by 37% and 12%, respectively. Lastly, the addition of temporal and historical contextual information further improves the performance of the proposed hybrid model by 4–12%.
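
The Markov-Chain baseline can be sketched, under the assumption that it is a first-order model, as row-normalizing historical OD counts into destination probabilities per origin and then spreading each origin's expected outflow accordingly. This is a generic baseline, not the authors' GCN-MC implementation; the names and input formats are hypothetical.

```python
from collections import defaultdict


def transition_probs(od_counts: dict[tuple[str, str], int]) -> dict[str, dict[str, float]]:
    """Row-normalize historical OD counts into P(destination | origin)."""
    totals: dict[str, int] = defaultdict(int)
    for (origin, _dest), n in od_counts.items():
        totals[origin] += n
    probs: dict[str, dict[str, float]] = defaultdict(dict)
    for (origin, dest), n in od_counts.items():
        if totals[origin] > 0:  # skip origins with no recorded trips
            probs[origin][dest] = n / totals[origin]
    return dict(probs)


def predict_od(probs: dict[str, dict[str, float]],
               outflow: dict[str, float]) -> dict[tuple[str, str], float]:
    """Spread each origin's expected outflow over destinations."""
    return {(o, d): outflow[o] * p
            for o, row in probs.items() if o in outflow
            for d, p in row.items()}
```

A model of this kind ignores the spatial structure between stations, which is what the graph convolution in the hybrid variant is meant to add.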


On A Multistage Discrete Stochastic Optimization Problem With Stochastic Constraints And Nested Sampling, Thuy Anh Ta, Tien Mai, Fabian Bastin, Pierre L'Ecuyer Nov 2021

Research Collection School Of Computing and Information Systems

We consider a multistage stochastic discrete program in which constraints on any stage might involve expectations that cannot be computed easily and are approximated by simulation. We study a sample average approximation (SAA) approach that uses nested sampling, in which at each stage, a number of scenarios are examined and a number of simulation replications are performed for each scenario to estimate the next-stage constraints. This approach provides an approximate solution to the multistage problem. To establish the consistency of the SAA approach, we first consider a two-stage problem and show that in the second-stage problem, given a scenario, the …


Expediting The Accuracy-Improving Process Of SVMs For Class Imbalance Learning, Bin Cao, Yuqi Liu, Chenyu Hou, Jing Fan, Baihua Zheng, Jianwei Jin Nov 2021

Research Collection School Of Computing and Information Systems

To improve the classification performance of support vector machines (SVMs) on imbalanced datasets, cost-sensitive learning methods have been proposed, e.g., DEC (Different Error Costs) and FSVM-CIL (Fuzzy SVM for Class Imbalance Learning). They relocate the hyperplane by adjusting the costs associated with misclassifying samples. However, the error costs are determined either empirically or by performing an exhaustive search in the parameter space. Neither strategy can guarantee effectiveness and efficiency simultaneously. In this paper, we propose ATEC, a solution that can efficiently find a preferable hyperplane by automatically tuning the error cost for between-class samples. ATEC distinguishes itself from all …
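
To show what adjusting misclassification costs means in practice, the sketch below evaluates a class-weighted hinge loss for a linear SVM. The cost rule in `dec_costs` (weighting the positive class by N_neg / N_pos, assuming +1 is the minority class) is one common heuristic for DEC, not ATEC's automatic tuning; both functions are illustrative.

```python
def dec_costs(labels: list[int]) -> dict[int, float]:
    """Heuristic DEC costs: errors on the (assumed minority) +1 class cost N_neg / N_pos."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return {1: n_neg / n_pos, -1: 1.0}


def weighted_hinge_loss(w: tuple[float, ...], b: float,
                        X: list[tuple[float, ...]], y: list[int],
                        costs: dict[int, float]) -> float:
    """Sum over samples of cost[y_i] * max(0, 1 - y_i * (w . x_i + b))."""
    total = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        total += costs[yi] * max(0.0, 1.0 - margin)
    return total
```

Raising the cost of minority-class errors makes the optimizer push the hyperplane away from minority samples; choosing that cost well is exactly the search ATEC aims to speed up.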


Unified And Incremental SimRank: Index-Free Approximation With Scheduled Principle, Fanwei Zhu, Yuan Fang, Kai Zhang, Kevin C.-C. Chang, Hongtai Cao, Zhen Jiang, Minghui Wu Sep 2021

Research Collection School Of Computing and Information Systems

SimRank is a popular link-based similarity measure on graphs. It enables a variety of applications with different modes of querying (e.g., single-pair, single-source and all-pair modes). In this paper, we propose UISim, a unified and incremental framework for all SimRank modes based on a scheduled approximation principle. UISim processes queries with incremental and prioritized exploration of the entire computation space, and thus allows flexible tradeoff of time and accuracy. On the other hand, it creates and shares common “building blocks” for online computation without relying on indexes, and thus is efficient to handle both static and dynamic graphs. Our experiments …
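
For context, the quantity being approximated is the standard SimRank recurrence: s(a, a) = 1, and for a ≠ b, s(a, b) = C / (|I(a)| |I(b)|) · Σ s(i, j) over in-neighbors i of a and j of b (0 if either set is empty). The naive all-pair iteration below computes it directly. It is a reference sketch, not UISim's index-free scheduled approximation, and the decay factor C = 0.8 is an assumed default.

```python
def simrank(in_neighbors: dict[str, list[str]], C: float = 0.8,
            iterations: int = 10) -> dict[tuple[str, str], float]:
    """Naive all-pair SimRank by fixed-point iteration (quadratic memory in |V|)."""
    nodes = list(in_neighbors)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new: dict[tuple[str, str], float] = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[(a, b)] = 1.0
                    continue
                Ia, Ib = in_neighbors[a], in_neighbors[b]
                if not Ia or not Ib:
                    new[(a, b)] = 0.0
                    continue
                total = sum(sim[(i, j)] for i in Ia for j in Ib)
                new[(a, b)] = C * total / (len(Ia) * len(Ib))
        sim = new
    return sim
```

Single-pair and single-source queries are just lookups into this table, which is why an approach that avoids materializing it, and can trade accuracy for time, matters on large or changing graphs.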


Quantum Computing For Supply Chain Finance, Paul R. Griffin, Ritesh Sampat Sep 2021

Research Collection School Of Computing and Information Systems

Applying quantum computing to real-world applications to assess its potential efficacy is a daunting task for non-quantum specialists. This paper shows an implementation of two quantum optimization algorithms applied to trade finance portfolios and compares the selections to those chosen by experienced underwriters and a classical optimizer. The method used is to map the financial risk and returns of a trade finance portfolio to the optimization function of a quantum algorithm developed in a Qiskit tutorial. The results show that whilst no advantage is seen from using the quantum algorithms, the performance of the quantum algorithms …


Self-Adaptive Graph Traversal On GPUs, Mo Sha, Yuchen Li, Kian-Lee Tan Jun 2021

Research Collection School Of Computing and Information Systems

GPU’s massive computing power offers unprecedented opportunities to enable large graph analysis. Existing studies proposed various preprocessing approaches that convert the input graphs into dedicated structures for GPU-based optimizations. However, these dedicated approaches incur significant preprocessing costs as well as weak programmability to build general graph applications. In this paper, we introduce SAGE, a self-adaptive graph traversal on GPUs, which is free from preprocessing and operates on ubiquitous graph representations directly. We propose Tiled Partitioning and Resident Tile Stealing to fully exploit the computing power of GPUs in a runtime and self-adaptive manner. We also propose Sampling-based Reordering to further …
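
Ubiquitous graph representations typically include formats such as CSR (compressed sparse row). As an assumed, CPU-only illustration, the level-synchronous BFS below runs directly on CSR arrays; the per-frontier loop is the work a GPU traversal distributes across threads. It does not implement SAGE's Tiled Partitioning, Resident Tile Stealing or Sampling-based Reordering.

```python
def bfs_levels(offsets: list[int], targets: list[int], source: int) -> list[int]:
    """Level-synchronous BFS over a CSR graph; -1 marks unreachable vertices.

    The neighbors of vertex u are targets[offsets[u]:offsets[u + 1]].
    """
    n = len(offsets) - 1
    level = [-1] * n
    level[source] = 0
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier: list[int] = []
        for u in frontier:  # on a GPU, this loop is what gets spread over threads
            for v in targets[offsets[u]:offsets[u + 1]]:
                if level[v] == -1:
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level
```

High-degree vertices make the inner loop very uneven across the frontier, which is the load-imbalance problem that runtime partitioning schemes on GPUs try to address.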


A Fully Dynamic Algorithm For K-Regret Minimizing Sets, Yanhao Wang, Yuchen Li, Raymond Chi-Wing Wong, Kian-Lee Tan Apr 2021

Research Collection School Of Computing and Information Systems

Selecting a small set of representatives from a large database is important in many applications such as multi-criteria decision making, web search, and recommendation. The k-regret minimizing set (k-RMS) problem was recently proposed for representative tuple discovery. Specifically, for a large database P of tuples with multiple numerical attributes, the k-RMS problem returns a size-r subset Q of P such that, for any possible ranking function, the score of the top-ranked tuple in Q is not much worse than the score of the kth-ranked tuple in P. Although the k-RMS problem has been extensively studied in the literature, existing methods …
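
The k-regret ratio can be checked numerically for a finite set of linear ranking functions: for each weight vector, compare the best score in Q with the k-th best score in P. Because only the listed weight vectors are evaluated, the result is a lower bound on the true maximum regret over all ranking functions. The helper names are hypothetical, and this is not the paper's dynamic algorithm.

```python
Tuple = tuple[float, ...]


def utility(w: Tuple, t: Tuple) -> float:
    """Linear ranking function: weighted sum of a tuple's attributes."""
    return sum(wi * ti for wi, ti in zip(w, t))


def k_regret_ratio(P: list[Tuple], Q: list[Tuple], k: int, weights: list[Tuple]) -> float:
    """Max over the given utilities of max(0, 1 - best score in Q / k-th best score in P)."""
    worst = 0.0
    for w in weights:
        kth_p = sorted((utility(w, p) for p in P), reverse=True)[k - 1]
        best_q = max(utility(w, q) for q in Q)
        if kth_p > 0:
            worst = max(worst, 1.0 - best_q / kth_p)
    return worst
```

For P = {(1, 0), (0, 1), (0.9, 0.9)} and Q = {(0.9, 0.9)}, the ratio is about 0.1 for k = 1 but drops to 0 for k = 2, since Q only has to compete with the second-best tuple.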


Improving Multi-Hop Knowledge Base Question Answering By Learning Intermediate Supervision Signals, Gaole He, Yunshi Lan, Jing Jiang, Wayne Xin Zhao, Ji Rong Wen Mar 2021

Research Collection School Of Computing and Information Systems

Multi-hop Knowledge Base Question Answering (KBQA) aims to find the answer entities that are multiple hops away in the Knowledge Base (KB) from the entities in the question. A major challenge is the lack of supervision signals at intermediate steps. Therefore, multi-hop KBQA algorithms can only receive the feedback from the final answer, which makes the learning unstable or ineffective. To address this challenge, we propose a novel teacher-student approach for the multi-hop KBQA task. In our approach, the student network aims to find the correct answer to the query, while the teacher network tries to learn intermediate supervision signals …
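
A toy example shows why intermediate supervision is missing: answering a 2-hop question means following a relation path through an intermediate entity, yet training data usually labels only the final answer. The triples, entity names, and the `follow_path` helper below are hypothetical.

```python
def follow_path(triples: list[tuple[str, str, str]], start: str,
                relations: list[str]) -> set[str]:
    """Return the entities reached from `start` by following `relations` hop by hop."""
    frontier = {start}
    for rel in relations:
        frontier = {o for (s, r, o) in triples if s in frontier and r == rel}
    return frontier


triples = [
    ("film_x", "directed_by", "person_y"),
    ("person_y", "born_in", "city_z"),
    ("film_w", "directed_by", "person_y"),
]
```

Here `person_y` is the intermediate entity after the first hop; a learner that only sees `city_z` as the label gets no direct signal about that step, which is the gap a teacher network's intermediate signals are meant to fill.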


Visual Analysis Of Discrimination In Machine Learning, Qianwen Wang, Zhenghua Xu, Zhutian Chen, Yong Wang, Shixia Liu, Huamin Qu Feb 2021

Research Collection School Of Computing and Information Systems

The growing use of automated decision-making in critical applications, such as crime prediction and college admission, has raised questions about fairness in machine learning. How can we decide whether different treatments are reasonable or discriminatory? In this paper, we investigate discrimination in machine learning from a visual analytics perspective and propose an interactive visualization tool, DiscriLens, to support a more comprehensive analysis. To reveal detailed information on algorithmic discrimination, DiscriLens identifies a collection of potentially discriminatory itemsets based on causal modeling and classification rules mining. By combining an extended Euler diagram with a matrix-based visualization, we develop a novel set …


Deep Unsupervised Anomaly Detection, Tangqing Li, Zheng Wang, Siying Liu, Wen-Yan Lin Jan 2021

Research Collection School Of Computing and Information Systems

This paper proposes a novel method to detect anomalies in large datasets under a fully unsupervised setting. The key idea behind our algorithm is to learn the representation underlying normal data. To this end, we leverage the latest clustering technique suitable for handling high dimensional data. This hypothesis provides a reliable starting point for normal data selection. We train an autoencoder from the normal data subset, and iterate between hypothesizing normal candidate subset based on clustering and representation learning. The reconstruction error from the learned autoencoder serves as a scoring function to assess the normality of the data. Experimental results …


A Near-Optimal Change-Detection Based Algorithm For Piecewise-Stationary Combinatorial Semi-Bandits, Huozhi Zhou, Lingda Wang, Lav N. Varshney, Ee-Peng Lim Dec 2020

Research Collection School Of Computing and Information Systems

We investigate the piecewise-stationary combinatorial semi-bandit problem. Compared to the original combinatorial semi-bandit problem, our setting assumes the reward distributions of base arms may change in a piecewise-stationary manner at unknown time steps. We propose an algorithm, GLR-CUCB, which combines an efficient combinatorial semi-bandit algorithm, CUCB, with an almost parameter-free change-point detector, the Generalized Likelihood Ratio Test (GLRT). Our analysis shows that the regret of GLR-CUCB is upper bounded by O(√(NKT log T)), where N is the number of piecewise-stationary segments, K is the number of base arms, and T is the number of time steps. As a complement, we also …
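
The change-point detector is a GLR test. For Bernoulli rewards, a common form of the statistic takes the maximum over split points s of s·kl(μ̂_1:s, μ̂_1:n) + (n - s)·kl(μ̂_s+1:n, μ̂_1:n), where kl is the Bernoulli KL divergence, and declares a change when it exceeds a threshold. The sketch below assumes Bernoulli rewards and leaves the threshold as a free parameter; it is illustrative, not the paper's tuned detector.

```python
import math


def bernoulli_kl(p: float, q: float, eps: float = 1e-12) -> float:
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def glr_statistic(rewards: list[float]) -> float:
    """Max over split points of the two-segment Bernoulli log-likelihood ratio."""
    n = len(rewards)
    total = sum(rewards)
    mean_all = total / n
    best = 0.0
    prefix = 0.0
    for s in range(1, n):
        prefix += rewards[s - 1]
        left, right = prefix / s, (total - prefix) / (n - s)
        stat = s * bernoulli_kl(left, mean_all) + (n - s) * bernoulli_kl(right, mean_all)
        best = max(best, stat)
    return best


def change_detected(rewards: list[float], threshold: float) -> bool:
    """Flag a change point when the GLR statistic exceeds the threshold."""
    return len(rewards) > 1 and glr_statistic(rewards) > threshold
```

In a bandit setting, a detection on any base arm would typically trigger a reset of that arm's statistics (or of the whole learner), after which CUCB-style exploration resumes.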