Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Artificial Intelligence and Robotics

Singapore Management University

Reinforcement Learning

Articles 1 - 8 of 8

Full-Text Articles in Physical Sciences and Mathematics

Imitating Opponent To Win: Adversarial Policy Imitation Learning In Two-Player Competitive Games, The Viet Bui, Tien Mai, Thanh H. Nguyen Jun 2023

Research Collection School Of Computing and Information Systems

Recent research on vulnerabilities of deep reinforcement learning (RL) has shown that adversarial policies adopted by an adversary agent can influence a target RL agent (victim agent) to perform poorly in a multi-agent environment. In existing studies, adversarial policies are directly trained based on experiences of interacting with the victim agent. This approach has a key shortcoming: knowledge derived from historical interactions may not generalize properly to unexplored policy regions of the victim agent, making the trained adversarial policy significantly less effective. In this work, we design a new effective adversarial policy learning algorithm that overcomes …


Reinforcement Learning Approach To Coordinate Real-World Multi-Agent Dynamic Routing And Scheduling, Joe Waldy Nov 2022

Dissertations and Theses Collection (Open Access)

In this thesis, we study new variants of routing and scheduling problems motivated by real-world problems from the urban logistics and law enforcement domains. In particular, we focus on two key aspects: dynamic and multi-agent. While routing problems such as the Vehicle Routing Problem (VRP) are well studied in the Operations Research (OR) community, we know that in real-world route planning today, initially planned route plans and schedules may be disrupted by dynamically occurring events. In addition, routing and scheduling plans cannot be done in silos due to the presence of other agents which may be independent and self-interested. These requirements create …


Reinforcement Learning Approach To Solve Dynamic Bi-Objective Police Patrol Dispatching And Rescheduling Problem, Waldy Joe, Hoong Chuin Lau, Jonathan Pan Jun 2022

Research Collection School Of Computing and Information Systems

Police patrol aims to fulfill two main objectives, namely to project presence and to respond to incidents in a timely manner. Incidents happen dynamically and can disrupt the initially planned patrol schedules. The key decisions are which patrol agent to dispatch to an incident and, subsequently, how to adapt the patrol schedules in response to such dynamically occurring incidents while still fulfilling both objectives, which can sometimes conflict. In this paper, we define this real-world problem as a Dynamic Bi-Objective Police Patrol Dispatching and Rescheduling Problem and propose a solution approach that combines Deep …


Hierarchical Value Decomposition For Effective On-Demand Ride Pooling, Hao Jiang, Pradeep Varakantham May 2022

Research Collection School Of Computing and Information Systems

On-demand ride-pooling (e.g., UberPool, GrabShare) services focus on serving multiple different customer requests using each vehicle, i.e., an empty or partially filled vehicle can be assigned requests from different passengers with different origins and destinations. On the other hand, in Taxi on Demand (ToD) services (e.g., UberX), one vehicle is assigned to only one request at a time. On-demand ride pooling is not only beneficial to customers (lower cost), drivers (higher revenue per trip) and aggregation companies (higher revenue), but is also of crucial importance to the environment as it reduces the number of vehicles required on the roads. Since …


Burst-Induced Multi-Armed Bandit For Learning Recommendation, Rodrigo Alves, Antoine Ledent, Marius Kloft Oct 2021

Research Collection School Of Computing and Information Systems

In this paper, we introduce a non-stationary and context-free Multi-Armed Bandit (MAB) problem and a novel algorithm (which we refer to as BMAB) to solve it. The problem is context-free in the sense that no side information about users or items is needed. We work in a continuous-time setting where each timestamp corresponds to a visit by a user and a corresponding decision regarding recommendation. The main novelty is that we model the reward distribution as a consequence of variations in the intensity of the activity, and thereby we assist the exploration/exploitation dilemma by exploring the temporal dynamics of the …


Toward Deep Supervised Anomaly Detection: Reinforcement Learning From Partially Labeled Anomaly Data, Guansong Pang, Anton Van Den Hengel, Chunhua Shen, Longbing Cao Aug 2021

Research Collection School Of Computing and Information Systems

We consider the problem of anomaly detection with a small set of partially labeled anomaly examples and a large-scale unlabeled dataset. This is a common scenario in many important applications. Existing related methods either exclusively fit the limited anomaly examples that typically do not span the entire set of anomalies, or proceed with unsupervised learning from the unlabeled data. We propose here instead a deep reinforcement learning-based approach that enables an end-to-end optimization of the detection of both labeled and unlabeled anomalies. This approach learns the known abnormality by automatically interacting with an anomaly-biased simulation environment, while continuously extending the …


End-To-End Deep Reinforcement Learning For Multi-Agent Collaborative Exploration, Zichen Chen, Budhitama Subagdja, Ah-Hwee Tan Oct 2019

Research Collection School Of Computing and Information Systems

Exploring an unknown environment with multiple autonomous robots is a major challenge in robotics. As multiple robots are assigned to explore different locations, they may interfere with each other, making the overall task less efficient. In this paper, we present a new model called CNN-based Multi-agent Proximal Policy Optimization (CMAPPO) for multi-agent exploration, wherein the agents learn an effective strategy to allocate and explore the environment using a new deep reinforcement learning architecture. The model combines a convolutional neural network to process multi-channel visual inputs, curriculum-based learning, and the PPO algorithm for motivation-based reinforcement learning. Evaluations show that the proposed method …


Knowledge-Based Exploration For Reinforcement Learning In Self-Organizing Neural Networks, Teck-Hou Teng, Ah-Hwee Tan Dec 2012

Research Collection School Of Computing and Information Systems

Exploration is necessary during reinforcement learning to discover new solutions in a given problem space. Most reinforcement learning systems, however, adopt a simple strategy of randomly selecting an action among all the available actions. This paper proposes a novel exploration strategy, known as Knowledge-based Exploration, for guiding the exploration of a family of self-organizing neural networks in reinforcement learning. Specifically, exploration is directed towards unexplored and favorable action choices while steering away from negative action choices that are likely to fail. This is achieved by using the learned knowledge of the agent to identify prior action choices leading to …