Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 30 of 51

Full-Text Articles in Physical Sciences and Mathematics

Neural Airport Ground Handling, Yaoxin Wu, Jianan Zhou, Yunwen Xia, Xianli Zhang, Zhiguang Cao, Jie Zhang Dec 2023

Research Collection School Of Computing and Information Systems

Airport ground handling (AGH) offers necessary operations to flights during their turnarounds and is of great importance to the efficiency of airport management and the economics of aviation. The interplay among these operations leads to NP-hard problems with complex constraints. Hence, existing methods for AGH are usually designed with extensive domain knowledge but still fail to yield high-quality solutions efficiently. In this paper, we aim to enhance the solution quality and computation efficiency for solving AGH. Specifically, we first model AGH as a multiple-fleet vehicle routing problem (VRP) with miscellaneous constraints including precedence, time windows, …
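To make the time-window and precedence constraints concrete, here is a minimal Python sketch, assuming each operation has an (earliest, latest) start window, a fixed service duration, and a known travel time between consecutive operations on the same vehicle. The function names, the operation labels, and this particular feasibility check are illustrative assumptions, not the paper's model.

```python
from typing import Dict, Mapping, Optional, Sequence, Tuple


def schedule_route(route: Sequence[str],
                   windows: Mapping[str, Tuple[float, float]],
                   service: Mapping[str, float],
                   travel: Mapping[Tuple[str, str], float],
                   start: float = 0.0) -> Optional[Dict[str, float]]:
    """Return the start time of each operation on one vehicle's route,
    or None if some operation cannot start inside its time window.

    windows[op] = (earliest, latest) start time (hypothetical encoding),
    service[op] = duration of the operation,
    travel[(a, b)] = driving time from operation a to operation b.
    """
    t: float = start
    prev: Optional[str] = None
    starts: Dict[str, float] = {}
    for op in route:
        if prev is not None:
            t += travel[(prev, op)]
        earliest, latest = windows[op]
        t = max(t, earliest)  # wait if the vehicle arrives early
        if t > latest:        # too late: the route violates a time window
            return None
        starts[op] = t
        t += service[op]
        prev = op
    return starts


def precedence_ok(starts: Mapping[str, float],
                  service: Mapping[str, float],
                  precedence: Sequence[Tuple[str, str]]) -> bool:
    """Check that for every pair (a, b), operation a finishes before b starts.
    `starts` may merge the schedules of several fleets."""
    return all(starts[a] + service[a] <= starts[b] for a, b in precedence)
```

Because precedence links operations served by different fleets, a route can be feasible on its own (every window met) yet still clash with another fleet's schedule, which is why the two checks are kept separate here.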


Decentralized Multimedia Data Sharing In IoV: A Learning-Based Equilibrium Of Supply And Demand, Jiani Fan, Minrui Xu, Jiale Guo, Lwin Khin Shar, Jiawen Kang, Dusit Niyato, Kwok-Yan Lam Oct 2023

Research Collection School Of Computing and Information Systems

The Internet of Vehicles (IoV) has great potential to transform transportation systems by enhancing road safety, reducing traffic congestion, and improving user experience through onboard infotainment applications. Decentralized data sharing can improve security, privacy, and reliability, and can facilitate infotainment data sharing in IoVs. However, decentralized data sharing may not achieve the expected efficiency if some IoV users only want to consume the shared data and are unwilling to contribute their own data to the community. Other vehicles and infrastructure then observe incomplete information, which can introduce additional transmission latency. Therefore, in this paper, by modeling the …


Transferable Curricula Through Difficulty Conditioned Generators, Sidney Tio, Pradeep Varakantham Aug 2023

Research Collection School Of Computing and Information Systems

Advancements in reinforcement learning (RL) have demonstrated superhuman performance in complex tasks such as StarCraft, Go, and chess. However, knowledge transfer from artificial "experts" to humans remains a significant challenge. A promising avenue for such transfer would be the use of curricula. Recent methods in curriculum generation focus on training RL agents efficiently, yet such methods rely on surrogate measures to track student progress and are not suited for training robots in the real world (or, more ambitiously, humans). In this paper, we introduce a method named Parameterized Environment Response Model (PERM) that shows promising results in training RL agents …


Reinforcement Learning For Sequential Decision Making With Constraints, Jiajing Ling Jul 2023

Dissertations and Theses Collection (Open Access)

Reinforcement learning is a widely used approach to tackle problems in sequential decision making where an agent learns from rewards or penalties. However, in decision-making problems that involve safety or limited resources, the agent's exploration is often limited by constraints. To model such problems, constrained Markov decision processes and constrained decentralized partially observable Markov decision processes (Dec-POMDPs) have been proposed for single-agent and multi-agent settings, respectively. A significant challenge in solving constrained Dec-POMDPs is determining the contribution of each agent to the primary objective and to constraint violations. To address this issue, we propose a fictitious play-based method that uses Lagrangian relaxation …
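The pairing of Lagrangian relaxation with fictitious play can be illustrated on a toy single-agent problem. The sketch below is an assumption-laden illustration, not the dissertation's multi-agent method: at each iteration it plays a best response to the Lagrangian r(a) - lam * c(a), updates the multiplier lam by projected dual ascent, and reports the empirical frequency of played actions, which is the averaging step that fictitious play relies on. The function name and parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def lagrangian_best_response(rewards: Sequence[float], costs: Sequence[float],
                             budget: float, lr: float = 0.1,
                             iters: int = 10_000) -> Tuple[List[float], float]:
    """Toy constrained bandit: maximize expected reward s.t. expected cost <= budget.

    Each iteration plays a best response to r(a) - lam * c(a), then takes a
    projected gradient step on the multiplier lam >= 0. The empirical action
    frequencies (the fictitious-play average) are returned with the final lam.
    """
    lam = 0.0
    counts = [0] * len(rewards)
    for _ in range(iters):
        scores = [r - lam * c for r, c in zip(rewards, costs)]
        a = max(range(len(scores)), key=scores.__getitem__)  # best response
        counts[a] += 1
        # Dual step: raise lam when the played action overspends the budget,
        # lower it (but never below zero) when it underspends.
        lam = max(0.0, lam + lr * (costs[a] - budget))
    freqs = [n / iters for n in counts]
    return freqs, lam
```

With rewards (1.0, 0.2), costs (1.0, 0.0) and a budget of 0.5, the high-reward action alone is infeasible and the cheap action alone is suboptimal; the averaged play approaches the 50/50 mixture, whose expected cost meets the budget, even though each individual iterate is a pure action.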


Imitation Improvement Learning For Large-Scale Capacitated Vehicle Routing Problems, The Viet Bui, Tien Mai Jul 2023

Research Collection School Of Computing and Information Systems

Recent works using deep reinforcement learning (RL) to solve routing problems such as the capacitated vehicle routing problem (CVRP) have focused on improvement learning-based methods, which involve improving a given solution until it becomes near-optimal. Although adequate solutions can be achieved for small problem instances, their efficiency degrades for large-scale ones. In this work, we propose a new improvement learning-based framework based on imitation learning, where classical heuristics serve as experts to encourage the policy model to mimic and produce similar or better solutions. Moreover, to improve scalability, we propose Clockwise Clustering, a novel augmented framework for decomposing large-scale CVRP into …
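The abstract does not spell out how Clockwise Clustering works. As a hedged illustration of the general idea its name suggests, the classical sweep heuristic orders customers clockwise by polar angle around the depot and cuts that sequence into capacity-feasible clusters, each of which could then be treated as a smaller CVRP. The function below is a hypothetical sketch of that sweep, not the paper's procedure.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def clockwise_clusters(depot: Point, customers: Sequence[Point],
                       demands: Sequence[float], capacity: float) -> List[List[int]]:
    """Group customer indices into capacity-feasible clusters by sweeping
    clockwise around the depot (classical sweep idea, illustrative only)."""
    def angle(i: int) -> float:
        dx = customers[i][0] - depot[0]
        dy = customers[i][1] - depot[1]
        return math.atan2(dy, dx)

    # Sorting by descending polar angle visits the customers clockwise.
    order = sorted(range(len(customers)), key=angle, reverse=True)
    clusters: List[List[int]] = []
    current: List[int] = []
    load = 0.0
    for i in order:
        if demands[i] > capacity:
            raise ValueError(f"customer {i} exceeds vehicle capacity")
        if current and load + demands[i] > capacity:
            clusters.append(current)  # close the cluster before it overflows
            current, load = [], 0.0
        current.append(i)
        load += demands[i]
    if current:
        clusters.append(current)
    return clusters
```

Each cluster respects the vehicle capacity by construction, so a learned improvement policy trained on small instances could, in principle, be applied to each cluster independently.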


Multi-View Hypergraph Contrastive Policy Learning For Conversational Recommendation, Sen Zhao, Wei Wei, Xian-Ling Mao, Shuai: Yang Zhu, Zujie Wen, Dangyang Chen, Feida Zhu Jul 2023

Research Collection School Of Computing and Information Systems

Conversational recommendation systems (CRS) aim to interactively acquire user preferences and accordingly recommend items to users. Accurately learning dynamic user preferences is of crucial importance for CRS. Previous works learn user preferences from pairwise relations in the interactive conversation and item knowledge, while largely ignoring the fact that the factors behind a relationship in CRS are multiplex. Specifically, the user likes/dislikes the items that satisfy some attributes (Like/Dislike view). Moreover, social influence is another important factor that affects user preference towards an item (Social view), yet it has been largely ignored by previous works in CRS. The user preferences from these …


Dynamic Police Patrol Scheduling With Multi-Agent Reinforcement Learning, Songhan Wong, Waldy Joe, Hoong Chuin Lau Jun 2023

Research Collection School Of Computing and Information Systems

Effective police patrol scheduling is essential for projecting police presence and ensuring readiness to respond to unexpected events in urban environments. However, scheduling patrols can be challenging, as it requires balancing two conflicting objectives, namely projecting presence (proactive patrol) and incident response (reactive patrol). The task is made even more challenging by the fact that patrol schedules do not remain static, since dynamic incidents can disrupt existing schedules. In this paper, we propose a solution to this problem using Multi-Agent Reinforcement Learning (MARL) to address the Dynamic Bi-objective Police Patrol Dispatching and Rescheduling Problem …


Reinforced Adaptation Network For Partial Domain Adaptation, Keyu Wu, Min Wu, Zhenghua Chen, Ruibing Jin, Wei Cui, Zhiguang Cao, Xiaoli Li May 2023

Research Collection School Of Computing and Information Systems

Domain adaptation enables generalized learning in new environments by transferring knowledge from label-rich source domains to label-scarce target domains. As a more realistic extension, partial domain adaptation (PDA) relaxes the assumption of fully shared label space, and instead deals with the scenario where the target label space is a subset of the source label space. In this paper, we propose a Reinforced Adaptation Network (RAN) to address the challenging PDA problem. Specifically, a deep reinforcement learning model is proposed to learn source data selection policies. Meanwhile, a domain adaptation model is presented to simultaneously determine rewards and learn domain-invariant feature …


A Review On Derivative Hedging Using Reinforcement Learning, Peng Liu Mar 2023

Research Collection Lee Kong Chian School Of Business

Hedging is a common trading activity to manage the risk of engaging in transactions that involve derivatives such as options. Perfect and timely hedging, however, is impossible in real markets, where transactions occur in discrete time and incur costs. Recent years have seen reinforcement learning (RL) applied to formulating optimal hedging strategies. Specifically, different RL algorithms have been applied to learn the optimal offsetting position based on market conditions, offering an automatic risk management solution that proposes optimal hedging strategies while catering to both market dynamics and restrictions. In this article, the author provides a comprehensive review of the use of …
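To see why discrete rebalancing with costs prevents perfect hedging, the sketch below computes the proportional transaction cost paid by a textbook Black-Scholes delta hedge of a short call that rebalances only at the observed dates of a price path. The cost model (a fixed fraction of traded notional), the zero interest-rate default, and the function names are assumptions for illustration; the RL methods described above learn the offsetting position instead of reading it off this formula.

```python
import math
from typing import Sequence


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_delta(spot: float, strike: float, tau: float, sigma: float,
                  rate: float = 0.0) -> float:
    """Black-Scholes delta of a European call with time to maturity tau."""
    if tau <= 0.0:
        return 1.0 if spot > strike else 0.0  # slope of the payoff at expiry
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return norm_cdf(d1)


def hedging_cost(path: Sequence[float], strike: float, maturity: float,
                 sigma: float, cost_rate: float) -> float:
    """Total proportional transaction cost of delta-hedging a short call,
    rebalancing only at the equally spaced observation dates in `path`
    (path[0] is today, path[-1] is the price at maturity)."""
    n = len(path) - 1
    if n < 1:
        raise ValueError("need at least two observations")
    position = 0.0  # shares currently held
    total = 0.0
    for i, spot in enumerate(path):
        tau = maturity * (n - i) / n  # time left to maturity
        target = bs_call_delta(spot, strike, tau, sigma)
        total += cost_rate * abs(target - position) * spot  # cost on traded notional
        position = target
    return total
```

With cost_rate set to zero the hedge trades for free, but it still only rebalances at the listed dates; any positive cost_rate makes every rebalance expensive, which is the trade-off between hedging error and trading cost that a learned policy has to balance.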


Constrained Reinforcement Learning In Hard Exploration Problems, Pankayaraj Pathmanathan, Pradeep Varakantham Feb 2023

Research Collection School Of Computing and Information Systems

One approach to guaranteeing safety in Reinforcement Learning is through cost constraints that are imposed on trajectories. Recent works in constrained RL have developed methods that ensure constraints can be enforced even at learning time while maximizing the overall value of the policy. Unfortunately, as demonstrated in our experimental results, such approaches do not perform well on complex multi-level tasks with longer episodes or sparse rewards. To that end, we propose a scalable hierarchical approach for constrained RL problems that employs backward cost value functions in the context of task hierarchy and a novel intrinsic reward function in lower levels …


Reinforcement Learning Enhanced PicHunter For Interactive Search, Zhixin Ma, Jiaxin Wu, Weixiong Loo, Chong-Wah Ngo Jan 2023

Research Collection School Of Computing and Information Systems

With the tremendous increase in video data size, search performance could be impacted significantly. In particular, an interactive system must respond in real time so that a user can browse, search, and refine a query. Without speed, the main ingredient for keeping a user engaged and focused, an interactive system becomes less effective even with a sophisticated deep learning system. This paper addresses this challenge by leveraging approximate search, Bayesian inference, and reinforcement learning. For approximate search, we apply a hierarchical navigable small world (HNSW) graph, an efficient approximate nearest neighbor search algorithm. To quickly prune the search scope, we …
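The Bayesian-inference component can be illustrated with a simplified PicHunter-style relevance-feedback update. The sketch assumes a softmax user model: if item i is the target, the user clicks displayed item j with probability proportional to exp(sim(i, j) / T). The similarity matrix, the temperature T, and the function name are assumptions; the paper's exact user model is not given in the abstract.

```python
import math
from typing import List, Sequence


def pichunter_update(prior: Sequence[float],
                     sims: Sequence[Sequence[float]],
                     displayed: Sequence[int],
                     clicked: int,
                     temperature: float = 0.1) -> List[float]:
    """Posterior over which item is the user's target after one click.

    sims[i][j] is the (assumed) similarity between item i and item j.
    User model (an assumption): if i is the target, the user clicks
    displayed item j with probability softmax_j(sims[i][j] / temperature).
    """
    if clicked not in displayed:
        raise ValueError("the clicked item must be one of the displayed items")
    unnormalized = []
    for i, p in enumerate(prior):
        logits = [sims[i][j] / temperature for j in displayed]
        m = max(logits)  # subtract the max for numerical stability
        z = sum(math.exp(x - m) for x in logits)
        likelihood = math.exp(sims[i][clicked] / temperature - m) / z
        unnormalized.append(p * likelihood)
    total = sum(unnormalized)
    return [u / total for u in unnormalized]
```

After the user clicks item 0 from the display {0, 1}, candidates similar to item 0 gain posterior mass and candidates similar to item 1 lose it; a system could then show the highest-posterior items in the next round.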


Intelligent Adaptive Gossip-Based Broadcast Protocol For UAV-MEC Using Multi-Agent Deep Reinforcement Learning, Zen Ren, Xinghua Li, Yinbin Miao, Zhuowen Li, Zihao Wang, Mengyao Zhu, Ximeng Liu, Robert H. Deng Jan 2023

Research Collection School Of Computing and Information Systems

UAV-assisted mobile edge computing (UAV-MEC) has been proposed to offer computing resources for smart devices and user equipment. Using a UAV cluster, rather than a single UAV, as the edge pool is the newest edge computing architecture. Unfortunately, data packet exchange within the UAV cluster during edge computing has not received enough attention. UAVs need to collaborate for MEC to be widely implemented, relying on a gossip-based broadcast protocol. However, gossip suffers from long propagation delay, and the forwarding probability and the neighbors are two factors that are difficult to balance. Existing works improve gossip with respect to only one factor, …
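The delay trade-off can be explored with a minimal probabilistic gossip simulation: the source always broadcasts, and each node that receives the packet for the first time forwards it to all of its neighbors with probability p_forward. This is a generic gossip model written for illustration, not the paper's adaptive protocol, and the graph, parameters, and function name are hypothetical.

```python
import random
from typing import Dict, List, Optional


def gossip_rounds(adj: Dict[int, List[int]], source: int, p_forward: float,
                  max_rounds: int = 1000, seed: int = 0) -> Optional[int]:
    """Rounds until every node holds the packet, or None if the gossip dies out.

    The source always broadcasts; every other node forwards to all of its
    neighbors once, with probability p_forward, in the round after it first
    receives the packet.
    """
    rng = random.Random(seed)
    informed = {source}
    if len(informed) == len(adj):
        return 0
    frontier = [source]  # nodes that received the packet in the previous round
    for rnd in range(1, max_rounds + 1):
        newly: List[int] = []
        for u in frontier:
            if u != source and rng.random() >= p_forward:
                continue  # this relay decides not to forward
            for v in adj[u]:
                if v not in informed:
                    informed.add(v)
                    newly.append(v)
        if len(informed) == len(adj):
            return rnd
        if not newly:
            return None  # no new receivers: the broadcast has died out
        frontier = newly
    return None
```

Lowering p_forward saves transmissions but makes the broadcast slower or lets it die out (the function then returns None), which is the balance between forwarding probability and propagation delay that the abstract describes.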


Learning Feature Embedding Refiner For Solving Vehicle Routing Problems, Jingwen Li, Yining Ma, Zhiguang Cao, Yaoxin Wu, Wen Song, Jie Zhang, Yeow Meng Chee Jan 2023

Research Collection School Of Computing and Information Systems

While the encoder–decoder structure is widely used in recent neural construction methods for learning to solve vehicle routing problems (VRPs), these methods are less effective at searching for solutions because of deterministic feature embeddings and deterministic probability distributions. In this article, we propose the feature embedding refiner (FER), with a novel and generic encoder–refiner–decoder structure, to boost existing encoder–decoder structured deep models. FER is model-agnostic: the encoder and the decoder can come from any pretrained neural construction method. For the introduced refiner network, we design its architecture by combining the standard gated recurrent unit (GRU) cell with two new …


End-To-End Hierarchical Reinforcement Learning With Integrated Subgoal Discovery, Shubham Pateria, Budhitama Subagdja, Ah-Hwee Tan, Chai Quek Dec 2022

Research Collection School Of Computing and Information Systems

Hierarchical reinforcement learning (HRL) is a promising approach to perform long-horizon goal-reaching tasks by decomposing the goals into subgoals. In a holistic HRL paradigm, an agent must autonomously discover such subgoals and also learn a hierarchy of policies that uses them to reach the goals. Recently introduced end-to-end HRL methods accomplish this by using the higher-level policy in the hierarchy to directly search the useful subgoals in a continuous subgoal space. However, learning such a policy may be challenging when the subgoal space is large. We propose integrated discovery of salient subgoals (LIDOSS), an end-to-end HRL method with an integrated …


Interactive Video Corpus Moment Retrieval Using Reinforcement Learning, Zhixin Ma, Chong-Wah Ngo Oct 2022

Research Collection School Of Computing and Information Systems

Known-item video search is effective with a human in the loop to interactively investigate the search results and refine the initial query. Nevertheless, when the first few pages of results are swamped with visually similar items, or the search target is hidden deep in the ranked list, finding the known-item target usually requires a long duration of browsing and result inspection. This paper tackles the problem with reinforcement learning, aiming to reach a search target within a few rounds of interaction through long-term learning from user feedback. Specifically, the system interactively plans a navigation path based on feedback and recommends a potential target that …


Reinforcement Learning-Based Interactive Video Search, Zhixin Ma, Jiaxin Wu, Zhijian Hou, Chong-Wah Ngo Jun 2022

Research Collection School Of Computing and Information Systems

Despite the rapid progress in text-to-video search due to the advancement of cross-modal representation learning, existing techniques still fall short in helping users rapidly identify search targets. In particular, when a system suggests a long list of similar candidates, the user needs to painstakingly inspect every search result. The experience is frustrating because of the repeated watching of similar clips, and, worse, search targets may be overlooked due to mental fatigue. This paper explores reinforcement learning (RL)-based searching to relieve the user from the burden of brute-force inspection. Specifically, the system maintains a graph …


Heterogeneous Attentions For Solving Pickup And Delivery Problem Via Deep Reinforcement Learning, Jingwen Li, Liang Xin, Zhiguang Cao, Andrew Lim, Wen Song, Jie Zhang Mar 2022

Research Collection School Of Computing and Information Systems

Recently, there has been an emerging trend to apply deep reinforcement learning to solve the vehicle routing problem (VRP), where a learned policy governs the selection of the next node to visit. However, existing methods cannot handle well the pairing and precedence relationships in the pickup and delivery problem (PDP), which is a representative variant of VRP. To address this challenging issue, we leverage a novel neural network integrated with a heterogeneous attention mechanism to empower the policy in deep reinforcement learning to automatically select the nodes. In particular, the heterogeneous attention mechanism specifically prescribes attention for each role of the …


Hierarchical Control Of Multi-Agent Reinforcement Learning Team In Real-Time Strategy (RTS) Games, Weigui Jair Zhou, Budhitama Subagdja, Ah-Hwee Tan, Darren Wee Sze Ong Dec 2021

Research Collection School Of Computing and Information Systems

Coordinated control of multi-agent teams is an important task in many real-time strategy (RTS) games. In most prior work, micromanagement is the commonly used strategy, whereby individual agents operate independently and make their own combat decisions. At the other extreme, some works employ a macromanagement strategy, whereby all agents are controlled by a single decision model. In this paper, we propose a hierarchical command and control architecture consisting of a single high-level and multiple low-level reinforcement learning agents operating in a dynamic environment. This hierarchical model enables the low-level unit agents to make individual decisions while taking commands from the high-level …


Learning To Assign: Towards Fair Task Assignment In Large-Scale Ride Hailing, Dingyuan Shi, Yongxin Tong, Zimu Zhou, Bingchen Song, Weifeng Lv, Qiang Yang Aug 2021

Research Collection School Of Computing and Information Systems

Ride hailing is a widespread shared mobility application where the central issue is to assign taxi requests to drivers with various objectives. Despite extensive research on task assignment in ride hailing, the fairness of earnings among drivers is largely neglected. Pioneering studies on fair task assignment in ride hailing are ineffective and inefficient due to their myopic optimization perspective and time-consuming assignment techniques. In this work, we propose LAF, an effective and efficient task assignment scheme that optimizes both utility and fairness. We adopt reinforcement learning to make assignments in a holistic manner and propose a set of acceleration techniques …


Step-Wise Deep Learning Models For Solving Routing Problems, Liang Xin, Wen Song, Zhiguang Cao, Jie Zhang Jul 2021

Research Collection School Of Computing and Information Systems

Routing problems are very important in intelligent transportation systems. Recently, a number of deep learning-based methods have been proposed to automatically learn construction heuristics for solving routing problems. However, these methods do not completely follow Bellman's Principle of Optimality, since the visited nodes during construction are still included in the subsequent subtasks, resulting in suboptimal policies. In this article, we propose a novel step-wise scheme that explicitly removes the visited nodes at each node selection step. We apply this scheme to two representative deep models for routing problems, the pointer network and the transformer attention model (TAM), and significantly improve the performance of …


Learning Index Policies For Restless Bandits With Application To Maternal Healthcare, Arpita Biswas, Gaurav Aggarwal, Pradeep Varakantham, Milind Tambe May 2021

Research Collection School Of Computing and Information Systems

In many community health settings, it is crucial to have a systematic monitoring and intervention process to ensure that the patients adhere to healthcare programs, such as periodic health checks or taking medications. When these interventions are expensive, they can be provided to only a fixed small fraction of the patients at any period of time. Hence, it is important to carefully choose the beneficiaries who should be provided with interventions and when. We model this scenario as a restless multi-armed bandit (RMAB) problem, where each beneficiary is assumed to transition from one state to another depending on the intervention …
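An index policy can be sketched in a few lines: each beneficiary (arm) has a state-dependent index, and with a budget of k interventions per round the policy intervenes on the k arms whose current index is highest. In the paper the index is learned; here the index table is a hypothetical input, and the function name is illustrative.

```python
from typing import List, Sequence


def index_policy(states: Sequence[int],
                 index_table: Sequence[Sequence[float]],
                 budget: int) -> List[int]:
    """Pick the `budget` arms with the highest index in their current state.

    index_table[arm][state] is a (hypothetical, e.g. learned) index value;
    returns the chosen arm ids in ascending order.
    """
    if not 0 <= budget <= len(states):
        raise ValueError("budget must be between 0 and the number of arms")
    ranked = sorted(range(len(states)),
                    key=lambda arm: index_table[arm][states[arm]],
                    reverse=True)
    return sorted(ranked[:budget])
```

The appeal of this structure is that the budget constraint couples the arms only through the ranking step: each arm's index can be computed (or learned) separately, and the same beneficiary may or may not be selected depending on the state it is currently in.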


Approximate Difference Rewards For Scalable Multiagent Reinforcement Learning, Arambam James Singh, Akshat Kumar, Hoong Chuin Lau May 2021

Research Collection School Of Computing and Information Systems

We address the problem of multiagent credit assignment in a large-scale multiagent system. Difference rewards (DRs) are an effective tool to tackle this problem, but their exact computation is known to be challenging even for a small number of agents. We propose a scalable method to compute difference rewards based on aggregate information in a multiagent system with a large number of agents by exploiting the symmetry present in several practical applications. Empirical evaluation on two multiagent domains, air-traffic control and cooperative navigation, shows better solution quality than previous approaches.
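The idea behind difference rewards, and why symmetry helps, can be shown with a toy congestion reward. Agent i's difference reward is G(z) - G(z_{-i}), the change in global reward when agent i is removed (one common counterfactual choice). When the global reward depends only on how many agents choose each resource, all agents on the same resource share the same difference reward, so it can be computed from aggregate counts. The reward function and names below are hypothetical and not taken from the air-traffic or navigation domains in the paper.

```python
import math
from collections import Counter
from typing import Dict, List, Mapping, Sequence


def global_reward(counts: Mapping[int, int], capacity: float = 3.0) -> float:
    """Hypothetical congestion-style team reward: each resource with n agents
    contributes n * exp(-n / capacity), so crowding eventually hurts."""
    return math.fsum(n * math.exp(-n / capacity) for n in counts.values())


def exact_difference_rewards(actions: Sequence[int]) -> List[float]:
    """D_i = G(all agents) - G(all agents except i), recomputed per agent."""
    acts = list(actions)
    g = global_reward(Counter(acts))
    return [g - global_reward(Counter(acts[:i] + acts[i + 1:]))
            for i in range(len(acts))]


def aggregate_difference_rewards(actions: Sequence[int]) -> List[float]:
    """Same values from aggregate counts: agents on the same resource are
    interchangeable, so one difference reward per occupied resource suffices."""
    counts = Counter(actions)
    g = global_reward(counts)
    per_resource: Dict[int, float] = {}
    for resource, n in counts.items():
        reduced = dict(counts)
        reduced[resource] = n - 1  # remove one (any) agent from this resource
        per_resource[resource] = g - global_reward(reduced)
    return [per_resource[a] for a in actions]
```

The exact version rebuilds the counts for every agent, which costs time quadratic in the number of agents; the aggregate version builds the counts once and evaluates G once per occupied resource.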


Approximate Difference Rewards For Scalable Multiagent Reinforcement Learning, Arambam James Singh, Akshat Kumar May 2021

Research Collection School Of Computing and Information Systems

We address the problem of multiagent credit assignment in a large-scale multiagent system. Difference rewards (DRs) are an effective tool to tackle this problem, but their exact computation is known to be challenging even for a small number of agents. We propose a scalable method to compute difference rewards based on aggregate information in a multiagent system with a large number of agents by exploiting the symmetry present in several practical applications. Empirical evaluation on two multiagent domains, air-traffic control and cooperative navigation, shows better solution quality than previous approaches.


Deep Reinforcement Learning Approach To Solve Dynamic Vehicle Routing Problem With Stochastic Customers, Waldy Joe, Hoong Chuin Lau Oct 2020

Research Collection School Of Computing and Information Systems

In real-world urban logistics operations, changes to routes and tasks occur in response to dynamic events. To ensure customers’ demands are met, planners need to make these changes quickly (sometimes instantaneously). This paper proposes formulating a dynamic vehicle routing problem with time windows and both known and stochastic customers as a route-based Markov Decision Process. We propose a solution approach, called DRLSA, that combines Deep Reinforcement Learning (specifically, neural network-based Temporal-Difference learning with experience replay) to approximate the value function with a routing heuristic based on Simulated Annealing. Our approach enables optimized re-routing decisions to be generated …
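The routing-heuristic half of DRLSA is based on simulated annealing. As a generic illustration only (it omits the learned value function as well as the time windows and customer dynamics of the paper), the sketch below anneals a single tour with 2-opt moves: a move that shortens the tour is always accepted, and a longer tour is accepted with probability exp(-delta / T) under a geometrically cooling temperature T. Parameter values and function names are assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def tour_length(tour: Sequence[int], pts: Sequence[Point]) -> float:
    """Length of the closed tour visiting pts in the given order."""
    return sum(math.dist(pts[tour[k]], pts[tour[(k + 1) % len(tour)]])
               for k in range(len(tour)))


def anneal_tour(pts: Sequence[Point], iters: int = 20_000, t0: float = 1.0,
                alpha: float = 0.9995, seed: int = 0) -> Tuple[List[int], float]:
    """Simulated annealing with 2-opt segment reversals; node 0 stays first
    (e.g. a depot). Returns the best tour found and its length."""
    rng = random.Random(seed)
    n = len(pts)
    tour = list(range(n))
    cur_len = tour_length(tour, pts)
    best, best_len = tour[:], cur_len
    if n < 4:
        return best, best_len  # nothing useful to reverse
    temp = t0
    for _ in range(iters):
        i, j = sorted(rng.sample(range(1, n), 2))
        cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]  # 2-opt move
        cand_len = tour_length(cand, pts)
        delta = cand_len - cur_len
        # Always accept improvements; accept worse tours with prob exp(-delta / temp).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            tour, cur_len = cand, cand_len
            if cur_len < best_len:
                best, best_len = tour[:], cur_len
        temp = max(temp * alpha, 1e-12)  # geometric cooling with a floor
    return best, best_len
```

Accepting occasional uphill moves early on lets the search escape poor local optima, while the cooling schedule makes it increasingly greedy; in a dynamic setting such a heuristic would be re-run whenever new customers appear.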


Reinforcement Learning For Zone Based Multiagent Pathfinding Under Uncertainty, Jiajing Ling, Tarun Gupta, Akshat Kumar Oct 2020

Research Collection School Of Computing and Information Systems

We address the problem of multiple agents finding their paths from respective sources to destination nodes in a graph (known as multiagent pathfinding, or MAPF). Most existing approaches assume that all agents move at a fixed speed and that a single node accommodates only a single agent. Motivated by emerging applications of autonomous vehicles such as drone traffic management, we present zone-based path finding (ZBPF), where agents move among zones and agents' movements have uncertain travel times. Furthermore, each zone can accommodate multiple agents (as per its capacity). We also develop a simulator for ZBPF which provides a clean interface from the …


Hierarchical Multiagent Reinforcement Learning For Maritime Traffic Management, Arambam James Singh, Akshat Kumar, Hoong Chuin Lau May 2020

Research Collection School Of Computing and Information Systems

Increasing global maritime traffic, coupled with rapid digitization and automation in shipping, mandates the development of next-generation maritime traffic management systems to mitigate congestion, increase safety of navigation, and avoid collisions in busy and geographically constrained ports (such as Singapore's). To achieve these objectives, we model maritime traffic as a large multiagent system with individual vessels as agents and the VTS (Vessel Traffic Service) authority as a regulatory agent. We develop a hierarchical reinforcement learning approach where vessels first select a high-level action based on the underlying traffic flow, and then select the low-level action that determines their future …


Using Reinforcement Learning To Minimize The Probability Of Delay Occurrence In Transportation, Zhiguang Cao, Hongliang Guo, Wen Song, Kaizhou Gao, Zhenghua Chen, Le Zhang, Xuexi Zhang Mar 2020

Research Collection School Of Computing and Information Systems

Reducing traffic delay is of crucial importance for the development of sustainable transportation systems, and it is a challenging task in the study of the stochastic shortest path (SSP) problem. Existing methods that solve the SSP problem using the probability tail model seek the path that minimizes the probability of delay occurrence, which is equivalent to maximizing the probability of reaching the destination before a deadline (i.e., arriving on time). However, they suffer from low accuracy or high computational cost. Therefore, we design a novel and practical Q-learning approach where the converged Q-values have the practical meaning as the actual …
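Since the Q-values are meant to carry the meaning of an on-time arrival probability, it may help to see what that quantity is on a tiny network. The sketch below computes it exactly by recursion over (node, remaining time budget): Q(u, b, v) is the sum over travel times t of P(t) * V(v, b - t), with V(u, b) = max over v of Q(u, b, v), V(destination, b) = 1 for b >= 0, and V = 0 once the budget is negative. A tabular Q-learner would estimate the same quantities from sampled trips rather than known distributions. The network, the integer travel times, and the names are hypothetical.

```python
from functools import lru_cache
from typing import Dict, List, Tuple

# Hypothetical network: EDGES[u] = [(v, [(travel_time, probability), ...]), ...]
# Travel times are positive integers, so the budget strictly shrinks and the
# recursion terminates even if the graph had cycles.
EDGES: Dict[str, List[Tuple[str, List[Tuple[int, float]]]]] = {
    "A": [("D", [(3, 0.5), (7, 0.5)]), ("B", [(2, 1.0)])],
    "B": [("D", [(2, 0.8), (6, 0.2)])],
    "D": [],
}
DEST = "D"


def q_values(node: str, budget: int) -> Dict[str, float]:
    """Q(node, budget, next_hop): on-time probability when taking this edge
    and then continuing optimally."""
    return {nxt: sum(p * on_time_prob(nxt, budget - t) for t, p in dist)
            for nxt, dist in EDGES[node]}


@lru_cache(maxsize=None)
def on_time_prob(node: str, budget: int) -> float:
    """V(node, budget): best achievable probability of arriving on time."""
    if budget < 0:
        return 0.0  # already late
    if node == DEST:
        return 1.0  # arrived within the deadline
    return max(q_values(node, budget).values(), default=0.0)
```

On this network the route that minimizes expected travel time (via B, 4.8 versus 5.0) is not always the one that maximizes the chance of arriving on time: with a budget of 3 the direct edge gives 0.5 while the route via B gives 0, whereas with a budget of 5 the route via B wins (0.8 versus 0.5).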


Multi-Agent Collaborative Exploration Through Graph-Based Deep Reinforcement Learning, Tianze Luo, Budhitama Subagdja, Ah-Hwee Tan Oct 2019

Research Collection School Of Computing and Information Systems

Autonomous exploration by one or more agents in an unknown environment has various applications in automation, such as cleaning and search and rescue. Traditional methods normally take frontier locations and segmented regions of the environment into account to efficiently allocate target locations to different agents. They may employ ad hoc solutions to allocate tasks to the agents, but the allocation may not be efficient. In the literature, few studies have focused on enhancing traditional methods by applying machine learning models to improve agent performance. In this paper, we propose a graph-based deep reinforcement learning approach …


Probabilistic Guided Exploration For Reinforcement Learning In Self-Organizing Neural Networks, Peng Wang, Weigui Jair Zhou, Di Wang, Ah-Hwee Tan Jul 2018

Research Collection School Of Computing and Information Systems

Exploration is essential in reinforcement learning, as it expands the search space of potential solutions to a given problem for performance evaluation. Specifically, a carefully designed exploration strategy may help the agent learn faster by taking advantage of what it has learned previously. However, many reinforcement learning mechanisms still adopt simple exploration strategies, which select actions purely at random among all the feasible actions. In this paper, we propose novel mechanisms to improve the existing knowledge-based exploration strategy based on a probabilistic guided approach to select actions. We conduct extensive experiments in a Minefield navigation simulator and the results …
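For contrast with the pure-random strategy described above, Boltzmann (softmax) exploration is one standard way to bias exploration toward actions the agent already rates highly. It is shown here only as a familiar example of knowledge-guided action selection; it is not the paper's probabilistic guided mechanism, which the abstract does not detail, and the function names are illustrative.

```python
import math
import random
from typing import Sequence


def boltzmann_action(values: Sequence[float], temperature: float,
                     rng: random.Random) -> int:
    """Sample an action with probability proportional to exp(value / temperature)."""
    m = max(values)  # shift by the max for numerical stability
    weights = [math.exp((v - m) / temperature) for v in values]
    return rng.choices(range(len(values)), weights=weights, k=1)[0]


def uniform_action(n_actions: int, rng: random.Random) -> int:
    """The 'pure random' baseline: every feasible action is equally likely."""
    return rng.randrange(n_actions)
```

A lower temperature concentrates the choice on the highest-valued actions, while a higher temperature approaches the uniform baseline, so the same function can interpolate between exploiting prior knowledge and exploring at random.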


ADOPT: Combining Parameter Tuning And Adaptive Operator Ordering For Solving A Class Of Orienteering Problems, Aldy Gunawan, Hoong Chuin Lau, Kun Lu Jul 2018

Research Collection School Of Computing and Information Systems

Two fundamental challenges in local search based metaheuristics are how to determine parameter configurations and how to design the underlying Local Search (LS) procedure. In this paper, we propose a framework, called ADaptive OPeraTor Ordering (ADOPT), to handle both challenges. The ADOPT framework is applied to two metaheuristics, namely Iterated Local Search (ILS) and a hybridization of Simulated Annealing and ILS (SAILS), for solving two variants of the Orienteering Problem: the Team Dependent Orienteering Problem (TDOP) and the Team Orienteering Problem with Time Windows (TOPTW). This framework consists of two main processes. The Design of Experiment (DOE) …