Open Access. Powered by Scholars. Published by Universities.®

Operations Research, Systems Engineering and Industrial Engineering Commons

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 30 of 31

Full-Text Articles in Operations Research, Systems Engineering and Industrial Engineering

Research And Development Of Simulation Training Platform For Multi-Agent Collaborative Decision-Making, Cheng Cheng, Zhijie Chen, Ziming Guo, Ni Li Dec 2023

Research And Development Of Simulation Training Platform For Multi-Agent Collaborative Decision-Making, Cheng Cheng, Zhijie Chen, Ziming Guo, Ni Li

Journal of System Simulation

Abstract: Reinforcement learning simulation platform can be an interactive and training environment for reinforcement learning. In order to make the simulation platform compatible with the multi-agent reinforcement learning algorithms and meet the needs of simulation in military field, the similar processes in multi-agent reinforcement learning algorithms are refined and a unified interface is designed to embed and verify different types of deep reinforcement learning algorithms on the simulation platform and to optimize the back-end service of the simulation platform to accelerate the training process of the algorithm model. The experimental results show that, by unifing the interface, the simulation platform …


Neural Airport Ground Handling, Yaoxin Wu, Jianan Zhou, Yunwen Xia, Xianli Zhang, Zhiguang Cao, Jie Zhang Dec 2023

Neural Airport Ground Handling, Yaoxin Wu, Jianan Zhou, Yunwen Xia, Xianli Zhang, Zhiguang Cao, Jie Zhang

Research Collection School Of Computing and Information Systems

Airport ground handling (AGH) offers necessary operations to flights during their turnarounds and is of great importance to the efficiency of airport management and the economics of aviation. Such a problem involves the interplay among the operations that leads to NP-hard problems with complex constraints. Hence, existing methods for AGH are usually designed with massive domain knowledge but still fail to yield high-quality solutions efficiently. In this paper, we aim to enhance the solution quality and computation efficiency for solving AGH. Particularly, we first model AGH as a multiple-fleet vehicle routing problem (VRP) with miscellaneous constraints including precedence, time windows, …


Intercell Dynamic Scheduling Method Based On Deep Reinforcement Learning, Jing Ni, Mengke Ma Nov 2023

Intercell Dynamic Scheduling Method Based On Deep Reinforcement Learning, Jing Ni, Mengke Ma

Journal of System Simulation

Abstract: In order to solve the intercell scheduling problem of dynamic arrival of machining tasks and realize adaptive scheduling in the complex and changeable environment of the intelligent factory, a scheduling method based on a deep Q network is proposed. A complex network with cells as nodes and workpiece intercell machining path as directed edges is constructed, and the degree value is introduced to define the state space with intercell scheduling characteristics. A compound scheduling rule composed of a workpiece layer, unit layer, and machine layer is designed, and hierarchical optimization makes the scheduling scheme more global. Since double deep …


Uav-Enabled Task Offloading Strategy For Vehicular Edge Computing Networks, Feng Hu, Haiyang Gu, Jun Lin Nov 2023

Uav-Enabled Task Offloading Strategy For Vehicular Edge Computing Networks, Feng Hu, Haiyang Gu, Jun Lin

Journal of System Simulation

Abstract: As intelligent vehicles are equipped with more and more sensors, the explosive growth of sensor data is generated, which brings severe challenges to vehicular communication and computing. In addition, the modern road presents a three-dimensional structure, and the system architecture of traditional vehicular networks cannot guarantee full coverage and seamless computing. A task offloading strategy for UAV-assisted and 6G-enabled (Sixth Generation) vehicular edge computing networks is proposed. Furthermore, a flexible and intelligent vehicular edge computing mode is composed by vehicles and UAVs, which provide three-dimensional edge computing services for delay-sensitive and computation-intensive vehicular tasks, and ensure timely processing and …


Imitative Generation Of Optimal Guidance Law Based On Reinforcement Learning, Zhengxuan Jia, Tingyu Lin, Yingying Xiao, Guoqiang Shi, Hao Wang, Bi Zeng, Yiming Ou, Pengpeng Zhao Nov 2023

Imitative Generation Of Optimal Guidance Law Based On Reinforcement Learning, Zhengxuan Jia, Tingyu Lin, Yingying Xiao, Guoqiang Shi, Hao Wang, Bi Zeng, Yiming Ou, Pengpeng Zhao

Journal of System Simulation

Abstract: Under the background of high-speed maneuvering target interception, an optimal guidance law generation method for head-on interception independent of target acceleration estimation is proposed based on deep reinforcement learning. In addition, its effectiveness is verified through simulation experiments. As the simulation results suggest, the proposed method successfully achieves head-on interception of high-speed maneuvering targets in 3D space and largely reduces the requirement for target estimation with strong uncertainty, and it is more applicable than the optimal control method.


Aircraft Assignment Method For Optimal Utilization Of Maintenance Intervals, Runxia Guo, Yifu Wang Sep 2023

Aircraft Assignment Method For Optimal Utilization Of Maintenance Intervals, Runxia Guo, Yifu Wang

Journal of System Simulation

Abstract: The aircraft assignment problem is studied from a maintenance assurance perspective. In order to ensure its continuous airworthiness, civil aircraft are required to perform maintenance tasks, i. e., scheduled inspections, at specified intervals. The scheduled inspection interval is usually controlled by the number of flight cycles (FC), flight hours (FH), or flight days (FD), whichever comes first. In order to make balanced use of the inspection interval, an aircraft assignment model for a given fleet size is developed to optimize the maintenance interval utilization, and it is solved by a reinforcement learning algorithm to minimize the variance of the …


Imitation Improvement Learning For Large-Scale Capacitated Vehicle Routing Problems, The Viet Bui, Tien Mai Jul 2023

Imitation Improvement Learning For Large-Scale Capacitated Vehicle Routing Problems, The Viet Bui, Tien Mai

Research Collection School Of Computing and Information Systems

Recent works using deep reinforcement learning (RL) to solve routing problems such as the capacitated vehicle routing problem (CVRP) have focused on improvement learning-based methods, which involve improving a given solution until it becomes near-optimal. Although adequate solutions can be achieved for small problem instances, their efficiency degrades for large-scale ones. In this work, we propose a newimprovement learning-based framework based on imitation learning where classical heuristics serve as experts to encourage the policy model to mimic and produce similar or better solutions. Moreover, to improve scalability, we propose Clockwise Clustering, a novel augmented framework for decomposing large-scale CVRP into …


Research On Unmanned Swarm Combat System Adaptive Evolution Model Simulation, Zhiqiang Li, Yuanlong Li, Laixiang Yin, Xiangping Ma Apr 2023

Research On Unmanned Swarm Combat System Adaptive Evolution Model Simulation, Zhiqiang Li, Yuanlong Li, Laixiang Yin, Xiangping Ma

Journal of System Simulation

Abstract: Aiming at the fact that the intelligent unmanned swarm combat system is mainly composed of large-scale combat individuals with limited behavioral capabilities and has limited ability to adapt to the changes of battlefield environment and combat opponents, a learning evolution method combining genetic algorithm and reinforcement learning is proposed to construct an individual-based unmanned bee colony combat system evolution model. To improve the adaptive evolution efficiency of bee colony combat system, an improved genetic algorithm is proposed to improve the learning and evolution speed of bee colony individuals by using individual-specific mutation optimization strategy. Simulation experiment on …


Multi-Agent Cooperative Combat Simulation In Naval Battlefield With Reinforcement Learning, Ding Shi, Xuefeng Yan, Lina Gong, Jingxuan Zhang, Donghai Guan, Mingqiang Wei Apr 2023

Multi-Agent Cooperative Combat Simulation In Naval Battlefield With Reinforcement Learning, Ding Shi, Xuefeng Yan, Lina Gong, Jingxuan Zhang, Donghai Guan, Mingqiang Wei

Journal of System Simulation

Abstract: Due to the rapidly-changed situations of future naval battlefields, it is urgent to realize the high-quality combat simulation in naval battlefields based on artificial intelligence to comprehensively optimize and improve the combat effectiveness of our army and defeat the enemy. The collaboration of combat units is the key point and how to realize the balanced decision-making among multiple agents is the first task. Based on decoupling priority experience replay mechanism and attention mechanism, a multi-agent reinforcement learning-based cooperative combat simulation (MARL-CCSA) network is proposed. Based on the expert experience, a multi-scale reward function is designed, on which a naval …


Dqn-Based Joint Scheduling Method Of Heterogeneous Tt&C Resources, Naiyang Xue, Dan Ding, Yutong Jia, Zhiqiang Wang, Yuan Liu Feb 2023

Dqn-Based Joint Scheduling Method Of Heterogeneous Tt&C Resources, Naiyang Xue, Dan Ding, Yutong Jia, Zhiqiang Wang, Yuan Liu

Journal of System Simulation

Abstract: Joint scheduling of heterogeneous TT&C resources as research object, a deep Q network (DQN) algorithm based on reinforcement learning is proposed. The characteristics of the joint scheduling problem of heterogeneous TT&C resources being fully analyzied and mathematical language being used to describe the constraints affecting the solution, a resource joint scheduling model is established. From the perspective of applying reinforcement learning, two neural networks with the same structure and the action selection strategies based onεgreedy algorithm are respectively designed after Markov decision process description, and DQN solution framework is established. The simulation results show that DQN-based heterogeneous …


Reinforcement-Learning-Based Adaptive Tracking Control For A Space Continuum Robot Based On Reinforcement Learning, Da Jiang, Zhiqin Cai, Zhongzhen Liu, Haijun Peng, Zhigang Wu Oct 2022

Reinforcement-Learning-Based Adaptive Tracking Control For A Space Continuum Robot Based On Reinforcement Learning, Da Jiang, Zhiqin Cai, Zhongzhen Liu, Haijun Peng, Zhigang Wu

Journal of System Simulation

Abstract: Aiming at the tracking control for three-arm space continuum robot in space active debris removal manipulation, an adaptive sliding mode control algorithm based on deep reinforcement learning is proposed. Through BP network, a data-driven dynamic model is developed as the predictive model to guide the reinforcement learning to adjust the sliding mode controller's parameters online, and finally realize a real-time tracking control. Simulation results show that the proposed data-driven predictive model can accurately predict the robot's dynamic characteristics with the relative error within ±1% to random trajectories. Compared with the fixed-parameter sliding mode controller, the proposed adaptive controller …


Application Of Improved Q Learning Algorithm In Job Shop Scheduling Problem, Yejian Zhao, Yanhong Wang, Jun Zhang, Hongxia Yu, Zhongda Tian Jun 2022

Application Of Improved Q Learning Algorithm In Job Shop Scheduling Problem, Yejian Zhao, Yanhong Wang, Jun Zhang, Hongxia Yu, Zhongda Tian

Journal of System Simulation

Abstract: Aiming at the job shop scheduling in a dynamic environment, a dynamic scheduling algorithm based on an improved Q learning algorithm and dispatching rules is proposed. The state space of the dynamic scheduling algorithm is described with the concept of "the urgency of remaining tasks" and a reward function with the purpose of "the higher the slack, the higher the penalty" is disigned. In view of the problem that the greedy strategy will select the sub-optimal actions in the later stage of learning, the traditional Q learning algorithm is improved by introducing an action selection strategy based on the …


Research On The Construction Method Of Simulation Evaluation Index Of Operation Effectiveness Operation Concept Traction, Ziwei Zhang, Liang Li, Zhiming Dong, Yifei Wang, Li Duan Mar 2022

Research On The Construction Method Of Simulation Evaluation Index Of Operation Effectiveness Operation Concept Traction, Ziwei Zhang, Liang Li, Zhiming Dong, Yifei Wang, Li Duan

Journal of System Simulation

Abstract: Agents are difficult to be directly modeled and simulated due to the complexity of their own interaction and learning behaviors. Aiming at the common problems in the discrete simulation of the agent, the event transfer mechanism of the discrete event system specification (DEVS) atomic model is applied to express the interaction and learning of an agent. Through the interaction mode of the agent, the transfer control of multi-state external events, the port connection mode, as well as the introduction of reinforcement learning event transfer representation, a discrete simulation construction method of the agent based on the DEVS atomic model …


Team Air Combat Using Model-Based Reinforcement Learning, David A. Mottice Mar 2022

Team Air Combat Using Model-Based Reinforcement Learning, David A. Mottice

Theses and Dissertations

We formulate the first generalized air combat maneuvering problem (ACMP), called the MvN ACMP, wherein M friendly AUCAVs engage against N enemy AUCAVs, developing a Markov decision process (MDP) model to control the team of M Blue AUCAVs. The MDP model leverages a 5-degree-of-freedom aircraft state transition model and formulates a directed energy weapon capability. Instead, a model-based reinforcement learning approach is adopted wherein an approximate policy iteration algorithmic strategy is implemented to attain high-quality approximate policies relative to a high performing benchmark policy. The ADP algorithm utilizes a multi-layer neural network for the value function approximation regression mechanism. One-versus-one …


Research On Experimental Method Of Joint Operation Simulation Based On Human-Machine Hybrid Intelligence, Ma Jun, Jingyu Yang, Wu Xi Oct 2021

Research On Experimental Method Of Joint Operation Simulation Based On Human-Machine Hybrid Intelligence, Ma Jun, Jingyu Yang, Wu Xi

Journal of System Simulation

Abstract: In view of the difficulties that the joint operation simulation experiment methods are mainly for guiding equipment evaluation and demonstration, which is difficult to effectively support the research of operation problems, a joint operation simulation experiment method based on human-machine hybrid intelligence is proposed. The classification, generation and accumulation process of the knowledge in joint operation simulation experiment are clarified. Through the detailed descriptions of experimental interaction process, experimental operation process, experimental driving mode, simulation operation mode, supporting system structure, etc., a joint operation simulation experiment framework based on man-machine hybrid intelligence is constructed. It provides a new method …


Dqn-Based Path Planning Method And Simulation For Submarine And Warship In Naval Battlefield, Xiaodong Huang, Haitao Yuan, Bi Jing, Liu Tao Oct 2021

Dqn-Based Path Planning Method And Simulation For Submarine And Warship In Naval Battlefield, Xiaodong Huang, Haitao Yuan, Bi Jing, Liu Tao

Journal of System Simulation

Abstract: To realize multi-agent intelligent planning and target tracking in complex naval battlefield environment, the work focuses on agents (submarine or warship), and proposes a simulation method based on reinforcement learning algorithm called Deep Q Network (DQN). Two neural networks with the same structure and different parameters are designed to update real and predicted Q values for the convergence of value functions. An ε-greedy algorithm is proposed to design an action selection mechanism, and a reward function is designed for the naval battlefield environment to increase the update velocity and generalization ability of Learning with Experience Replay (LER). Simulation results …


Study On Next-Generation Strategic Wargame System, Wu Xi, Xianglin Meng, Jingyu Yang Sep 2021

Study On Next-Generation Strategic Wargame System, Wu Xi, Xianglin Meng, Jingyu Yang

Journal of System Simulation

Abstract: Strategic wargame is an important support to the strategic decision. The research status and challenges of the strategic wargame are analyzed, and the influence of big data and artificial intelligence technology on the strategic wargame system is studied. The prospects and key technologies of the next-generation strategic wargame system are studied, including the construction of event association graph for strategic topics, generation of strategic decision sparse samples based on generative adversarial nets, gaming strategy learning of human-in-loop hybrid enhancement, and public opinion dissemination modeling technology based on social network. The development trend of the strategic wargame is proposed.


Self-Learning-Based Multiple Spacecraft Evasion Decision Making Simulation Under Sparse Reward Condition, Zhao Yu, Jifeng Guo, Yan Peng, Chengchao Bai Aug 2021

Self-Learning-Based Multiple Spacecraft Evasion Decision Making Simulation Under Sparse Reward Condition, Zhao Yu, Jifeng Guo, Yan Peng, Chengchao Bai

Journal of System Simulation

Abstract: In order to improve the ability of spacecraft formation to evade multiple interceptors, aiming at the low success rate of traditional procedural maneuver evasion, a multi-agent cooperative autonomous decision-making algorithm, which is based on deep reinforcement learning method, is proposed. Based on the actor-critic architecture, a multi-agent reinforcement learning algorithm is designed, in which a weighted linear fitting method is proposed to solve the reliability allocation problem of the self-learning system. To solve the sparse reward problem in task scenario, a sparse reward reinforcement learning method based on inverse value method is proposed. According to the task scenario, …


High-Density Parking For Autonomous Vehicles., Parag J. Siddique Aug 2021

High-Density Parking For Autonomous Vehicles., Parag J. Siddique

Electronic Theses and Dissertations

In a common parking lot, much of the space is devoted to lanes. Lanes must not be blocked for one simple reason: a blocked car might need to leave before the car that blocks it. However, the advent of autonomous vehicles gives us an opportunity to overcome this constraint, and to achieve a higher storage capacity of cars. Taking advantage of self-parking and intelligent communication systems of autonomous vehicles, we propose puzzle-based parking, a high-density design for a parking lot. We introduce a novel method of vehicle parking, which leads to maximum parking density. We then propose a heuristic method …


Step-Wise Deep Learning Models For Solving Routing Problems, Liang Xin, Wen Song, Zhiguang Cao, Jie Zhang Jul 2021

Step-Wise Deep Learning Models For Solving Routing Problems, Liang Xin, Wen Song, Zhiguang Cao, Jie Zhang

Research Collection School Of Computing and Information Systems

Routing problems are very important in intelligent transportation systems. Recently, a number of deep learning-based methods are proposed to automatically learn construction heuristics for solving routing problems. However, these methods do not completely follow Bellman's Principle of Optimality since the visited nodes during construction are still included in the following subtasks, resulting in suboptimal policies. In this article, we propose a novel step-wise scheme which explicitly removes the visited nodes in each node selection step. We apply this scheme to two representative deep models for routing problems, pointer network and transformer attention model (TAM), and significantly improve the performance of …


Approximate Difference Rewards For Scalable Multigent Reinforcement Learning, Arambam James Singh, Akshat Kumar, Hoong Chuin Lau May 2021

Approximate Difference Rewards For Scalable Multigent Reinforcement Learning, Arambam James Singh, Akshat Kumar, Hoong Chuin Lau

Research Collection School Of Computing and Information Systems

We address the problem ofmultiagent credit assignment in a large scale multiagent system. Difference rewards (DRs) are an effective tool to tackle this problem, but their exact computation is known to be challenging even for small number of agents. We propose a scalable method to compute difference rewards based on aggregate information in a multiagent system with large number of agents by exploiting the symmetry present in several practical applications. Empirical evaluation on two multiagent domains - air-traffic control and cooperative navigation, shows better solution quality than previous approaches.


Deep Reinforcement Learning Approach To Solve Dynamic Vehicle Routing Problem With Stochastic Customers, Waldy Joe, Hoong Chuin Lau Oct 2020

Deep Reinforcement Learning Approach To Solve Dynamic Vehicle Routing Problem With Stochastic Customers, Waldy Joe, Hoong Chuin Lau

Research Collection School Of Computing and Information Systems

In real-world urban logistics operations, changes to the routes and tasks occur in response to dynamic events. To ensure customers’ demands are met, planners need to make these changes quickly (sometimes instantaneously). This paper proposes the formulation of a dynamic vehicle routing problem with time windows and both known and stochastic customers as a route-based Markov Decision Process. We propose a solution approach that combines Deep Reinforcement Learning (specifically neural networks-based TemporalDifference learning with experience replay) to approximate the value function and a routing heuristic based on Simulated Annealing, called DRLSA. Our approach enables optimized re-routing decision to be generated …


Joint Optimization Control Of Energy Storage System Management And Demand Response, Xueying Gao, Tang Hao, Gangzhong Miao, Zhaowu Ping Jul 2020

Joint Optimization Control Of Energy Storage System Management And Demand Response, Xueying Gao, Tang Hao, Gangzhong Miao, Zhaowu Ping

Journal of System Simulation

Abstract: The joint optimization problem of energy management and demand response were studied in order to reduce the long-run cost of electricity users equipped with energy storage unit and smart applications, and to increase their benefits meanwhile. The goals were achieved by controlling both the energy storage unit (charging, discharging, or idle) and the load service (access or delay). Based on the random nature of solar photovoltaic, load demand electricity and electricity price, the joint optimization problem was modeled as infinite-horizon Markov decision process model, and Q-learning algorithm was proposed to find the optimal solution. Simulation results show that the …


Analysis And Optimization Of The Action Chain Mechanism In Agent2d Underlying In Robocup2d Soccer League, Chen Bing, Feifan Xu, Hanyan Xu, Zekai Cheng, Liu Cheng Jun 2020

Analysis And Optimization Of The Action Chain Mechanism In Agent2d Underlying In Robocup2d Soccer League, Chen Bing, Feifan Xu, Hanyan Xu, Zekai Cheng, Liu Cheng

Journal of System Simulation

Abstract: In the RoboCup2D soccer league, Agent2D is one of the most widely used underlying team in China. Data transmission noise and the incomplete action chain mechanism make the underlying teams using Agent2D be lack of flexibility. This paper introduces an action correcting parameter and optimizes the operation of the action chain by reinforcement learning mechanism. The performance of the Agent2D underlying team is improved in the game and the adaptability of the team is enhanced. Simulation experiment results show that this method has a certain effect.


Hierarchical Multiagent Reinforcement Learning For Maritime Traffic Management, Arambam James Singh, Akshat Kumar, Hoong Chuin Lau May 2020

Hierarchical Multiagent Reinforcement Learning For Maritime Traffic Management, Arambam James Singh, Akshat Kumar, Hoong Chuin Lau

Research Collection School Of Computing and Information Systems

Increasing global maritime traffic coupled with rapid digitization and automation in shipping mandate developing next generation maritime traffic management systems to mitigate congestion, increase safety of navigation, and avoid collisions in busy and geographically constrained ports (such as Singapore's). To achieve these objectives, we model the maritime traffic as a large multiagent system with individual vessels as agents, and VTS (Vessel Traffic Service) authority as a regulatory agent. We develop a hierarchical reinforcement learning approach where vessels first select a high level action based on the underlying traffic flow, and then select the low level action that determines their future …


Robot Arm Control Method Based On Deep Reinforcement Learning, Heyu Li, Zhilong Zhao, Gu Lei, Liqin Guo, Zeng Bi, Tingyu Lin Dec 2019

Robot Arm Control Method Based On Deep Reinforcement Learning, Heyu Li, Zhilong Zhao, Gu Lei, Liqin Guo, Zeng Bi, Tingyu Lin

Journal of System Simulation

Abstract: Deep reinforcement learning continues to explore in the environment and adjusts the neural network parameters by the reward function. The actual production line can not be used as the trial and error environment for the algorithm, so there is not enough data. For that, this paper constructs a virtual robot arm simulation environment, including the robot arm and the object. The Deep Deterministic Policy Gradient (DDPG),in which the state variables and reward function are set,is trained by deep reinforcement learning algorithm in the simulation environment to realize the target of controlling the robot arm to move the gripper below …


Dp-Q(Λ): Real-Time Path Planning For Multi-Agent In Large-Scale Web3d Scene, Fengting Yan, Jinyuan Jia Apr 2019

Dp-Q(Λ): Real-Time Path Planning For Multi-Agent In Large-Scale Web3d Scene, Fengting Yan, Jinyuan Jia

Journal of System Simulation

Abstract: The path planning of multi-agent in an unknown large-scale scene needs an efficient and stable algorithm, and needs to solve multi-agent collision avoidance problem, and then completes a real-time path planning in Web3D. To solve above problems, the DP-Q(λ) algorithm is proposed; and the direction constraints, high reward or punishment weight training methods are used to adjust the values of reward or punishment by using a probability p (0-1 random number). The value from reward or punishment determines its next step path planning strategy. If the next position is free, the agent could walk to it. The above strategy …


Adopt: Combining Parameter Tuning And Adaptive Operator Ordering For Solving A Class Of Orienteering Problems, Aldy Gunawan, Hoong Chuin Lau, Kun Lu Jul 2018

Adopt: Combining Parameter Tuning And Adaptive Operator Ordering For Solving A Class Of Orienteering Problems, Aldy Gunawan, Hoong Chuin Lau, Kun Lu

Research Collection School Of Computing and Information Systems

Two fundamental challenges in local search based metaheuristics are how to determine parameter configurations and design the underlying Local Search (LS) procedure. In this paper, we propose a framework in order to handle both challenges, called ADaptive OPeraTor Ordering (ADOPT). In this paper, The ADOPT framework is applied to two metaheuristics, namely Iterated Local Search (ILS) and a hybridization of Simulated Annealing and ILS (SAILS) for solving two variants of the Orienteering Problem: the Team Dependent Orienteering Problem (TDOP) and the Team Orienteering Problem with Time Windows (TOPTW). This framework consists of two main processes. The Design of Experiment (DOE) …


An Efficient Approach To Model-Based Hierarchical Reinforcement Learning, Zhuoru Li, Akshay Narayan, Tze-Yun Leong Feb 2017

An Efficient Approach To Model-Based Hierarchical Reinforcement Learning, Zhuoru Li, Akshay Narayan, Tze-Yun Leong

Research Collection School Of Computing and Information Systems

We propose a model-based approach to hierarchical reinforcement learning that exploits shared knowledge and selective execution at different levels of abstraction, to efficiently solve large, complex problems. Our framework adopts a new transition dynamics learning algorithm that identifies the common action-feature combinations of the subtasks, and evaluates the subtask execution choices through simulation. The framework is sample efficient, and tolerates uncertain and incomplete problem characterization of the subtasks. We test the framework on common benchmark problems and complex simulated robotic environments. It compares favorably against the stateof-the-art algorithms, and scales well in very large problems.


Reinforcement Learning Framework For Modeling Spatial Sequential Decisions Under Uncertainty: (Extended Abstract), Truc Viet Le, Siyuan Liu, Hoong Chuin Lau May 2016

Reinforcement Learning Framework For Modeling Spatial Sequential Decisions Under Uncertainty: (Extended Abstract), Truc Viet Le, Siyuan Liu, Hoong Chuin Lau

Research Collection School Of Computing and Information Systems

We consider the problem of trajectory prediction, where a trajectory is an ordered sequence of location visits and corresponding timestamps. The problem arises when an agent makes sequential decisions to visit a set of spatial locations of interest. Each location bears a stochastic utility and the agent has a limited budget to spend. Given the agent's observed partial trajectory, our goal is to predict the remaining trajectory. We propose a solution framework to the problem considering both the uncertainty of utility and the budget constraint. We use reinforcement learning (RL) to model the underlying decision processes and inverse RL to …