Open Access. Powered by Scholars. Published by Universities.®

Engineering Commons

Journal of System Simulation

Reinforcement learning

Articles 1 - 19 of 19

Full-Text Articles in Engineering

Research And Development Of Simulation Training Platform For Multi-Agent Collaborative Decision-Making, Cheng Cheng, Zhijie Chen, Ziming Guo, Ni Li Dec 2023

Journal of System Simulation

Abstract: A reinforcement learning simulation platform can serve as an interactive training environment for reinforcement learning. In order to make the simulation platform compatible with multi-agent reinforcement learning algorithms and meet the needs of simulation in the military field, the similar processes in multi-agent reinforcement learning algorithms are refined and a unified interface is designed to embed and verify different types of deep reinforcement learning algorithms on the simulation platform, and the back-end service of the simulation platform is optimized to accelerate the training process of the algorithm model. The experimental results show that, by unifying the interface, the simulation platform …


Intercell Dynamic Scheduling Method Based On Deep Reinforcement Learning, Jing Ni, Mengke Ma Nov 2023

Journal of System Simulation

Abstract: In order to solve the intercell scheduling problem of dynamic arrival of machining tasks and realize adaptive scheduling in the complex and changeable environment of the intelligent factory, a scheduling method based on a deep Q network is proposed. A complex network with cells as nodes and workpiece intercell machining path as directed edges is constructed, and the degree value is introduced to define the state space with intercell scheduling characteristics. A compound scheduling rule composed of a workpiece layer, unit layer, and machine layer is designed, and hierarchical optimization makes the scheduling scheme more global. Since double deep …


UAV-Enabled Task Offloading Strategy For Vehicular Edge Computing Networks, Feng Hu, Haiyang Gu, Jun Lin Nov 2023

Journal of System Simulation

Abstract: As intelligent vehicles are equipped with more and more sensors, sensor data is growing explosively, which brings severe challenges to vehicular communication and computing. In addition, modern roads present a three-dimensional structure, and the system architecture of traditional vehicular networks cannot guarantee full coverage and seamless computing. A task offloading strategy for UAV-assisted and 6G-enabled (Sixth Generation) vehicular edge computing networks is proposed. Furthermore, a flexible and intelligent vehicular edge computing mode is formed by vehicles and UAVs, which provide three-dimensional edge computing services for delay-sensitive and computation-intensive vehicular tasks, and ensure timely processing and …


Imitative Generation Of Optimal Guidance Law Based On Reinforcement Learning, Zhengxuan Jia, Tingyu Lin, Yingying Xiao, Guoqiang Shi, Hao Wang, Bi Zeng, Yiming Ou, Pengpeng Zhao Nov 2023

Journal of System Simulation

Abstract: Under the background of high-speed maneuvering target interception, an optimal guidance law generation method for head-on interception independent of target acceleration estimation is proposed based on deep reinforcement learning. In addition, its effectiveness is verified through simulation experiments. As the simulation results suggest, the proposed method successfully achieves head-on interception of high-speed maneuvering targets in 3D space and largely reduces the requirement for target estimation with strong uncertainty, and it is more applicable than the optimal control method.


Aircraft Assignment Method For Optimal Utilization Of Maintenance Intervals, Runxia Guo, Yifu Wang Sep 2023

Journal of System Simulation

Abstract: The aircraft assignment problem is studied from a maintenance assurance perspective. In order to ensure their continuous airworthiness, civil aircraft are required to perform maintenance tasks, i.e., scheduled inspections, at specified intervals. The scheduled inspection interval is usually controlled by the number of flight cycles (FC), flight hours (FH), or flight days (FD), whichever comes first. In order to make balanced use of the inspection interval, an aircraft assignment model for a given fleet size is developed to optimize the maintenance interval utilization, and it is solved by a reinforcement learning algorithm to minimize the variance of the …


Multi-Agent Cooperative Combat Simulation In Naval Battlefield With Reinforcement Learning, Ding Shi, Xuefeng Yan, Lina Gong, Jingxuan Zhang, Donghai Guan, Mingqiang Wei Apr 2023

Journal of System Simulation

Abstract: Due to the rapidly changing situations of future naval battlefields, it is urgent to realize high-quality combat simulation in naval battlefields based on artificial intelligence to comprehensively optimize and improve the combat effectiveness of our army and defeat the enemy. The collaboration of combat units is the key point, and how to realize balanced decision-making among multiple agents is the first task. Based on a decoupling priority experience replay mechanism and an attention mechanism, a multi-agent reinforcement learning-based cooperative combat simulation (MARL-CCSA) network is proposed. Based on the expert experience, a multi-scale reward function is designed, on which a naval …


Research On Unmanned Swarm Combat System Adaptive Evolution Model Simulation, Zhiqiang Li, Yuanlong Li, Laixiang Yin, Xiangping Ma Apr 2023

Journal of System Simulation

Abstract: Aiming at the fact that the intelligent unmanned swarm combat system is mainly composed of large-scale combat individuals with limited behavioral capabilities and has limited ability to adapt to the changes of battlefield environment and combat opponents, a learning evolution method combining genetic algorithm and reinforcement learning is proposed to construct an individual-based unmanned bee colony combat system evolution model. To improve the adaptive evolution efficiency of bee colony combat system, an improved genetic algorithm is proposed to improve the learning and evolution speed of bee colony individuals by using individual-specific mutation optimization strategy. Simulation experiment on …


DQN-Based Joint Scheduling Method Of Heterogeneous TT&C Resources, Naiyang Xue, Dan Ding, Yutong Jia, Zhiqiang Wang, Yuan Liu Feb 2023

Journal of System Simulation

Abstract: Taking the joint scheduling of heterogeneous TT&C resources as the research object, a deep Q network (DQN) algorithm based on reinforcement learning is proposed. The characteristics of the joint scheduling problem of heterogeneous TT&C resources are fully analyzed, mathematical language is used to describe the constraints affecting the solution, and a resource joint scheduling model is established. From the perspective of applying reinforcement learning, two neural networks with the same structure and action selection strategies based on the ε-greedy algorithm are respectively designed after the Markov decision process description, and a DQN solution framework is established. The simulation results show that DQN-based heterogeneous …


Reinforcement-Learning-Based Adaptive Tracking Control For A Space Continuum Robot Based On Reinforcement Learning, Da Jiang, Zhiqin Cai, Zhongzhen Liu, Haijun Peng, Zhigang Wu Oct 2022

Journal of System Simulation

Abstract: Aiming at the tracking control of a three-arm space continuum robot in space active debris removal manipulation, an adaptive sliding mode control algorithm based on deep reinforcement learning is proposed. Through a BP network, a data-driven dynamic model is developed as the predictive model to guide the reinforcement learning to adjust the sliding mode controller's parameters online, and finally realize real-time tracking control. Simulation results show that the proposed data-driven predictive model can accurately predict the robot's dynamic characteristics, with a relative error within ±1% on random trajectories. Compared with the fixed-parameter sliding mode controller, the proposed adaptive controller …


Application Of Improved Q Learning Algorithm In Job Shop Scheduling Problem, Yejian Zhao, Yanhong Wang, Jun Zhang, Hongxia Yu, Zhongda Tian Jun 2022

Journal of System Simulation

Abstract: Aiming at job shop scheduling in a dynamic environment, a dynamic scheduling algorithm based on an improved Q learning algorithm and dispatching rules is proposed. The state space of the dynamic scheduling algorithm is described with the concept of "the urgency of remaining tasks" and a reward function with the purpose of "the higher the slack, the higher the penalty" is designed. In view of the problem that the greedy strategy will select sub-optimal actions in the later stage of learning, the traditional Q learning algorithm is improved by introducing an action selection strategy based on the …


Research On The Construction Method Of Simulation Evaluation Index Of Operation Effectiveness Operation Concept Traction, Ziwei Zhang, Liang Li, Zhiming Dong, Yifei Wang, Li Duan Mar 2022

Journal of System Simulation

Abstract: Agents are difficult to be directly modeled and simulated due to the complexity of their own interaction and learning behaviors. Aiming at the common problems in the discrete simulation of the agent, the event transfer mechanism of the discrete event system specification (DEVS) atomic model is applied to express the interaction and learning of an agent. Through the interaction mode of the agent, the transfer control of multi-state external events, the port connection mode, as well as the introduction of reinforcement learning event transfer representation, a discrete simulation construction method of the agent based on the DEVS atomic model …


Research On Experimental Method Of Joint Operation Simulation Based On Human-Machine Hybrid Intelligence, Ma Jun, Jingyu Yang, Wu Xi Oct 2021

Journal of System Simulation

Abstract: In view of the difficulties that the joint operation simulation experiment methods are mainly for guiding equipment evaluation and demonstration, which is difficult to effectively support the research of operation problems, a joint operation simulation experiment method based on human-machine hybrid intelligence is proposed. The classification, generation and accumulation process of the knowledge in joint operation simulation experiment are clarified. Through the detailed descriptions of experimental interaction process, experimental operation process, experimental driving mode, simulation operation mode, supporting system structure, etc., a joint operation simulation experiment framework based on man-machine hybrid intelligence is constructed. It provides a new method …


DQN-Based Path Planning Method And Simulation For Submarine And Warship In Naval Battlefield, Xiaodong Huang, Haitao Yuan, Bi Jing, Liu Tao Oct 2021

Journal of System Simulation

Abstract: To realize multi-agent intelligent planning and target tracking in complex naval battlefield environment, the work focuses on agents (submarine or warship), and proposes a simulation method based on reinforcement learning algorithm called Deep Q Network (DQN). Two neural networks with the same structure and different parameters are designed to update real and predicted Q values for the convergence of value functions. An ε-greedy algorithm is proposed to design an action selection mechanism, and a reward function is designed for the naval battlefield environment to increase the update velocity and generalization ability of Learning with Experience Replay (LER). Simulation results …
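
The ε-greedy action selection named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `epsilon_greedy`, the list-of-Q-values input, and the `rng` parameter are assumptions made for illustration only.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index for one decision step.

    q_values: list of estimated Q values, one per action (hypothetical input).
    epsilon:  probability of exploring with a uniformly random action.
    rng:      any object with random() and randrange(); defaults to the random module.
    """
    if rng.random() < epsilon:
        # Explore: choose any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: choose the action with the highest estimated Q value.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon=0.0` the function always exploits and returns the index of the largest Q value; with `epsilon=1.0` it always explores. In practice ε is often decayed over training, but the abstract does not state the schedule used.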


Study On Next-Generation Strategic Wargame System, Wu Xi, Xianglin Meng, Jingyu Yang Sep 2021

Journal of System Simulation

Abstract: Strategic wargame is an important support to the strategic decision. The research status and challenges of the strategic wargame are analyzed, and the influence of big data and artificial intelligence technology on the strategic wargame system is studied. The prospects and key technologies of the next-generation strategic wargame system are studied, including the construction of event association graph for strategic topics, generation of strategic decision sparse samples based on generative adversarial nets, gaming strategy learning of human-in-loop hybrid enhancement, and public opinion dissemination modeling technology based on social network. The development trend of the strategic wargame is proposed.


Self-Learning-Based Multiple Spacecraft Evasion Decision Making Simulation Under Sparse Reward Condition, Zhao Yu, Jifeng Guo, Yan Peng, Chengchao Bai Aug 2021

Journal of System Simulation

Abstract: In order to improve the ability of spacecraft formation to evade multiple interceptors, aiming at the low success rate of traditional procedural maneuver evasion, a multi-agent cooperative autonomous decision-making algorithm, which is based on deep reinforcement learning method, is proposed. Based on the actor-critic architecture, a multi-agent reinforcement learning algorithm is designed, in which a weighted linear fitting method is proposed to solve the reliability allocation problem of the self-learning system. To solve the sparse reward problem in task scenario, a sparse reward reinforcement learning method based on inverse value method is proposed. According to the task scenario, …


Joint Optimization Control Of Energy Storage System Management And Demand Response, Xueying Gao, Tang Hao, Gangzhong Miao, Zhaowu Ping Jul 2020

Journal of System Simulation

Abstract: The joint optimization problem of energy management and demand response was studied in order to reduce the long-run cost of electricity users equipped with an energy storage unit and smart applications, and to increase their benefits meanwhile. The goals were achieved by controlling both the energy storage unit (charging, discharging, or idle) and the load service (access or delay). Based on the random nature of solar photovoltaic, load demand electricity and electricity price, the joint optimization problem was modeled as an infinite-horizon Markov decision process, and a Q-learning algorithm was proposed to find the optimal solution. Simulation results show that the …
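
For readers unfamiliar with Q-learning on a Markov decision process, the standard tabular update rule can be sketched as follows. This is the textbook rule, not the paper's model: the function name `q_update`, the dictionary-of-lists Q table, and the default `alpha` and `gamma` values are assumptions for illustration, and the state and action encoding shown in the comments is hypothetical.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.95):
    """Apply one tabular Q-learning update in place and return the new value.

    q:      dict mapping each state to a list of Q values, one per action
            (e.g. hypothetical actions 0=charge, 1=discharge, 2=idle).
    alpha:  learning rate (assumed value).
    gamma:  discount factor for the infinite-horizon return (assumed value).
    """
    # Value of the best action available in the successor state.
    best_next = max(q[next_state])
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]
```

A cost-minimizing formulation such as the one described here would typically pass the negative cost as `reward`, though the abstract does not specify the reward definition.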


Analysis And Optimization Of The Action Chain Mechanism In Agent2D Underlying In RoboCup2D Soccer League, Chen Bing, Feifan Xu, Hanyan Xu, Zekai Cheng, Liu Cheng Jun 2020

Journal of System Simulation

Abstract: In the RoboCup2D soccer league, Agent2D is one of the most widely used underlying teams in China. Data transmission noise and the incomplete action chain mechanism make the underlying teams using Agent2D lack flexibility. This paper introduces an action correcting parameter and optimizes the operation of the action chain by a reinforcement learning mechanism. The performance of the Agent2D underlying team is improved in the game and the adaptability of the team is enhanced. Simulation experiment results show that this method has a certain effect.


Robot Arm Control Method Based On Deep Reinforcement Learning, Heyu Li, Zhilong Zhao, Gu Lei, Liqin Guo, Zeng Bi, Tingyu Lin Dec 2019

Journal of System Simulation

Abstract: Deep reinforcement learning continually explores the environment and adjusts the neural network parameters according to the reward function. The actual production line cannot be used as the trial-and-error environment for the algorithm, so there is not enough data. For that, this paper constructs a virtual robot arm simulation environment, including the robot arm and the object. The Deep Deterministic Policy Gradient (DDPG), in which the state variables and reward function are set, is trained by a deep reinforcement learning algorithm in the simulation environment to realize the target of controlling the robot arm to move the gripper below …


DP-Q(λ): Real-Time Path Planning For Multi-Agent In Large-Scale Web3D Scene, Fengting Yan, Jinyuan Jia Apr 2019

Journal of System Simulation

Abstract: Path planning for multiple agents in an unknown large-scale scene needs an efficient and stable algorithm that also solves the multi-agent collision avoidance problem and completes real-time path planning in Web3D. To solve the above problems, the DP-Q(λ) algorithm is proposed; and the direction constraints, high reward or punishment weight training methods are used to adjust the values of reward or punishment by using a probability p (0-1 random number). The value from reward or punishment determines its next step path planning strategy. If the next position is free, the agent could walk to it. The above strategy …