Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Reinforcement learning

Articles 1 - 30 of 187

Full-Text Articles in Physical Sciences and Mathematics

De Novo Drug Design Using Transformer-Based Machine Translation And Reinforcement Learning Of An Adaptive Monte Carlo Tree Search, Dony Ang, Cyril Rakovski, Hagop S. Atamian Jan 2024

Biology, Chemistry, and Environmental Sciences Faculty Articles and Research

The discovery of novel therapeutic compounds through de novo drug design represents a critical challenge in the field of pharmaceutical research. Traditional drug discovery approaches are often resource intensive and time consuming, leading researchers to explore innovative methods that harness the power of deep learning and reinforcement learning techniques. Here, we introduce a novel drug design approach called drugAI that leverages the Encoder–Decoder Transformer architecture in tandem with Reinforcement Learning via a Monte Carlo Tree Search (RL-MCTS) to expedite the process of drug discovery while ensuring the production of valid small molecules with drug-like characteristics and strong binding affinities towards …


Energy Consumption Optimization Of Uav-Assisted Traffic Monitoring Scheme With Tiny Reinforcement Learning, Xiangjie Kong, Chenhao Ni, Gaohui Duan, Guojiang Shen, Yao Yang, Sajal K. Das Jan 2024

Computer Science Faculty Research & Creative Works

Unmanned Aerial Vehicles (UAVs) can capture pictures of road conditions in all directions and from different angles by carrying high-definition cameras, which helps gather relevant road data more effectively. However, due to their limited energy capacity, drones face challenges in performing related tasks for an extended period. Therefore, a crucial concern is how to plan the path of UAVs and minimize energy consumption. To address this problem, we propose a multi-agent deep deterministic policy gradient based (MADDPG) algorithm for UAV path planning (MAUP). Considering the energy consumption and memory usage of MAUP, we have conducted optimizations to reduce consumption on …


A New Cache Replacement Policy In Named Data Network Based On Fib Table Information, Mehran Hosseinzadeh, Neda Moghim, Samira Taheri, Nasrin Gholami Jan 2024

VMASC Publications

Named Data Network (NDN) is proposed for the Internet as an information-centric architecture. Content storing in the router’s cache plays a significant role in NDN. When a router’s cache becomes full, a cache replacement policy determines which content should be discarded for the new content storage. This paper proposes a new cache replacement policy called Discard of Fast Retrievable Content (DFRC). In DFRC, the retrieval time of the content is evaluated using the FIB table information, and the content with less retrieval time receives more discard priority. An impact weight is also used to involve both the grade of retrieval …
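
As a rough illustration of the eviction rule described in this abstract, the Python sketch below discards the cached item with the smallest estimated retrieval time when the cache is full. The class name, the method names, and the precomputed `retrieval_time` argument are assumptions made here for illustration only. The abstract does not say how the time is derived from the FIB table, and the impact weight it mentions is not modeled.

```python
class FastRetrievableFirstCache:
    """Toy cache that evicts the entry that is quickest to fetch again.

    Hypothetical sketch: retrieval_time is supplied by the caller rather
    than computed from FIB table information.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.store = {}  # content name -> (content, retrieval_time)

    def insert(self, name, content, retrieval_time):
        """Store content; return the evicted name, or None if nothing was evicted."""
        victim = None
        if name not in self.store and len(self.store) >= self.capacity:
            # Lower retrieval time means higher discard priority.
            victim = min(self.store, key=lambda n: self.store[n][1])
            del self.store[victim]
        self.store[name] = (content, retrieval_time)
        return victim


cache = FastRetrievableFirstCache(capacity=2)
cache.insert("/video/a", b"...", retrieval_time=5.0)
cache.insert("/video/b", b"...", retrieval_time=1.0)
evicted = cache.insert("/video/c", b"...", retrieval_time=3.0)
```

In this example the cache is full when `/video/c` arrives, so `/video/b`, the item with the shortest retrieval time, is the one discarded.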


Reinforcement Learning: Applying Low Discrepancy Action Selection To Deep Deterministic Policy Gradient, Aleksandr Svishchev Jan 2024

Electronic Theses and Dissertations

Reinforcement learning (RL) is a subfield of machine learning concerned with agents learning to behave optimally by interacting with an environment. One of the most important topics in RL is how the agent should explore, that is, how to choose actions in order to rate their impact on long-term reward. For example, a simple baseline strategy might be uniformly random action selection. This thesis investigates the heuristic idea that agents will learn faster if they explore by factoring the environment’s state into their decision and intentionally choose actions which are as different as possible from what they have previously observed. …


Research And Development Of Simulation Training Platform For Multi-Agent Collaborative Decision-Making, Cheng Cheng, Zhijie Chen, Ziming Guo, Ni Li Dec 2023

Journal of System Simulation

Abstract: A reinforcement learning simulation platform can serve as an interactive training environment for reinforcement learning. To make the simulation platform compatible with multi-agent reinforcement learning algorithms and meet the needs of simulation in the military field, the similar processes in multi-agent reinforcement learning algorithms are refined and a unified interface is designed to embed and verify different types of deep reinforcement learning algorithms on the simulation platform, and the back-end service of the simulation platform is optimized to accelerate the training of the algorithm model. The experimental results show that, by unifying the interface, the simulation platform …


Neural Airport Ground Handling, Yaoxin Wu, Jianan Zhou, Yunwen Xia, Xianli Zhang, Zhiguang Cao, Jie Zhang Dec 2023

Research Collection School Of Computing and Information Systems

Airport ground handling (AGH) offers necessary operations to flights during their turnarounds and is of great importance to the efficiency of airport management and the economics of aviation. Such a problem involves the interplay among the operations that leads to NP-hard problems with complex constraints. Hence, existing methods for AGH are usually designed with massive domain knowledge but still fail to yield high-quality solutions efficiently. In this paper, we aim to enhance the solution quality and computation efficiency for solving AGH. Particularly, we first model AGH as a multiple-fleet vehicle routing problem (VRP) with miscellaneous constraints including precedence, time windows, …


Intercell Dynamic Scheduling Method Based On Deep Reinforcement Learning, Jing Ni, Mengke Ma Nov 2023

Journal of System Simulation

Abstract: In order to solve the intercell scheduling problem of dynamic arrival of machining tasks and realize adaptive scheduling in the complex and changeable environment of the intelligent factory, a scheduling method based on a deep Q network is proposed. A complex network with cells as nodes and workpiece intercell machining path as directed edges is constructed, and the degree value is introduced to define the state space with intercell scheduling characteristics. A compound scheduling rule composed of a workpiece layer, unit layer, and machine layer is designed, and hierarchical optimization makes the scheduling scheme more global. Since double deep …


Uav-Enabled Task Offloading Strategy For Vehicular Edge Computing Networks, Feng Hu, Haiyang Gu, Jun Lin Nov 2023

Journal of System Simulation

Abstract: As intelligent vehicles are equipped with more and more sensors, sensor data is growing explosively, which brings severe challenges to vehicular communication and computing. In addition, the modern road presents a three-dimensional structure, and the system architecture of traditional vehicular networks cannot guarantee full coverage and seamless computing. A task offloading strategy for UAV-assisted and 6G-enabled (Sixth Generation) vehicular edge computing networks is proposed. Furthermore, a flexible and intelligent vehicular edge computing mode is composed of vehicles and UAVs, which provide three-dimensional edge computing services for delay-sensitive and computation-intensive vehicular tasks, and ensure timely processing and …


Imitative Generation Of Optimal Guidance Law Based On Reinforcement Learning, Zhengxuan Jia, Tingyu Lin, Yingying Xiao, Guoqiang Shi, Hao Wang, Bi Zeng, Yiming Ou, Pengpeng Zhao Nov 2023

Journal of System Simulation

Abstract: Under the background of high-speed maneuvering target interception, an optimal guidance law generation method for head-on interception independent of target acceleration estimation is proposed based on deep reinforcement learning. In addition, its effectiveness is verified through simulation experiments. As the simulation results suggest, the proposed method successfully achieves head-on interception of high-speed maneuvering targets in 3D space and largely reduces the requirement for target estimation with strong uncertainty, and it is more applicable than the optimal control method.


Task Distillation: Transforming Reinforcement Learning Into Supervised Learning, Connor Wilhelm Oct 2023

Theses and Dissertations

Recent work in dataset distillation focuses on distilling supervised classification datasets into smaller, synthetic supervised datasets in order to reduce per-model costs of training, to provide interpretability, and to anonymize data. Distillation and its benefits can be extended to a wider array of tasks. We propose a generalization of dataset distillation, which we call task distillation. Using techniques similar to those used in dataset distillation, any learning task can be distilled into a compressed synthetic task. Task distillation allows for transmodal distillations, where a task of one modality is distilled into a synthetic task of another modality, allowing a more …


Decentralized Multimedia Data Sharing In Iov: A Learning-Based Equilibrium Of Supply And Demand, Jiani Fan, Minrui Xu, Jiale Guo, Lwin Khin Shar, Jiawen Kang, Dusit Niyato, Kwok-Yan Lam Oct 2023

Research Collection School Of Computing and Information Systems

The Internet of Vehicles (IoV) has great potential to transform transportation systems by enhancing road safety, reducing traffic congestion, and improving user experience through onboard infotainment applications. Decentralized data sharing can improve security, privacy, reliability, and facilitate infotainment data sharing in IoVs. However, decentralized data sharing may not achieve the expected efficiency if there are IoV users who only want to consume the shared data but are not willing to contribute their own data to the community, resulting in incomplete information observed by other vehicles and infrastructure, which can introduce additional transmission latency. Therefore, in this paper, by modeling the …


Aircraft Assignment Method For Optimal Utilization Of Maintenance Intervals, Runxia Guo, Yifu Wang Sep 2023

Journal of System Simulation

Abstract: The aircraft assignment problem is studied from a maintenance assurance perspective. To ensure their continuous airworthiness, civil aircraft are required to perform maintenance tasks, i.e., scheduled inspections, at specified intervals. The scheduled inspection interval is usually controlled by the number of flight cycles (FC), flight hours (FH), or flight days (FD), whichever comes first. In order to make balanced use of the inspection interval, an aircraft assignment model for a given fleet size is developed to optimize the maintenance interval utilization, and it is solved by a reinforcement learning algorithm to minimize the variance of the …
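
The "whichever comes first" rule for inspection intervals can be made concrete with a small Python sketch. The function name, the dictionary layout, and the numbers are hypothetical and are not taken from the paper. The sketch only reports which counter (FC, FH, or FD) is closest to its limit and what fraction of the interval it has consumed; it does not reproduce the assignment model or the reinforcement learning solver.

```python
def interval_utilization(used, limits):
    """Return (binding_counter, utilization) under a whichever-comes-first rule.

    used and limits map counter names such as "FC", "FH", "FD" to numbers.
    The counter with the largest used/limit ratio reaches its limit first
    if usage continues at the same mix, so it determines the utilization.
    """
    ratios = {name: used[name] / limits[name] for name in limits}
    binding = max(ratios, key=ratios.get)
    return binding, ratios[binding]


# Hypothetical aircraft: 300 of 600 cycles, 900 of 1000 hours, 100 of 200 days.
binding, utilization = interval_utilization(
    used={"FC": 300, "FH": 900, "FD": 100},
    limits={"FC": 600, "FH": 1000, "FD": 200},
)
```

Here flight hours are the binding counter, so the aircraft is treated as having used 90% of its inspection interval even though cycles and days are only at 50%.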


Dynamic Influence Diagram-Based Deep Reinforcement Learning Framework And Application For Decision Support For Operators In Control Rooms, Joseph Mietkiewicz, Ammar N. Abbas, Chidera Winifred Amazu, Anders L. Madsen, Gabriele Baldissone Sep 2023

Articles

In today’s complex industrial environment, operators are often faced with challenging situations that require quick and accurate decision-making. The human-machine interface (HMI) can display too much information, leading to information overload and potentially compromising the operator’s ability to respond effectively. To address this challenge, decision support models are needed to assist operators in identifying and responding to potential safety incidents. In this paper, we present an experiment to evaluate the effectiveness of a recommendation system in addressing the challenge of information overload. The case study focuses on a formaldehyde production simulator and examines the performance of an improved Human-Machine Interface …


Reinforcement Learning Approach To Stochastic Vehicle Routing Problem With Correlated Demands, Zangir Iklassov, Ikboljon Sobirov, Ruben Solozabal, Martin Takac Aug 2023

Machine Learning Faculty Publications

We present a novel end-to-end framework for solving the Vehicle Routing Problem with stochastic demands (VRPSD) using Reinforcement Learning (RL). Our formulation incorporates the correlation between stochastic demands through other observable stochastic variables, thereby offering an experimental demonstration of the theoretical premise that non-i.i.d. stochastic demands provide opportunities for improved routing solutions. Our approach bridges the gap in the application of RL to VRPSD and consists of a parameterized stochastic policy optimized using a policy gradient algorithm to generate a sequence of actions that form the solution. Our model outperforms previous state-of-the-art metaheuristics and demonstrates robustness to changes in the …


Transferable Curricula Through Difficulty Conditioned Generators, Sidney Tio, Pradeep Varakantham Aug 2023

Research Collection School Of Computing and Information Systems

Advancements in reinforcement learning (RL) have demonstrated superhuman performance in complex tasks such as Starcraft, Go, and Chess. However, knowledge transfer from artificial "experts" to humans remains a significant challenge. A promising avenue for such transfer would be the use of curricula. Recent methods in curricula generation focus on training RL agents efficiently, yet such methods rely on surrogate measures to track student progress, and are not suited for training robots in the real world (or more ambitiously humans). In this paper, we introduce a method named Parameterized Environment Response Model (PERM) that shows promising results in training RL agents …


A Machine Learning Approach To Constructing Ramsey Graphs Leads To The Trahtenbrot-Zykov Problem., Emily Hawboldt Aug 2023

Electronic Theses and Dissertations

Attempts at approaching the well-known and difficult problem of constructing Ramsey graphs via machine learning lead to another difficult problem posed by Zykov in 1963 (now commonly referred to as the Trahtenbrot-Zykov problem): For which graphs F does there exist some graph G such that the neighborhood of every vertex in G induces a subgraph isomorphic to F? Chapter 1 provides a brief introduction to graph theory. Chapter 2 introduces Ramsey theory for graphs. Chapter 3 details a reinforcement learning implementation for Ramsey graph construction. The implementation is based on board game software, specifically the AlphaZero program and its …
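
The Trahtenbrot-Zykov condition quoted in this abstract can be checked by brute force on small graphs. The Python sketch below is only an illustration of the definition and is not taken from the thesis: the function names are hypothetical, graphs are plain dictionaries mapping each vertex to its set of neighbors, and the isomorphism test tries every vertex permutation, so it is practical only for very small F.

```python
from itertools import permutations


def induced_subgraph(adj, vertices):
    """Subgraph of adj induced by the given vertices."""
    keep = set(vertices)
    return {v: adj[v] & keep for v in keep}


def edge_set(adj):
    """Undirected edges of adj as a set of two-element frozensets."""
    return {frozenset((u, v)) for u in adj for v in adj[u]}


def is_isomorphic(a, b):
    """Brute-force isomorphism test for small simple graphs."""
    a_edges, b_edges = edge_set(a), edge_set(b)
    if len(a) != len(b) or len(a_edges) != len(b_edges):
        return False
    a_nodes = sorted(a)
    for perm in permutations(sorted(b)):
        mapping = dict(zip(a_nodes, perm))
        # Edge counts match, so mapping every edge onto an edge is a bijection.
        if all(frozenset(mapping[x] for x in e) in b_edges for e in a_edges):
            return True
    return False


def every_neighborhood_is(g, f):
    """True if the neighborhood of every vertex of g induces a copy of f."""
    return all(is_isomorphic(induced_subgraph(g, g[v]), f) for v in g)


k4 = {i: set(range(4)) - {i} for i in range(4)}        # complete graph on 4 vertices
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
c5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}  # 5-cycle
two_isolated = {0: set(), 1: set()}
```

For instance, every vertex neighborhood in the complete graph on four vertices induces a triangle, and every neighborhood in the 5-cycle induces two non-adjacent vertices, so both pairs satisfy the condition.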


Insights Into The Application Of Deep Reinforcement Learning In Healthcare And Materials Science, Benjamin R. Smith Aug 2023

Doctoral Dissertations

Reinforcement learning (RL) is a type of machine learning designed to optimize sequential decision-making. While controlled environments have served as a foundation for RL research, due to the growth in data volumes and deep learning methods, it is now increasingly being applied to real-world problems. In our work, we explore and attempt to overcome challenges that occur when applying RL to solve problems in healthcare and materials science.

First, we explore how issues in bias and data completeness affect healthcare applications of RL. To understand how bias has already been considered in this area, we survey the literature for existing …


Multi-View Hypergraph Contrastive Policy Learning For Conversational Recommendation, Sen Zhao, Wei Wei, Xian-Ling Mao, Shuai: Yang Zhu, Zujie Wen, Dangyang Chen, Feida Zhu, Feida Zhu Jul 2023

Research Collection School Of Computing and Information Systems

Conversational recommendation systems (CRS) aim to interactively acquire user preferences and accordingly recommend items to users. Accurately learning the dynamic user preferences is of crucial importance for CRS. Previous works learn the user preferences with pairwise relations from the interactive conversation and item knowledge, while largely ignoring the fact that factors for a relationship in CRS are multiplex. Specifically, the user likes/dislikes the items that satisfy some attributes (Like/Dislike view). Moreover, social influence is another important factor that affects user preference towards the item (Social view), yet it is largely ignored by previous works in CRS. The user preferences from these …


Imitation Improvement Learning For Large-Scale Capacitated Vehicle Routing Problems, The Viet Bui, Tien Mai Jul 2023

Research Collection School Of Computing and Information Systems

Recent works using deep reinforcement learning (RL) to solve routing problems such as the capacitated vehicle routing problem (CVRP) have focused on improvement learning-based methods, which involve improving a given solution until it becomes near-optimal. Although adequate solutions can be achieved for small problem instances, their efficiency degrades for large-scale ones. In this work, we propose a new improvement learning-based framework based on imitation learning where classical heuristics serve as experts to encourage the policy model to mimic and produce similar or better solutions. Moreover, to improve scalability, we propose Clockwise Clustering, a novel augmented framework for decomposing large-scale CVRP into …


Reinforcement Learning For Sequential Decision Making With Constraints, Jiajing Ling Jul 2023

Dissertations and Theses Collection (Open Access)

Reinforcement learning is a widely used approach to tackle problems in sequential decision making where an agent learns from rewards or penalties. However, in decision-making problems that involve safety or limited resources, the agent's exploration is often limited by constraints. To model such problems, constrained Markov decision processes and constrained decentralized partially observable Markov decision processes have been proposed for single-agent and multi-agent settings, respectively. A significant challenge in solving constrained Dec-POMDP is determining the contribution of each agent to the primary objective and constraint violations. To address this issue, we propose a fictitious play-based method that uses Lagrangian Relaxation …


An Investigation Into Machine Learning Techniques For Designing Dynamic Difficulty Agents In Real-Time Games, Ryan Adare Dunagan Jun 2023

Electronic Theses and Dissertations

Video games are an incredibly popular pastime enjoyed by people of all ages worldwide. Many different kinds of games exist, but most games feature some element of the player overcoming a challenge, usually through gameplay. These challenges are insurmountable for some people and may turn them off to video games as a pastime. Games can be made more accessible to players of little skill and/or experience through the use of Dynamic Difficulty Adjustment (DDA) systems that adjust the difficulty of the game in response to the player’s performance. This research seeks to establish the effectiveness of machine learning techniques …


Dynamic Police Patrol Scheduling With Multi-Agent Reinforcement Learning, Songhan Wong, Waldy Joe, Hoong Chuin Lau Jun 2023

Research Collection School Of Computing and Information Systems

Effective police patrol scheduling is essential in projecting police presence and ensuring readiness in responding to unexpected events in urban environments. However, scheduling patrols can be a challenging task as it requires balancing between two conflicting objectives namely projecting presence (proactive patrol) and incident response (reactive patrol). This task is made even more challenging with the fact that patrol schedules do not remain static as occurrences of dynamic incidents can disrupt the existing schedules. In this paper, we propose a solution to this problem using Multi-Agent Reinforcement Learning (MARL) to address the Dynamic Bi-objective Police Patrol Dispatching and Rescheduling Problem …


Detecting Complex Cyber Attacks Using Decoys With Online Reinforcement Learning, Marcus Gutierrez May 2023

Open Access Theses & Dissertations

Most vulnerabilities discovered in cybersecurity can be associated with their own singular piece of software. I investigate complex vulnerabilities, which may require multiple pieces of software to be present. These complex vulnerabilities represent 16.6% of all documented vulnerabilities and are more dangerous on average than their simple vulnerability counterparts. In addition to this, because they often require multiple pieces of software to be present, they are harder to identify overall as specific combinations are needed for the vulnerability to appear.

I consider the motivating scenario where an attacker is repeatedly deploying exploits that use complex vulnerabilities into an Airport Wi-Fi. The network …


Reinforced Adaptation Network For Partial Domain Adaptation, Keyu Wu, Min Wu, Zhenghua Chen, Ruibing Jin, Wei Cui, Zhiguang Cao, Xiaoli Li May 2023

Research Collection School Of Computing and Information Systems

Domain adaptation enables generalized learning in new environments by transferring knowledge from label-rich source domains to label-scarce target domains. As a more realistic extension, partial domain adaptation (PDA) relaxes the assumption of fully shared label space, and instead deals with the scenario where the target label space is a subset of the source label space. In this paper, we propose a Reinforced Adaptation Network (RAN) to address the challenging PDA problem. Specifically, a deep reinforcement learning model is proposed to learn source data selection policies. Meanwhile, a domain adaptation model is presented to simultaneously determine rewards and learn domain-invariant feature …


Sim-To-Real Reinforcement Learning Framework For Autonomous Aerial Leaf Sampling, Ashraful Islam May 2023

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Using unmanned aerial systems (UAS) for leaf sampling is contributing to a better understanding of the influence of climate change on plant species, and the dynamics of forest ecology by studying hard-to-reach tree canopies. Currently, multiple skilled operators are required for UAS maneuvering and using the leaf sampling tool. This often limits sampling to only the canopy top or periphery. Sim-to-real reinforcement learning (RL) can be leveraged to tackle challenges in the autonomous operation of aerial leaf sampling in the changing environment of a tree canopy. However, transferring an RL controller that is learned in simulation to real UAS …


Research On Unmanned Swarm Combat System Adaptive Evolution Model Simulation, Zhiqiang Li, Yuanlong Li, Laixiang Yin, Xiangping Ma Apr 2023

Journal of System Simulation

Abstract: Aiming at the fact that the intelligent unmanned swarm combat system is mainly composed of large-scale combat individuals with limited behavioral capabilities and has limited ability to adapt to the changes of battlefield environment and combat opponents, a learning evolution method combining genetic algorithm and reinforcement learning is proposed to construct an individual-based unmanned bee colony combat system evolution model. To improve the adaptive evolution efficiency of bee colony combat system, an improved genetic algorithm is proposed to improve the learning and evolution speed of bee colony individuals by using individual-specific mutation optimization strategy. Simulation experiment on …


Multi-Agent Cooperative Combat Simulation In Naval Battlefield With Reinforcement Learning, Ding Shi, Xuefeng Yan, Lina Gong, Jingxuan Zhang, Donghai Guan, Mingqiang Wei Apr 2023

Journal of System Simulation

Abstract: Due to the rapidly changing situations of future naval battlefields, it is urgent to realize high-quality combat simulation in naval battlefields based on artificial intelligence to comprehensively optimize and improve the combat effectiveness of our army and defeat the enemy. The collaboration of combat units is the key point, and how to realize balanced decision-making among multiple agents is the first task. Based on a decoupling priority experience replay mechanism and an attention mechanism, a multi-agent reinforcement learning-based cooperative combat simulation (MARL-CCSA) network is proposed. Based on the expert experience, a multi-scale reward function is designed, on which a naval …


A Review On Derivative Hedging Using Reinforcement Learning, Peng Liu Mar 2023

Research Collection Lee Kong Chian School Of Business

Hedging is a common trading activity to manage the risk of engaging in transactions that involve derivatives such as options. Perfect and timely hedging, however, is an impossible task in the real market that characterizes discrete-time transactions with costs. Recent years have witnessed reinforcement learning (RL) in formulating optimal hedging strategies. Specifically, different RL algorithms have been applied to learn the optimal offsetting position based on market conditions, offering an automatic risk management solution that proposes optimal hedging strategies while catering to both market dynamics and restrictions. In this article, the author provides a comprehensive review of the use of …


Dqn-Based Joint Scheduling Method Of Heterogeneous Tt&C Resources, Naiyang Xue, Dan Ding, Yutong Jia, Zhiqiang Wang, Yuan Liu Feb 2023

Journal of System Simulation

Abstract: Taking joint scheduling of heterogeneous TT&C resources as the research object, a deep Q network (DQN) algorithm based on reinforcement learning is proposed. The characteristics of the joint scheduling problem of heterogeneous TT&C resources are fully analyzed, the constraints affecting the solution are described mathematically, and a joint resource scheduling model is established. From the perspective of applying reinforcement learning, after a Markov decision process description, two neural networks with the same structure and action selection strategies based on the ε-greedy algorithm are respectively designed, and a DQN solution framework is established. The simulation results show that DQN-based heterogeneous …
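
For readers unfamiliar with the ε-greedy action selection mentioned in this abstract, a minimal Python sketch of the standard rule follows. It is a generic textbook version and not the authors' implementation: the function name, the list-of-Q-values input, and the optional `rng` parameter are assumptions made for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of estimated Q-values.

    With probability epsilon a uniformly random action is explored;
    otherwise the action with the highest Q-value is exploited.
    rng may be the random module or a random.Random instance.
    """
    if not q_values:
        raise ValueError("q_values must not be empty")
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
greedy_action = epsilon_greedy([0.1, 0.7, 0.3], epsilon=0.0, rng=rng)
explored_action = epsilon_greedy([0.1, 0.7, 0.3], epsilon=1.0, rng=rng)
```

With `epsilon=0.0` the call always returns the index of the largest Q-value, and with `epsilon=1.0` it always samples uniformly. In DQN training ε is commonly decayed between these extremes, though the abstract does not state which schedule the paper uses.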


Constrained Reinforcement Learning In Hard Exploration Problems, Pankayaraj Pathmanathan, Pradeep Varakantham Feb 2023

Research Collection School Of Computing and Information Systems

One approach to guaranteeing safety in Reinforcement Learning is through cost constraints that are imposed on trajectories. Recent works in constrained RL have developed methods that ensure constraints can be enforced even at learning time while maximizing the overall value of the policy. Unfortunately, as demonstrated in our experimental results, such approaches do not perform well on complex multi-level tasks, with longer episode lengths or sparse rewards. To that end, we propose a scalable hierarchical approach for constrained RL problems that employs backward cost value functions in the context of task hierarchy and a novel intrinsic reward function in lower levels …