Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons



2021

Reinforcement learning


Articles 1 - 27 of 27

Full-Text Articles in Physical Sciences and Mathematics

Rocket Learn, Daanesh Ibrahim, Jules Stacy, David Stroud, Yusi Zhang Dec 2021


SMU Data Science Review

Abstract. This paper covers the development, testing, and implementation of Reinforcement Learning methods designed to autonomously learn and optimize Rocket League play. This study aims to analyze and benchmark model frameworks commonly used in Reinforcement Learning applications. These models can be applied to tasks ranging in difficulty from simple to superhumanly complex, and this study will begin with and build upon simple models performing simple tasks. It will result in complex models performing difficult tasks. Models will be allowed to train autonomously on the game using mass parallelization to expedite training times with the goal of maximizing reward function scores. …


Hierarchical Control Of Multi-Agent Reinforcement Learning Team In Real-Time Strategy (Rts) Games, Weigui Jair Zhou, Budhitama Subagdja, Ah-Hwee Tan, Darren Wee Sze Ong Dec 2021


Research Collection School Of Computing and Information Systems

Coordinated control of multi-agent teams is an important task in many real-time strategy (RTS) games. In most prior work, micromanagement is the commonly used strategy whereby individual agents operate independently and make their own combat decisions. On the other extreme, some employ a macromanagement strategy whereby all agents are controlled by a single decision model. In this paper, we propose a hierarchical command and control architecture, consisting of a single high-level and multiple low-level reinforcement learning agents operating in a dynamic environment. This hierarchical model enables the low-level unit agents to make individual decisions while taking commands from the high-level …


Intelligent Traffic Management: From Practical Stochastic Path Planning To Reinforcement Learning Based City-Wide Traffic Optimization, Kamilia Ahmadi Dec 2021


All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

This research focuses on intelligent traffic management, including stochastic path planning and city-scale traffic optimization. Stochastic path planning focuses on finding paths when edge weights are not fixed and change depending on the time of day/week. Then we focus on minimizing the running time of the overall procedure at query time utilizing precomputation and approximation. The city graph is partitioned into smaller groups of nodes, each represented by its exemplar. At query time, source and destination pairs are connected to their respective exemplars and the path between those exemplars is found. After this, we move toward minimizing the city …


From Mdp To Alphazero, David Robert Sewell Nov 2021


Dissertations and Theses

In this paper I will explain the AlphaGo family of algorithms starting from first principles and requiring little previous knowledge from the reader. The focus will be upon one of the more recent versions AlphaZero but I hope to explain the core principles that allowed these algorithms to be so successful. I will generally refer to AlphaZero as theses [sic] core set of principles and will make it clear when I am referring to a specific algorithm of the AlphaGo family. AlphaZero in short combines Monte Carlo Tree Search (MCTS) with Deep learning and self-play. We will see how these …


Dqn-Based Path Planning Method And Simulation For Submarine And Warship In Naval Battlefield, Xiaodong Huang, Haitao Yuan, Bi Jing, Liu Tao Oct 2021


Journal of System Simulation

Abstract: To realize multi-agent intelligent planning and target tracking in a complex naval battlefield environment, the work focuses on agents (submarine or warship), and proposes a simulation method based on a reinforcement learning algorithm called Deep Q Network (DQN). Two neural networks with the same structure and different parameters are designed to update real and predicted Q values for the convergence of value functions. An ε-greedy algorithm is proposed to design an action selection mechanism, and a reward function is designed for the naval battlefield environment to increase the update velocity and generalization ability of Learning with Experience Replay (LER). Simulation results …
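
As a rough illustration of two ingredients named in this abstract, ε-greedy action selection and a second network that supplies target Q values, here is a minimal Python sketch. It is not the authors' implementation: the function names, the plain-list Q values, and the discount factor `gamma` are assumptions made for illustration only.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return a random action with probability epsilon, else the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    # exploit: index of the largest Q value
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_target, gamma, done):
    """Bootstrapped target built from the (hypothetical) target network's
    Q values for the next state; terminal transitions use the reward alone."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)
```

In a full DQN, the online network would be regressed toward `td_target`, and the target network's parameters would be refreshed from the online network only periodically.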


Research On Experimental Method Of Joint Operation Simulation Based On Human-Machine Hybrid Intelligence, Ma Jun, Jingyu Yang, Wu Xi Oct 2021


Journal of System Simulation

Abstract: Existing joint operation simulation experiment methods mainly guide equipment evaluation and demonstration, and struggle to effectively support research on operation problems. To address this, a joint operation simulation experiment method based on human-machine hybrid intelligence is proposed. The classification, generation and accumulation process of the knowledge in joint operation simulation experiment are clarified. Through the detailed descriptions of experimental interaction process, experimental operation process, experimental driving mode, simulation operation mode, supporting system structure, etc., a joint operation simulation experiment framework based on man-machine hybrid intelligence is constructed. It provides a new method …


Comparative Study Of Reinforcement Learning Methods In Path Planning, Daniel Obawole Oct 2021


Electronic Theses and Dissertations

In order to perform a large variety of tasks and achieve human-level performance in complex real-world environments, an intelligent agent must be able to learn from its dynamically changing environment. Generally speaking, agents have limitations in obtaining an accurate description of the environment from what they perceive because they may not have all the information about the environment. The present research is focused on reinforcement learning algorithms, which form a distinct category in the field of machine learning because of their unique trial-and-error approach. Reinforcement learning is used to solve control problems based on received rewards. …


Study On Next-Generation Strategic Wargame System, Wu Xi, Xianglin Meng, Jingyu Yang Sep 2021


Journal of System Simulation

Abstract: Strategic wargaming is an important support for strategic decision-making. The research status and challenges of the strategic wargame are analyzed, and the influence of big data and artificial intelligence technology on the strategic wargame system is studied. The prospects and key technologies of the next-generation strategic wargame system are studied, including the construction of event association graphs for strategic topics, generation of strategic decision sparse samples based on generative adversarial nets, gaming strategy learning with human-in-the-loop hybrid enhancement, and public opinion dissemination modeling technology based on social networks. The development trend of the strategic wargame is proposed.


Self-Learning-Based Multiple Spacecraft Evasion Decision Making Simulation Under Sparse Reward Condition, Zhao Yu, Jifeng Guo, Yan Peng, Chengchao Bai Aug 2021


Journal of System Simulation

Abstract: In order to improve the ability of a spacecraft formation to evade multiple interceptors, and to address the low success rate of traditional procedural maneuver evasion, a multi-agent cooperative autonomous decision-making algorithm based on a deep reinforcement learning method is proposed. Based on the actor-critic architecture, a multi-agent reinforcement learning algorithm is designed, in which a weighted linear fitting method is proposed to solve the reliability allocation problem of the self-learning system. To solve the sparse reward problem in the task scenario, a sparse reward reinforcement learning method based on the inverse value method is proposed. According to the task scenario, …


Learning To Assign: Towards Fair Task Assignment In Large-Scale Ride Hailing, Dingyuan Shi, Yongxin Tong, Zimu Zhou, Bingchen Song, Weifeng Lv, Qiang Yang Aug 2021


Research Collection School Of Computing and Information Systems

Ride hailing is a widespread shared mobility application where the central issue is to assign taxi requests to drivers with various objectives. Despite extensive research on task assignment in ride hailing, the fairness of earnings among drivers is largely neglected. Pioneer studies on fair task assignment in ride hailing are ineffective and inefficient due to their myopic optimization perspective and time-consuming assignment techniques. In this work, we propose LAF, an effective and efficient task assignment scheme that optimizes both utility and fairness. We adopt reinforcement learning to make assignments in a holistic manner and propose a set of acceleration techniques …


High-Density Parking For Autonomous Vehicles., Parag J. Siddique Aug 2021


Electronic Theses and Dissertations

In a common parking lot, much of the space is devoted to lanes. Lanes must not be blocked for one simple reason: a blocked car might need to leave before the car that blocks it. However, the advent of autonomous vehicles gives us an opportunity to overcome this constraint, and to achieve a higher storage capacity of cars. Taking advantage of self-parking and intelligent communication systems of autonomous vehicles, we propose puzzle-based parking, a high-density design for a parking lot. We introduce a novel method of vehicle parking, which leads to maximum parking density. We then propose a heuristic method …


Identification Of Chemical Structures And Substructures Via Deep Q-Learning And Supervised Learning Of Ftir Spectra, Joshua D. Ellis Aug 2021


MSU Graduate Theses

Fourier-transform infrared (FTIR) spectra of organic compounds can be used to compare and identify compounds. A mid-FTIR spectrum gives absorbance values of a compound over the 400-4000 cm⁻¹ range. Spectral matching is the process of comparing the spectral signature of two or more compounds and returning a value for the similarity of the compounds based on how closely their spectra match. This process is commonly used to identify an unknown compound by searching for its spectrum’s closest match in a database of known spectra. A major limitation of this process is that it can only be used to identify …


Step-Wise Deep Learning Models For Solving Routing Problems, Liang Xin, Wen Song, Zhiguang Cao, Jie Zhang Jul 2021


Research Collection School Of Computing and Information Systems

Routing problems are very important in intelligent transportation systems. Recently, a number of deep learning-based methods have been proposed to automatically learn construction heuristics for solving routing problems. However, these methods do not completely follow Bellman's Principle of Optimality since the visited nodes during construction are still included in the following subtasks, resulting in suboptimal policies. In this article, we propose a novel step-wise scheme which explicitly removes the visited nodes in each node selection step. We apply this scheme to two representative deep models for routing problems, pointer network and transformer attention model (TAM), and significantly improve the performance of …


Reinforcement Learning With Auxiliary Memory, Sterling Suggs Jun 2021


Theses and Dissertations

Deep reinforcement learning algorithms typically require vast amounts of data to train to a useful level of performance. Each time new data is encountered, the network must inefficiently update all of its parameters. Auxiliary memory units can help deep neural networks train more efficiently by separating computation from storage, and providing a means to rapidly store and retrieve precise information. We present four deep reinforcement learning models augmented with external memory, and benchmark their performance on ten tasks from the Arcade Learning Environment. Our discussion and insights will be helpful for future RL researchers developing their own memory agents.


A Reinforcement Learning Approach To Vehicle Path Optimization In Urban Environments, Shamsa Abdulla Al Hassani Jun 2021


Theses

Road traffic management in metropolitan cities and urban areas, in general, is an important component of Intelligent Transportation Systems (ITS). With the growing world population and number of vehicles, a dramatic increase in road traffic is expected to put pressure on the transportation infrastructure. Therefore, there is a pressing need to devise new ways to optimize the traffic flow in order to accommodate the growing needs of transportation systems. This work proposes to use an Artificial Intelligence (AI) method based on reinforcement learning techniques for computing near-optimal vehicle itineraries applied to Vehicular Ad-hoc Networks (VANETs). These itineraries are optimized based …


Approximate Difference Rewards For Scalable Multigent Reinforcement Learning, Arambam James Singh, Akshat Kumar, Hoong Chuin Lau May 2021


Research Collection School Of Computing and Information Systems

We address the problem of multiagent credit assignment in a large-scale multiagent system. Difference rewards (DRs) are an effective tool to tackle this problem, but their exact computation is known to be challenging even for a small number of agents. We propose a scalable method to compute difference rewards based on aggregate information in a multiagent system with a large number of agents by exploiting the symmetry present in several practical applications. Empirical evaluation on two multiagent domains, air-traffic control and cooperative navigation, shows better solution quality than previous approaches.
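
For readers unfamiliar with difference rewards, the sketch below shows the usual exact definition: agent i's reward is the global reward minus the global reward recomputed with agent i's action replaced by a default (here, the agent is simply removed). This is not the scalable approximation proposed in the paper; the toy `global_reward` function (number of distinct targets covered) and all names are assumptions for illustration.

```python
def global_reward(actions):
    """Toy team objective (assumed): number of distinct targets covered.
    An action of None means the agent is absent."""
    return len({a for a in actions if a is not None})


def difference_reward(actions, i, default=None):
    """Exact difference reward D_i = G(a) - G(a with a_i replaced by default)."""
    counterfactual = list(actions)
    counterfactual[i] = default
    return global_reward(actions) - global_reward(counterfactual)


# Agents 0 and 1 both cover target 0; agent 2 alone covers target 1.
joint_actions = [0, 0, 1]
rewards = [difference_reward(joint_actions, i) for i in range(len(joint_actions))]
```

Each exact difference reward needs its own counterfactual evaluation of the global reward, which is the cost that grows with the number of agents.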


Approximate Difference Rewards For Scalable Multigent Reinforcement Learning, Arambam James Singh, Akshat Kumar May 2021


Research Collection School Of Computing and Information Systems

We address the problem of multiagent credit assignment in a large-scale multiagent system. Difference rewards (DRs) are an effective tool to tackle this problem, but their exact computation is known to be challenging even for a small number of agents. We propose a scalable method to compute difference rewards based on aggregate information in a multiagent system with a large number of agents by exploiting the symmetry present in several practical applications. Empirical evaluation on two multiagent domains, air-traffic control and cooperative navigation, shows better solution quality than previous approaches.


Learning Index Policies For Restless Bandits With Application To Maternal Healthcare, Arpita Biswas, Gaurav Aggarwal, Pradeep Varakantham, Milind Tambe May 2021


Research Collection School Of Computing and Information Systems

In many community health settings, it is crucial to have a systematic monitoring and intervention process to ensure that the patients adhere to healthcare programs, such as periodic health checks or taking medications. When these interventions are expensive, they can be provided to only a fixed small fraction of the patients at any period of time. Hence, it is important to carefully choose the beneficiaries who should be provided with interventions and when. We model this scenario as a restless multi-armed bandit (RMAB) problem, where each beneficiary is assumed to transition from one state to another depending on the intervention …


Learning How To Search: Generating Effective Test Cases Through Adaptive Fitness Function Selection, Hussein Khalid Almulla Apr 2021


Theses and Dissertations

Search-based test generation is guided by feedback from one or more fitness functions, that is, scoring functions that judge solution optimality. Choosing informative fitness functions is crucial to meeting the goals of a tester. Unfortunately, many goals (such as forcing the class-under-test to throw exceptions, increasing test suite diversity, and attaining Strong Mutation Coverage) do not have effective fitness function formulations. We propose that meeting such goals requires treating fitness function identification as a secondary optimization step. An adaptive algorithm that can vary the selection of fitness functions could adjust its selection throughout the generation process to maximize goal attainment, based on the current …


Load Balancing And Resource Allocation In Smart Cities Using Reinforcement Learning, Aseel Alorbani Feb 2021


Electronic Thesis and Dissertation Repository

Today, smart city technology is being adopted by many municipal governments to improve their services and to adapt to growing and changing urban populations. Implementing a smart city application can be one of the most challenging projects due to its complexity, requirements and constraints. Sensing devices and computing components can be numerous and heterogeneous. Increasingly, researchers working in the smart city arena are looking to leverage edge and cloud computing to support smart city development. This approach also brings a number of challenges. Two of the main challenges are resource allocation and load balancing of tasks associated with processing data …


Increasing Software Reliability Using Mutation Testing And Machine Learning, Michael Allen Stewart Jan 2021


CCE Theses and Dissertations

Mutation testing is a type of software testing proposed in the 1970s where program statements are deliberately changed to introduce simple errors so that test cases can be validated to determine if they can detect the errors. The goal of mutation testing was to reduce complex program errors by preventing the related simple errors. Test cases are executed against the mutant code to determine if one fails, detects the error and ensures the program is correct. One major issue with this type of testing was that it became computationally intensive to generate and test all possible mutations for complex programs.

This …
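
The relationship between mutants and test cases described in this abstract can be illustrated with a toy Python sketch. The function under test, the hand-written mutant, and the two test suites are all hypothetical; the thesis itself concerns generating and evaluating mutants automatically, which this example does not attempt.

```python
def is_adult(age):
    """Original program (hypothetical): adults are 18 or older."""
    return age >= 18


def is_adult_mutant(age):
    """Mutant: the relational operator >= has been changed to >."""
    return age > 18


def suite_passes(fn, cases):
    """True if fn returns the expected value for every (input, expected) pair."""
    return all(fn(x) == expected for x, expected in cases)


weak_suite = [(30, True), (5, False)]      # never probes the boundary
strong_suite = weak_suite + [(18, True)]   # adds the boundary case

mutant_survives_weak = suite_passes(is_adult_mutant, weak_suite)
mutant_killed_by_strong = not suite_passes(is_adult_mutant, strong_suite)
```

A mutant that passes a suite "survives", signalling a gap in the tests; here the boundary case `age == 18` is what kills it.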


Markov Decision Processes With Embedded Agents, Luke Harold Miles Jan 2021


Theses and Dissertations--Computer Science

We present Markov Decision Processes with Embedded Agents (MDPEAs), an extension of multi-agent POMDPs that allows for the modeling of environments that can change the actuators, sensors, and learning function of the agent, e.g., a household robot which could gain and lose hardware from its frame, or a sovereign software agent which could encounter viruses on computers that modify its code. We show several toy problems for which standard reinforcement-learning methods fail to converge, and give an algorithm, `just-copy-it`, which learns some of them. Unlike MDPs, MDPEAs are closed systems and hence their evolution over time can be treated as …


Q-Learning Based Routing Protocol For Congestion Avoidance, Daniel Godfrey, Beom Su Kim, Haoran Miao, Babar Shah, Bashir Hayat, Imran Khan, Tae Eung Sung, Ki Il Kim Jan 2021


All Works

The end-to-end delay in a wired network is strongly dependent on congestion on intermediate nodes. Among the many feasible approaches to avoid congestion efficiently, congestion-aware routing protocols tend to search for an uncongested path toward the destination through rule-based approaches in reactive/incident-driven and distributed methods. However, these previous approaches have difficulty dynamically accommodating changing network environments in autonomous and self-adaptive operations. To overcome this drawback, we present a new congestion-aware routing protocol based on a Q-learning algorithm in software-defined networks where logically centralized network operation enables intelligent control and management of network resources. In the proposed routing protocol, …


Relational-Grid-World: A Novel Relational Reasoning Environment And An Agent Model For Relational Information Extraction, Faruk Küçüksubaşi, Eli̇f Sürer Jan 2021


Turkish Journal of Electrical Engineering and Computer Sciences

Reinforcement learning (RL) agents are often designed specifically for a particular problem and they generally have uninterpretable working processes. Statistical methods-based agent algorithms can be improved in terms of generalizability and interpretability using symbolic artificial intelligence (AI) tools such as logic programming. In this study, we present a model-free RL architecture that is supported with explicit relational representations of the environmental objects. For the first time, we use the PrediNet network architecture in a dynamic decision-making problem rather than image-based tasks, and multi-head dot-product attention network (MHDPA) as a baseline for performance comparisons. We tested two networks in two environments, i.e., the baseline box-world environment and …


Multiagent Q-Learning Based Uav Trajectory Planning For Effective Situational Awareness, Erdal Akin, Kubi̇lay Demi̇r, Hali̇l Yetgi̇n Jan 2021


Turkish Journal of Electrical Engineering and Computer Sciences

In the event of a natural disaster, the arrival time of search and rescue (SAR) teams at the affected areas is of vital importance for saving the lives of the victims. In particular, when an earthquake occurs in a geographically large area, reconnaissance of the debris within a short time is critical for conducting successful SAR missions. Effective and quick situational awareness in postdisaster scenarios can be provided with the help of unmanned aerial vehicles (UAVs). However, off-the-shelf UAVs suffer from limited communication range as well as limited airborne duration due to battery constraints. If telecommunication infrastructure is …


Playing Pong Using Q-Learning, Akash Kumar Jan 2021


West Chester University Master’s Theses

This thesis involves the use of a reinforcement learning (RL) algorithm called Q-learning to train a Q-agent to play a game of Pong against a near-perfect opponent. Compared to related prior work which trained Pong RL agents by combining Q-learning with deep learning in an algorithm known as Deep Q-Networks, the work presented in this paper takes advantage of known environment constraints of the custom-made Pong environment to train the agent using one-step Q-learning alone. In addition, the thesis explores ways of making the Q-learning more efficient by converting Markov Decision Processes (MDPs) to Partially Observable Markov Decision Processes (POMDPs), …
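
For reference, the standard one-step (tabular) Q-learning update mentioned in this abstract can be sketched as follows. This is the textbook rule, not code from the thesis: the state labels, the dictionary-backed table, and the default step size `alpha` and discount `gamma` are assumptions for illustration.

```python
from collections import defaultdict


def q_update(Q, state, action, reward, next_state, n_actions,
             alpha=0.1, gamma=0.99, done=False):
    """Apply Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    and return the updated value. Terminal transitions do not bootstrap."""
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in range(n_actions))
    td_error = reward + gamma * best_next - Q[(state, action)]
    Q[(state, action)] += alpha * td_error
    return Q[(state, action)]


# Unseen (state, action) pairs default to a Q value of 0.0.
Q = defaultdict(float)
first = q_update(Q, "s0", 1, 1.0, "s1", n_actions=2, alpha=0.5, gamma=0.9)
```

A table like this is practical only when the state space is small or discretized, which is presumably where the known constraints of a custom environment help.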


Deep Q-Network-Based Noise Suppression For Robust Speech Recognition, Tae-Jun Park, Joon-Hyuk Chang Jan 2021


Turkish Journal of Electrical Engineering and Computer Sciences

This study develops deep Q-network (DQN)-based noise suppression for robust speech recognition under ambient noise. We thus design a reinforcement algorithm that combines DQN training with a deep neural network (DNN) to let reinforcement learning (RL) work for complex and high dimensional environments like speech recognition. For this, we elaborate on the DQN training so that it chooses the best action, namely the quantized noise suppression gain, from the observation of the noisy speech signal, with DQN rewards that include both the word error rate (WER) and an objective speech quality measure. Experiments demonstrate that the proposed algorithm improves speech …