Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

2022

Reinforcement learning

Articles 1 - 22 of 22

Full-Text Articles in Physical Sciences and Mathematics

End-To-End Hierarchical Reinforcement Learning With Integrated Subgoal Discovery, Shubham Pateria, Budhitama Subagdja, Ah-Hwee Tan, Chai Quek Dec 2022

Research Collection School Of Computing and Information Systems

Hierarchical reinforcement learning (HRL) is a promising approach to perform long-horizon goal-reaching tasks by decomposing the goals into subgoals. In a holistic HRL paradigm, an agent must autonomously discover such subgoals and also learn a hierarchy of policies that uses them to reach the goals. Recently introduced end-to-end HRL methods accomplish this by using the higher-level policy in the hierarchy to directly search the useful subgoals in a continuous subgoal space. However, learning such a policy may be challenging when the subgoal space is large. We propose integrated discovery of salient subgoals (LIDOSS), an end-to-end HRL method with an integrated …


Reinforcement-Learning-Based Adaptive Tracking Control For A Space Continuum Robot Based On Reinforcement Learning, Da Jiang, Zhiqin Cai, Zhongzhen Liu, Haijun Peng, Zhigang Wu Oct 2022

Journal of System Simulation

Abstract: Aiming at the tracking control for three-arm space continuum robot in space active debris removal manipulation, an adaptive sliding mode control algorithm based on deep reinforcement learning is proposed. Through BP network, a data-driven dynamic model is developed as the predictive model to guide the reinforcement learning to adjust the sliding mode controller's parameters online, and finally realize a real-time tracking control. Simulation results show that the proposed data-driven predictive model can accurately predict the robot's dynamic characteristics with the relative error within ±1% to random trajectories. Compared with the fixed-parameter sliding mode controller, the proposed adaptive controller …


Interactive Video Corpus Moment Retrieval Using Reinforcement Learning, Zhixin Ma, Chong-Wah Ngo Oct 2022

Research Collection School Of Computing and Information Systems

Known-item video search is effective with human-in-the-loop to interactively investigate the search result and refine the initial query. Nevertheless, when the first few pages of results are swamped with visually similar items, or the search target is hidden deep in the ranked list, finding the known-item target usually requires a long duration of browsing and result inspection. This paper tackles the problem by reinforcement learning, aiming to reach a search target within a few rounds of interaction by long-term learning from user feedback. Specifically, the system interactively plans a navigation path based on feedback and recommends a potential target that …


Learning To Play An Imperfect Information Card Game Using Reinforcement Learning, Buğra Kaan Demi̇rdöver, Ömer Baykal, Ferdanur Alpaslan Sep 2022

Turkish Journal of Electrical Engineering and Computer Sciences

Artificial intelligence and machine learning are widely popular in many areas. One of the most popular ones is gaming. Games are perfect testbeds for machine learning and artificial intelligence with various scenarios and types. This study aims to develop a self-learning intelligent agent to play the Hearts game. Hearts is one of the most popular trick-taking card games around the world. It is an imperfect information card game. In addition to having a huge state space, Hearts offers many extra challenges due to its nature. In order to ease the development process, the agent developed in the scope of this …


Low-Reynolds-Number Locomotion Via Reinforcement Learning, Yuexin Liu Aug 2022

Dissertations

This dissertation summarizes computational results from applying reinforcement learning and deep neural network to the designs of artificial microswimmers in the inertialess regime, where the viscous dissipation in the surrounding fluid environment dominates and the swimmer’s inertia is completely negligible. In particular, works in this dissertation consist of four interrelated studies of the design of microswimmers for different tasks: (1) a one-dimensional microswimmer in free-space that moves towards the target via translation, (2) a one-dimensional microswimmer in a periodic domain that rotates to reach the target, (3) a two-dimensional microswimmer that switches gaits to navigate to the designated targets in …


FDRL Approach For Association And Resource Allocation In Multi-UAV Air-To-Ground IoMT Network, Abegaz Mohammed, Aiman Erbad, Hayla Nahom, Abdullatif Albaseer, Mohammed Abdallah, Mohsen Guizani Aug 2022

Machine Learning Faculty Publications

In 6G networks, unmanned aerial vehicles (UAVs) can serve as aerial flying base stations (AFBS) with aerial mobile edge computing (AMEC) server capabilities. AFBS is an increasingly popular solution for delivering time-sensitive applications, extending network coverage, and assisting ground base stations in the healthcare systems for remote areas with limited infrastructure. Furthermore, the UAVs are deployed in the healthcare system to support the Internet of medical things (IoMT) devices in data collection, medical equipment distribution, and providing smart services. However, ensuring the privacy and security of patients’ data with the limited UAV resources is a major challenge. In this paper, …


An Adaptive Multi-Level Quantization-Based Reinforcement Learning Model For Enhancing UAV Landing On Moving Targets, Najmaddin Abo Mosali, Syariful Syafiq Shamsudin, Salama A. Mostafa, Omar Alfandi, Rosli Omar, Najib Al-Fadhali, Mazin Abed Mohammed, R. Q. Malik, Mustafa Musa Jaber, Abdu Saif Jul 2022

All Works

The autonomous landing of an unmanned aerial vehicle (UAV) on a moving platform is an essential functionality in various UAV-based applications. It can be added to a teleoperation UAV system or part of an autonomous UAV control system. Various robust and predictive control systems based on the traditional control theory are used for operating a UAV. Recently, some attempts were made to land a UAV on a moving target using reinforcement learning (RL). Vision is used as a typical way of sensing and detecting the moving target. Mainly, the related works have deployed a deep-neural network (DNN) for RL, which …


SDQ: Stochastic Differentiable Quantization With Mixed Precision, Xijie Huang, Zhiqiang Shen, Shichao Li, Zechun Liu, Xianghong Hu, Jeffry Wicaksana, Eric Xing, Kwang Ting Cheng Jul 2022

Machine Learning Faculty Publications

In order to deploy deep models in a computationally efficient manner, model quantization approaches have been frequently used. In addition, as new hardware supports mixed bitwidth arithmetic operations, recent research on mixed precision quantization (MPQ) has begun to fully leverage the capacity of representation by searching optimized bitwidths for different layers and modules in a network. However, previous studies mainly search the MPQ strategy in a costly scheme using reinforcement learning, neural architecture search, etc., or simply utilize partial prior knowledge for bitwidth assignment, which might be biased on locality of information and is sub-optimal. In this work, we present …


Application Of Improved Q Learning Algorithm In Job Shop Scheduling Problem, Yejian Zhao, Yanhong Wang, Jun Zhang, Hongxia Yu, Zhongda Tian Jun 2022

Journal of System Simulation

Abstract: Aiming at the job shop scheduling in a dynamic environment, a dynamic scheduling algorithm based on an improved Q learning algorithm and dispatching rules is proposed. The state space of the dynamic scheduling algorithm is described with the concept of "the urgency of remaining tasks" and a reward function with the purpose of "the higher the slack, the higher the penalty" is designed. In view of the problem that the greedy strategy will select the sub-optimal actions in the later stage of learning, the traditional Q learning algorithm is improved by introducing an action selection strategy based on the …
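
For readers unfamiliar with the baseline that this abstract refers to, the following is a minimal sketch of the traditional tabular Q learning update with epsilon-greedy action selection. It is the textbook algorithm, not the improved method proposed in the paper, whose action selection strategy is cut off in the abstract above. The function names, the dictionary-based Q table, and the default values of `alpha`, `gamma`, and `epsilon` are assumptions made for illustration only.

```python
import random


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one textbook Q learning step and return the new Q(state, action).

    q is a dict keyed by (state, action); missing entries count as 0.0.
    alpha (learning rate) and gamma (discount factor) are illustrative defaults.
    """
    # Value of the best action available in the next state.
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    # Move the old estimate toward the bootstrapped target.
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


def epsilon_greedy(q, state, actions, epsilon=0.1, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))
```

In a scheduling setting, each action could stand for one dispatching rule and the state for a discretized measure of task urgency, but that mapping is an assumption here and is not taken from the paper.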


Learning To Generalize Dispatching Rules On The Job Shop Scheduling, Zangir Iklassov, Dmitrii Medvedev, Ruben Solozabal, Martin Takac Jun 2022

Machine Learning Faculty Publications

This paper introduces a Reinforcement Learning approach to better generalize heuristic dispatching rules on the Job-shop Scheduling Problem (JSP). Current models on the JSP do not focus on generalization, although, as we show in this work, this is key to learning better heuristics on the problem. A well-known technique to improve generalization is to learn on increasingly complex instances using Curriculum Learning (CL). However, as many works in the literature indicate, this technique might suffer from catastrophic forgetting when transferring the learned skills between different problem sizes. To address this issue, we introduce a novel Adversarial Curriculum Learning (ACL) strategy, …


Offline Reinforcement Learning With Causal Structured World Models, Zheng-Mao Zhu, Xiong-Hui Chen, Hong-Long Tian, Kun Zhang, Yang Yu Jun 2022

Machine Learning Faculty Publications

Model-based methods have recently shown promise for offline reinforcement learning (RL), aiming to learn good policies from historical data without interacting with the environment. Previous model-based offline RL methods learn fully connected nets as world-models to map the states and actions to the next-step states. However, it is sensible that a world-model should adhere to the underlying causal effect such that it will support learning an effective policy generalizing well in unseen states. In this paper, we first provide theoretical results that causal world-models can outperform plain world-models for offline RL by incorporating the causal structure into the generalization error …


Reinforcement Learning-Based Interactive Video Search, Zhixin Ma, Jiaxin Wu, Zhijian Hou, Chong-Wah Ngo Jun 2022

Research Collection School Of Computing and Information Systems

Despite the rapid progress in text-to-video search due to the advancement of cross-modal representation learning, the existing techniques still fall short in helping users to rapidly identify the search targets. Particularly, in the situation that a system suggests a long list of similar candidates, the user needs to painstakingly inspect every search result. The experience is frustrating, with repeated watching of similar clips, and more frustratingly, the search targets may be overlooked due to mental tiredness. This paper explores reinforcement learning-based (RL) searching to relieve the user from the burden of brute force inspection. Specifically, the system maintains a graph …


Pervasive Machine Learning For Smart Radio Environments Enabled By Reconfigurable Intelligent Surfaces, George C. Alexandropoulos, Kyriakos Stylianopoulos, Chongwen Huang, Chau Yuen, Mehdi Bennis, Mérouane Debbah May 2022

Machine Learning Faculty Publications

The emerging technology of Reconfigurable Intelligent Surfaces (RISs) is provisioned as an enabler of smart wireless environments, offering a highly scalable, low-cost, hardware-efficient, and almost energy-neutral solution for dynamic control of the propagation of electromagnetic signals over the wireless medium, ultimately providing increased environmental intelligence for diverse operation objectives. One of the major challenges with the envisioned dense deployment of RISs in such reconfigurable radio environments is the efficient configuration of multiple metasurfaces with limited, or even the absence of, computing hardware. In this paper, we consider multi-user and multi-RIS-empowered wireless systems, and present a thorough survey of the online …


Accelerating Serverless Computing By Harvesting Idle Resources, Hanfei Yu, Hao Wang, Jian Li, Xu Yuan, Seung Jong Park Apr 2022

Computer Science Faculty Research & Creative Works

Serverless computing automates fine-grained resource scaling and simplifies the development and deployment of online services with stateless functions. However, it is still non-trivial for users to allocate appropriate resources due to various function types, dependencies, and input sizes. Misconfiguration of resource allocations leaves functions either under-provisioned or over-provisioned and leads to continuous low resource utilization. This paper presents Freyr, a new resource manager (RM) for serverless platforms that maximizes resource efficiency by dynamically harvesting idle resources from over-provisioned functions to under-provisioned functions. Freyr monitors each function's resource utilization in real-time, detects over-provisioning and under-provisioning, and learns to harvest idle resources …


Research On The Construction Method Of Simulation Evaluation Index Of Operation Effectiveness Operation Concept Traction, Ziwei Zhang, Liang Li, Zhiming Dong, Yifei Wang, Li Duan Mar 2022

Journal of System Simulation

Abstract: Agents are difficult to be directly modeled and simulated due to the complexity of their own interaction and learning behaviors. Aiming at the common problems in the discrete simulation of the agent, the event transfer mechanism of the discrete event system specification (DEVS) atomic model is applied to express the interaction and learning of an agent. Through the interaction mode of the agent, the transfer control of multi-state external events, the port connection mode, as well as the introduction of reinforcement learning event transfer representation, a discrete simulation construction method of the agent based on the DEVS atomic model …


The Impact Of Dynamic Difficulty Adjustment On Player Experience In Video Games, Chineng Vang Mar 2022

Scholarly Horizons: University of Minnesota, Morris Undergraduate Journal

Dynamic Difficulty Adjustment (DDA) is a process by which a video game adjusts its level of challenge to match a player’s skill level. Its popularity in the video game industry continues to grow as it has the ability to keep players continuously engaged in a game, a concept referred to as Flow. However, the influence of DDA on games has received mixed responses, specifically that it can enhance player experience as well as hinder it. This paper explores DDA through the Monte Carlo Tree Search algorithm and Reinforcement Learning, gathering feedback from players seeking to understand what about DDA is …


Heterogeneous Attentions For Solving Pickup And Delivery Problem Via Deep Reinforcement Learning, Jingwen Li, Liang Xin, Zhiguang Cao, Andrew Lim, Wen Song, Jie Zhang Mar 2022

Research Collection School Of Computing and Information Systems

Recently, there is an emerging trend to apply deep reinforcement learning to solve the vehicle routing problem (VRP), where a learnt policy governs the selection of next node for visiting. However, existing methods could not handle well the pairing and precedence relationships in the pickup and delivery problem (PDP), which is a representative variant of VRP. To address this challenging issue, we leverage a novel neural network integrated with a heterogeneous attention mechanism to empower the policy in deep reinforcement learning to automatically select the nodes. In particular, the heterogeneous attention mechanism specifically prescribes attentions for each role of the …


Team Air Combat Using Model-Based Reinforcement Learning, David A. Mottice Mar 2022

Theses and Dissertations

We formulate the first generalized air combat maneuvering problem (ACMP), called the MvN ACMP, wherein M friendly AUCAVs engage against N enemy AUCAVs, developing a Markov decision process (MDP) model to control the team of M Blue AUCAVs. The MDP model leverages a 5-degree-of-freedom aircraft state transition model and formulates a directed energy weapon capability. Instead, a model-based reinforcement learning approach is adopted wherein an approximate policy iteration algorithmic strategy is implemented to attain high-quality approximate policies relative to a high performing benchmark policy. The ADP algorithm utilizes a multi-layer neural network for the value function approximation regression mechanism. One-versus-one …


Deep Reinforcement Learning For Open Multiagent System, Tianxing Zhu Jan 2022

Honors Papers

In open multiagent systems, multiple agents work together or compete to reach the goal while members of the group change over time. For example, intelligent robots that are collaborating to put out wildfires may run out of suppressants and have to leave the place to recharge; the rest of the robots may need to change their behaviors accordingly to better control the fires. Thus, openness requires agents not only to predict the behaviors of others, but also the presence of other agents. We present a deep reinforcement learning method that adapts the proximal policy optimization algorithm to learn the optimal …


Multi-Step Prediction Using Tree Generation For Reinforcement Learning, Kevin Prakash Jan 2022

Master's Projects

The goal of reinforcement learning is to learn a policy that maximizes a reward function. In some environments with complete information, search algorithms are highly useful in simulating action sequences in a game tree. However, in many practical environments, such effective search strategies are not applicable since their state transition information may not be available. This paper proposes a novel method to approximate a game tree that enables reinforcement learning to use search strategies even in incomplete information environments. With an approximated game tree, the agent predicts all possible states multiple steps into the future and evaluates the states to …


Cloud Provisioning And Management With Deep Reinforcement Learning, Alexandru Tol Jan 2022

Master's Projects

The first web applications appeared in the early nineteen nineties. These applications were entirely hosted in house by the companies that developed them. In the mid 2000s the concept of a digital cloud was introduced by the then CEO of Google, Eric Schmidt. Today, most companies will at least partially host their applications on proprietary servers hosted at data centers or on commercial clouds like Amazon Web Services (AWS) or Heroku.

This arrangement seems like a straightforward win-win for both parties: the customer gets rid of the hassle of maintaining a live server for their applications and …


Towards An Active Foveated Approach To Computer Vision, Dario Dematties, Silvio Rizzi, George K. Thiruvathukal, Alejandro Javier Wainselboim Jan 2022

Computer Science: Faculty Publications and Other Works

In this paper, a series of experimental methods are presented explaining a new approach towards active foveated Computer Vision (CV). This is a collaborative effort between researchers at CONICET Mendoza Technological Scientific Center from Argentina, Argonne National Laboratory (ANL), and Loyola University Chicago from the US. The aim is to advance new CV approaches more in line with those found in biological agents in order to bring novel solutions to the main problems faced by current CV applications. Basically, this work enhances Self-supervised (SS) learning, incorporating foveated vision plus saccadic behavior in order to improve training and computational efficiency without …