Engineering Commons

Reinforcement Learning, 2019

Articles 1 - 18 of 18

Full-Text Articles in Engineering

Robot Arm Control Method Based On Deep Reinforcement Learning, Heyu Li, Zhilong Zhao, Gu Lei, Liqin Guo, Zeng Bi, Tingyu Lin Dec 2019

Journal of System Simulation

Abstract: Deep reinforcement learning continuously explores the environment and adjusts the neural network parameters according to the reward function. The actual production line cannot be used as a trial-and-error environment for the algorithm, so sufficient data are not available. To address this, this paper constructs a virtual robot arm simulation environment comprising the robot arm and the object. A Deep Deterministic Policy Gradient (DDPG) agent, for which the state variables and reward function are defined, is trained in the simulation environment to achieve the goal of controlling the robot arm to move the gripper below …
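For readers unfamiliar with DDPG, the sketch below shows the actor-critic pair such a setup typically trains, along with the soft target-network update; the layer sizes, state and action dimensions, and names are illustrative assumptions, not values from the paper.

```python
# Illustrative DDPG actor/critic pair; dimensions and layer widths are assumptions.
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps the arm state (e.g. joint angles, gripper and object positions) to continuous commands."""
    def __init__(self, state_dim=12, action_dim=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),  # bounded continuous actions
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Estimates the return Q(s, a) of taking a given action in a given state."""
    def __init__(self, state_dim=12, action_dim=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def soft_update(target, source, tau=0.005):
    """Polyak-average online weights into the target network, as in standard DDPG."""
    for t, s in zip(target.parameters(), source.parameters()):
        t.data.mul_(1.0 - tau).add_(tau * s.data)
```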


Domain Adaptation In Unmanned Aerial Vehicles Landing Using Reinforcement Learning, Pedro Lucas Franca Albuquerque Dec 2019

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Landing an unmanned aerial vehicle (UAV) on a moving platform is a challenging task that often requires exact models of the UAV dynamics, platform characteristics, and environmental conditions. In this thesis, we present and investigate three different machine learning approaches with varying levels of domain knowledge: dynamics randomization, universal policy with system identification, and reinforcement learning with no parameter variation. We first train the policies in simulation and then evaluate them both in simulation, where the system dynamics are varied through wind and the friction coefficient, and on a real robot system with wind variation. We initially expected that providing …
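A minimal sketch of the dynamics-randomization idea is shown below: the uncertain parameters (here wind and friction) are resampled every episode so the learned policy must cope with the whole range. The environment factory, parameter ranges, and policy interface are hypothetical placeholders, not the thesis's code.

```python
# Dynamics randomization sketch; make_env, policy, and the parameter ranges are assumptions.
import random

def train_with_dynamics_randomization(make_env, policy, episodes=1000):
    for _ in range(episodes):
        # Resample the uncertain dynamics before every episode so the policy
        # must succeed across the whole range, not for one fixed setting.
        wind = random.uniform(0.0, 5.0)        # m/s, assumed range
        friction = random.uniform(0.3, 1.0)    # platform friction coefficient, assumed range
        env = make_env(wind_speed=wind, friction=friction)
        state = env.reset()
        done = False
        while not done:
            action = policy.act(state)
            next_state, reward, done, _ = env.step(action)
            policy.observe(state, action, reward, next_state, done)
            state = next_state
        policy.update()
```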


A Reinforcement Learning Approach To Spacecraft Trajectory Optimization, Daniel S. Kolosa Dec 2019

Dissertations

This dissertation explores a novel method of solving low-thrust spacecraft targeting problems using reinforcement learning. A reinforcement learning algorithm based on Deep Deterministic Policy Gradients was developed to solve low-thrust trajectory optimization problems. The algorithm consists of two neural networks, an actor network and a critic network. The actor approximates a thrust magnitude given the current spacecraft state, expressed as a set of orbital elements. The critic network evaluates the actor's action given that state. Three different types of trajectory problems were solved: a generalized orbit change maneuver, a semimajor axis change maneuver, …
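Since the abstract names the actor and critic but not their update rules, the standard DDPG updates (Lillicrap et al.) are reproduced below in generic notation as a reference point; the dissertation's exact loss terms may differ.

```latex
% Standard DDPG updates in generic notation (not the dissertation's own symbols).
% Critic target for a sampled transition (s, a, r, s'), using target networks Q_{\phi'} and \mu_{\theta'}:
y = r + \gamma\, Q_{\phi'}\!\bigl(s',\, \mu_{\theta'}(s')\bigr)
% Critic regression loss and the deterministic policy gradient used to update the actor:
L(\phi) = \bigl(Q_\phi(s, a) - y\bigr)^2, \qquad
\nabla_\theta J \;\approx\; \nabla_a Q_\phi(s, a)\big|_{a = \mu_\theta(s)}\; \nabla_\theta\, \mu_\theta(s)
```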


A Comparison Of Contextual Bandit Approaches To Human-In-The-Loop Robot Task Completion With Infrequent Feedback, Matt Mcneill, Damian Lyons Nov 2019

Faculty Publications

Artificially intelligent assistive agents are playing an increasing role in our work and homes. In contrast with the currently predominant conversational agents, whose intelligence derives from dialogue trees and external modules, a fully autonomous domestic or workplace robot must carry out more complex reasoning. Such a robot must make good decisions as soon as possible, learn from experience, respond to feedback, and rely on feedback only as much as necessary. In this research, we narrow the focus of a hypothetical robot assistant to a room-tidying task in a simulated domestic environment. Given an item, the robot chooses where to put …
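As one concrete member of the contextual-bandit family such a comparison might cover, the sketch below implements LinUCB, a standard linear upper-confidence-bound bandit; whether the paper includes this exact algorithm is an assumption, and the arm count, feature size, and exploration weight are illustrative.

```python
# LinUCB contextual bandit sketch; alpha and the dimensions are illustrative.
import numpy as np

class LinUCB:
    def __init__(self, n_arms, n_features, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(n_features) for _ in range(n_arms)]    # per-arm design matrices
        self.b = [np.zeros(n_features) for _ in range(n_arms)]  # per-arm reward vectors

    def choose(self, context):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            # mean estimate plus an upper-confidence exploration bonus
            scores.append(context @ theta + self.alpha * np.sqrt(context @ A_inv @ context))
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context
```

Infrequent feedback fits this interface naturally: update() is simply not called for decisions that receive no human rating.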


An Application Of Sliding Mode Control To Model-Based Reinforcement Learning, Aaron Thomas Parisi Sep 2019

Master's Theses

State-of-the-art model-free reinforcement learning algorithms can generate admissible controls for complicated systems with no prior knowledge of the system dynamics, so long as sufficient samples (oftentimes millions) are available from the environment. Model-based reinforcement learning approaches, on the other hand, seek to bring known optimal or robust control methods to reinforcement learning tasks by modelling the system dynamics and applying well-established control algorithms to the system model. Sliding-mode controllers are robust to system disturbances and modelling errors, and have been widely used for high-order nonlinear system control. This thesis studies the application of sliding mode control …
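The sketch below shows a textbook sliding-mode control law for a second-order tracking error, the kind of robust controller the thesis pairs with a learned model; the gains, surface slope, and boundary-layer width are illustrative.

```python
# Textbook sliding-mode control law; lam, k, and boundary are illustrative values.
import numpy as np

def smc_control(error, error_dot, lam=2.0, k=5.0, boundary=0.05):
    """u = -k * sat(s / boundary) with sliding surface s = error_dot + lam * error."""
    s = error_dot + lam * error
    sat = np.clip(s / boundary, -1.0, 1.0)   # smoothed sign() to reduce chattering
    return -k * sat
```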


Docking Control Of An Autonomous Underwater Vehicle Using Reinforcement Learning, Enrico Anderlini, Gordon Parker, Giles Thomas Aug 2019

Michigan Tech Publications

To achieve persistent systems in the future, autonomous underwater vehicles (AUVs) will need to autonomously dock onto a charging station. Here, reinforcement learning strategies were applied for the first time to control the docking of an AUV onto a fixed platform in a simulation environment. Two reinforcement learning schemes were investigated: one with continuous state and action spaces, deep deterministic policy gradient (DDPG), and one with continuous state but discrete action spaces, deep Q network (DQN). For DQN, the discrete actions were selected as step changes in the control input signals. The performance of the reinforcement learning strategies was compared …
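The "discrete actions as step changes in the control input signals" idea can be made concrete as below; the control input names, bounds, and step size are assumptions for illustration, not the paper's values.

```python
# Discrete DQN action set built from step changes to continuous control inputs.
# Control names, bounds, and STEP are hypothetical.
import itertools

CONTROLS = ["rudder", "stern_plane", "thrust"]   # assumed AUV control inputs
STEP = 0.05                                      # assumed step size per decision

# Enumerate actions: (control index, direction) pairs, plus a "hold" action.
ACTIONS = [(i, d) for i, d in itertools.product(range(len(CONTROLS)), (-1, +1))] + [None]

def apply_action(control_values, action_index):
    """Return the new control vector after applying one discrete step change."""
    values = list(control_values)
    action = ACTIONS[action_index]
    if action is not None:
        idx, direction = action
        values[idx] = min(1.0, max(-1.0, values[idx] + direction * STEP))
    return values
```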


Reinforcement Learning For Self Organization And Power Control Of Two-Tier Heterogeneous Networks, Roohollah Amiri, Mojtaba Ahmadi Almasi, Jeffrey G. Andrews, Hani Mehrpouyan Aug 2019

Electrical and Computer Engineering Faculty Publications and Presentations

Self-organizing networks (SONs) can help manage the severe interference in dense heterogeneous networks (HetNets). Given their need to automatically configure power and other settings, machine learning is a promising tool for data-driven decision making in SONs. In this paper, a HetNet is modeled as a dense two-tier network with conventional macrocells overlaid with denser small cells (e.g., femto or pico cells). First, a distributed framework based on a multi-agent Markov decision process is proposed that models the power optimization problem in the network. Second, we present a systematic approach for designing a reward function based on the optimization problem. Third, we …
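To make the reward-design step tangible, one common shape for a per-cell power-control reward trades achieved spectral efficiency against transmit power; the form below is a generic illustration and an assumption, not the reward derived in the paper.

```latex
% Hypothetical per-small-cell reward for illustration only
% (SINR_i, p_i, and the weight lambda are not the paper's notation):
r_i \;=\; \log_2\!\bigl(1 + \mathrm{SINR}_i\bigr) \;-\; \lambda\, p_i
```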


Joint Manufacturing And Onsite Microgrid System Control Using Markov Decision Process And Neural Network Integrated Reinforcement Learning, Wenqing Hu, Zeyi Sun, Y. Zhang, Y. Li Aug 2019

Mathematics and Statistics Faculty Research & Creative Works

Onsite microgrid generation systems with renewable sources are considered a promising complementary energy supply for manufacturing plants, especially during outages when energy from the grid is not available. Compared to their widely recognized resilience benefits when used as a backup energy system, their operation alongside the electricity grid to support manufacturing in non-emergency mode has been less investigated. In this paper, we propose a joint dynamic decision-making model for the optimal control of both the manufacturing system and the onsite generation system. Markov Decision Process (MDP) is …


Data-Driven Integral Reinforcement Learning For Continuous-Time Non-Zero-Sum Games, Yongliang Yang, Liming Wang, Hamidreza Modares, Dawei Ding, Yixin Yin, Donald C. Wunsch Jun 2019

Electrical and Computer Engineering Faculty Research & Creative Works

This paper develops an integral value iteration (VI) method to efficiently find online the Nash equilibrium solution of two-player non-zero-sum (NZS) differential games for linear systems with partially unknown dynamics. To guarantee closed-loop stability about the Nash equilibrium, an explicit upper bound on the discount factor is given. To show the efficacy of the presented online model-free solution, the integral VI method is compared with the model-based offline policy iteration method. Moreover, the theoretical analysis of the integral VI algorithm covers three aspects, i.e., the positive definiteness of the updated cost functions, the stability of the closed-loop …
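For context, integral reinforcement learning methods of this kind typically evaluate a policy pair through a Bellman equation over a reinforcement interval T rather than through the system dynamics; a generic discounted form for player i is sketched below (the notation is generic, not the paper's).

```latex
% Generic discounted integral-RL Bellman equation for player i of a two-player
% non-zero-sum game with control inputs u_1, u_2 and reinforcement interval T:
V_i\bigl(x(t)\bigr) \;=\; \int_{t}^{t+T} e^{-\gamma(\tau - t)}
  \Bigl( x^{\top} Q_i\, x + u_1^{\top} R_{i1}\, u_1 + u_2^{\top} R_{i2}\, u_2 \Bigr)\, d\tau
  \;+\; e^{-\gamma T}\, V_i\bigl(x(t+T)\bigr)
```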


Robot Navigation In Cluttered Environments With Deep Reinforcement Learning, Ryan Weideman Jun 2019

Master's Theses

The application of robotics in cluttered and dynamic environments provides a wealth of challenges. This thesis proposes a deep reinforcement learning based system that determines collision-free robot navigation velocities directly from a sequence of depth images and a desired direction of travel. The system is designed such that a real robot could be placed in an unmapped, cluttered environment and be able to navigate in a desired direction with no prior knowledge. Deep Q-learning, coupled with the innovations of double Q-learning and dueling Q-networks, is applied. Two modifications of this architecture are presented to incorporate direction heading information that …
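The dueling Q-network head named above combines a state-value stream and an advantage stream; a minimal sketch follows, with the encoder feature size, layer widths, and action count as illustrative assumptions.

```python
# Minimal dueling Q-network head; feature_dim and n_actions are assumptions.
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    def __init__(self, feature_dim=512, n_actions=7):
        super().__init__()
        self.value = nn.Sequential(nn.Linear(feature_dim, 256), nn.ReLU(), nn.Linear(256, 1))
        self.advantage = nn.Sequential(nn.Linear(feature_dim, 256), nn.ReLU(), nn.Linear(256, n_actions))

    def forward(self, features):
        v = self.value(features)        # state value V(s)
        a = self.advantage(features)    # per-action advantage A(s, a)
        # Combine with the mean-subtracted advantage for identifiability.
        return v + a - a.mean(dim=-1, keepdim=True)
```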


Viewpoint Optimization For Autonomous Strawberry Harvesting With Deep Reinforcement Learning, Jonathon J. Sather Jun 2019

Master's Theses

Autonomous harvesting may provide a viable solution to mounting labor pressures in the United States' strawberry industry. However, due to bottlenecks in machine perception and economic viability, a profitable and commercially adopted strawberry harvesting system remains elusive. In this research, we explore the feasibility of using deep reinforcement learning to overcome these bottlenecks and develop a practical algorithm to address the sub-objective of viewpoint optimization, or the development of a control policy to direct a camera to favorable vantage points for autonomous harvesting. We evaluate the algorithm's performance in a custom, open-source simulated environment and observe affirmative results. Our trained …


DP-Q(λ): Real-Time Path Planning For Multi-Agent In Large-Scale Web3D Scene, Fengting Yan, Jinyuan Jia Apr 2019

Journal of System Simulation

Abstract: Path planning for multiple agents in an unknown large-scale scene requires an efficient and stable algorithm, must solve the multi-agent collision avoidance problem, and must run in real time in Web3D. To address these problems, the DP-Q(λ) algorithm is proposed, in which direction constraints and high reward-or-punishment weight training are used to adjust the reward or punishment values with a probability p (a random number in 0-1). The resulting reward or punishment value determines the agent's next path planning step. If the next position is free, the agent may move to it. The above strategy …
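DP-Q(λ) builds on the Q(λ) family of eligibility-trace methods; a simplified tabular Q(λ) step is sketched below for orientation. The paper's direction constraints and probability-weighted reward adjustment are not reproduced, and the hyperparameters are illustrative.

```python
# Simplified tabular Q(lambda) update with accumulating eligibility traces.
# Hyperparameters and the state/action representation are assumptions.
from collections import defaultdict

Q = defaultdict(float)   # Q[(state, action)] -> value estimate
E = defaultdict(float)   # eligibility trace per (state, action)

def q_lambda_step(s, a, r, s_next, actions, alpha=0.1, gamma=0.95, lam=0.8):
    """Apply one reward to the current pair and propagate it along the trace."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    delta = r + gamma * best_next - Q[(s, a)]
    E[(s, a)] += 1.0                      # strengthen the trace of the visited pair
    for key in list(E.keys()):
        Q[key] += alpha * delta * E[key]  # credit earlier pairs proportionally
        E[key] *= gamma * lam             # decay all traces
```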


A Deep Recurrent Q Network Towards Self-Adapting Distributed Microservices Architecture (In Press), Basel Magableh Jan 2019

Articles

One desired aspect of microservices architecture is the ability to self-adapt its own architecture and behaviour in response to changes in the operational environment. To achieve the desired high levels of self-adaptability, this research implements a distributed microservices architecture model informed by the MAPE-K model. The proposed architecture employs multiple adaptation agents supported by a centralised controller that can observe the environment and execute a suitable adaptation action. The adaptation planning is managed by a deep recurrent Q-network (DRQN). It is argued that such integration between DRQN and MDP agents in a MAPE-K model offers distributed microservice architecture …
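A deep recurrent Q-network of the kind described is essentially a recurrent encoder over a window of monitored metrics feeding a Q-value head over the adaptation actions; the sketch below illustrates that shape, with the metric count, hidden size, and action count as assumptions.

```python
# Minimal DRQN sketch: LSTM over monitored metrics, Q-values over adaptation actions.
# n_metrics, hidden, and n_actions are illustrative assumptions.
import torch
import torch.nn as nn

class DRQN(nn.Module):
    def __init__(self, n_metrics=8, hidden=64, n_actions=5):
        super().__init__()
        self.lstm = nn.LSTM(n_metrics, hidden, batch_first=True)
        self.q_head = nn.Linear(hidden, n_actions)

    def forward(self, metric_window, hidden_state=None):
        # metric_window: (batch, time, n_metrics) sequence of observed metrics
        out, hidden_state = self.lstm(metric_window, hidden_state)
        q_values = self.q_head(out[:, -1])    # Q-values from the last time step
        return q_values, hidden_state
```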


Deep Q Learning For Self Adaptive Distributed Microservices Architecture (In Press), Basel Magableh Jan 2019

Articles

One desired aspect of a self-adapting microservices architecture is the ability to continuously monitor the operational environment, detect and observe anomalous behaviour, and provide a reasonable policy for self-scaling, self-healing, and self-tuning the computational resources in order to dynamically respond to sudden changes in its operational environment. The behaviour of a microservices architecture changes continuously over time, which makes it challenging to use a statistical model to identify both the normal and abnormal behaviour of the running services. The performance of the microservices cluster can fluctuate around the demand to accommodate scalability, orchestration, and load balancing demands. …


A Graph-Based Reinforcement Learning Method With Converged State Exploration And Exploitation, Han Li, Tianding Chen, Hualiang Teng, Yingtao Jiang Jan 2019

Civil and Environmental Engineering and Construction Faculty Research

In any classical value-based reinforcement learning method, an agent, despite its continuous interactions with the environment, is unable to quickly generate a complete and independent description of the entire environment, leaving the learning method to struggle with the difficult dilemma of choosing between two tasks, namely exploration and exploitation. This problem becomes more pronounced when the agent has to deal with a dynamic environment, of which the configuration and/or parameters are constantly changing. In this paper, this problem is approached by first mapping a reinforcement learning scheme to a directed graph, and the set that contains all …


Optimization Of Energy Harvesting Mobile Nodes Within Scalable Converter System Based On Reinforcement Learning, Chengtao Xu Jan 2019

All Graduate Theses, Dissertations, and Other Capstone Projects

Microgrid monitoring focused on power data, such as voltage and current, has become more significant with the development of decentralized power supply systems. The power data transmission delay between distributed generators is vital for evaluating the stability and financial outcome of overall grid performance. In this thesis, both hardware and simulation are discussed for optimizing data packet transmission delay, energy consumption, and collision rate. To minimize the transmission delay and collision rate, state-action-reward-state-action (SARSA) and Q-learning methods based on a Markov decision process (MDP) model are used to search for the most efficient data transmission scheme for each agent device. …
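The two tabular updates named in the abstract differ only in how they bootstrap: SARSA uses the action actually taken next, while Q-learning uses the greedy action. A minimal sketch follows, with illustrative learning rate and discount, assuming Q is a defaultdict keyed by (state, action).

```python
# SARSA vs. Q-learning tabular updates; alpha and gamma are illustrative.
# Q is assumed to be a collections.defaultdict(float) keyed by (state, action).
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    # bootstrap on the action the agent will actually take next
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    # bootstrap on the greedy action in the next state
    best = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
```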


Assessment Of Adaptability Of A Supply Chain Trading Agent's Strategy: Evolutionary Game Theory Approach, Yoon Sang Lee, Riyaz T. Sikora Jan 2019

Journal of International Technology and Information Management

With the increase in the complexity of supply chain management, the use of intelligent agents for automated trading has gained popularity (Collins, Arunachalam, et al., 2006). The performance of supply-chain agents depends not just on the market environment (supply and demand patterns) but also on what types of other agents they are competing with. For designers of such agents it is important to ascertain that their agents are robust and can adapt to changing market and competitive environments. However, to date there has not been any work that assesses the adaptability of a trading agent's strategy in the …


Less Is More: Beating The Market With Recurrent Reinforcement Learning, Louis Kurt Bernhard Steinmeister Jan 2019

Masters Theses

"Multiple recurrent reinforcement learners were implemented to make trading decisions based on real and freely available macro-economic data. The learning algorithm and different reinforcement functions (the Differential Sharpe Ratio, Differential Downside Deviation Ratio and Returns) were revised and the performances were compared while transaction costs were taken into account. (This is important for practical implementations even though many publications ignore this consideration.) It was assumed that the traders make long-short decisions in the S&P500 with complementary 3-month treasury bill investments. Leveraged positions in the S&P500 were disallowed. Notably, the Differential Sharpe Ratio and the Differential Downside Deviation Ratio are risk …