Open Access. Powered by Scholars. Published by Universities.®

Reinforcement learning

Articles 181–189 of 189

Full-Text Articles in Physical Sciences and Mathematics

Reinforcement Learning-Based Output Feedback Control Of Nonlinear Systems With Input Constraints, Pingan He, Jagannathan Sarangapani Feb 2005

Electrical and Computer Engineering Faculty Research & Creative Works

A novel neural network (NN)-based output feedback controller with magnitude constraints is designed to deliver a desired tracking performance for a class of multi-input-multi-output (MIMO) discrete-time strict feedback nonlinear systems. Reinforcement learning in discrete time is proposed for the output feedback controller, which uses three NNs: 1) an NN observer to estimate the system states from the input-output data; 2) a critic NN to approximate a certain strategic utility function; and 3) an action NN to minimize both the strategic utility function and the unknown dynamics estimation errors. The magnitude constraints are manifested as saturation nonlinearities in the output feedback …
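
As a concrete reading of the three-network structure the abstract describes, here is a minimal sketch, not the authors' implementation: the layer sizes, the observer's input of the last control and current measurement, and the tanh realization of the magnitude bound u_max are all illustrative assumptions.

```python
import numpy as np

def mlp(params, x):
    """One-hidden-layer network: params = (W1, b1, W2, b2)."""
    W1, b1, W2, b2 = params
    return W2 @ np.tanh(W1 @ x + b1) + b2

def init(n_in, n_hidden, n_out, rng):
    return (rng.normal(0, 0.1, (n_hidden, n_in)), np.zeros(n_hidden),
            rng.normal(0, 0.1, (n_out, n_hidden)), np.zeros(n_out))

rng = np.random.default_rng(0)
n_state, n_meas, n_ctrl, u_max = 4, 2, 2, 1.0   # assumed dimensions / bound

observer = init(n_ctrl + n_meas, 16, n_state, rng)  # (u_{k-1}, y_k) -> x_hat
critic   = init(n_state, 16, 1, rng)                # x_hat -> utility estimate
actor    = init(n_state, 16, n_ctrl, rng)           # x_hat -> raw control

u_prev, y_k = np.zeros(n_ctrl), np.zeros(n_meas)    # last input, current output
x_hat = mlp(observer, np.concatenate([u_prev, y_k]))  # 1) observer
J_hat = mlp(critic, x_hat)                            # 2) critic
u_k = u_max * np.tanh(mlp(actor, x_hat) / u_max)      # 3) action + saturation
```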


Variable Resolution Discretization In The Joint Space, Christopher K. Monson, Kevin Seppi, David Wingate, Todd S. Peterson Dec 2004

Faculty Publications

We present JoSTLe, an algorithm that performs value iteration on control problems with continuous actions, allowing this useful reinforcement learning technique to be applied to problems where a priori action discretization is inadequate. The algorithm is an extension of a variable resolution technique that works for problems with continuous states and discrete actions. Results are given that indicate that JoSTLe is a promising step toward reinforcement learning in a fully continuous domain.
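
For readers unfamiliar with the baseline, the sketch below runs plain value iteration on a fixed, uniformly discretized 1-D state-action grid; JoSTLe's contribution is to refine such a discretization adaptively in the joint space instead of fixing it a priori. The dynamics, reward, and grid sizes are illustrative assumptions.

```python
import numpy as np

states  = np.linspace(-1.0, 1.0, 41)    # fixed state grid (JoSTLe refines this)
actions = np.linspace(-0.5, 0.5, 11)    # fixed action grid
gamma = 0.95

def step(s, a):                         # assumed 1-D dynamics and reward
    s2 = np.clip(s + a, -1.0, 1.0)
    return s2, -abs(s2)                 # reward: stay near the origin

V = np.zeros(len(states))
for sweep in range(200):                # plain value iteration
    V_new = np.empty_like(V)
    for i, s in enumerate(states):
        q_values = []
        for a in actions:
            s2, r = step(s, a)
            j = np.abs(states - s2).argmin()   # nearest grid cell
            q_values.append(r + gamma * V[j])
        V_new[i] = max(q_values)
    delta, V = np.max(np.abs(V_new - V)), V_new
    if delta < 1e-6:
        break
```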


Incremental Policy Learning: An Equilibrium Selection Algorithm For Reinforcement Learning Agents With Common Interests, Nancy Fulda, Dan A. Ventura Jul 2004

Faculty Publications

We present an equilibrium selection algorithm for reinforcement learning agents that incrementally adjusts the probability of executing each action based on the desirability of the outcome obtained in the last time step. The algorithm assumes that at least one coordination equilibrium exists and requires that the agents have a heuristic for determining whether or not the equilibrium was obtained. In deterministic environments with one or more strict coordination equilibria, the algorithm will learn to play an optimal equilibrium as long as the heuristic is accurate. Empirical data demonstrate that the algorithm is also effective in stochastic environments and is able …
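
A minimal sketch of the incremental update described above, under stated assumptions: the learning rate alpha and the boolean desirability signal are hypothetical names, and the exact rule here is only one plausible reading of "adjusts the probability of executing each action."

```python
import random

def update(probs, chosen, desirable, alpha=0.1):
    """Shift probability mass toward (or away from) the last action."""
    if desirable:
        # convex step toward the indicator vector of the chosen action
        return [(1 - alpha) * p + (alpha if i == chosen else 0.0)
                for i, p in enumerate(probs)]
    # scale the chosen action down, then renormalize
    scaled = [p * (1 - alpha) if i == chosen else p
              for i, p in enumerate(probs)]
    total = sum(scaled)
    return [p / total for p in scaled]

probs = [0.5, 0.5]
a = random.choices(range(len(probs)), weights=probs)[0]
probs = update(probs, a, desirable=True)   # heuristic said the outcome was good
```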


Solving Large MDPs Quickly With Partitioned Value Iteration, David Wingate Jun 2004

Theses and Dissertations

Value iteration is not typically considered a viable algorithm for solving large-scale MDPs because it converges too slowly. However, its performance can be dramatically improved by eliminating redundant or useless backups and by backing up states in the right order. We present several methods designed to help structure value dependency, along with a systematic study of companion prioritization techniques that focus computation on useful regions of the state space. To scale to ever larger problems, we evaluate all enhancements and methods in the context of parallelizability. Using the enhancements, we discover that in many instances the limiting …
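
The following sketch illustrates the prioritization idea in the abstract's spirit, not the thesis's exact methods: always back up the state with the largest pending change, so redundant sweeps are avoided. The dense toy MDP is an assumption; the partitioned variants studied in the thesis prioritize groups of states rather than individual ones.

```python
import numpy as np

n, gamma, theta = 50, 0.95, 1e-6
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n), size=(n, 2))   # P[s, a] = next-state distribution
R = rng.normal(size=(n, 2))                  # R[s, a] = expected reward

V = np.zeros(n)
priority = np.full(n, np.inf)                # every state needs a first backup
while priority.max() > theta:
    s = int(priority.argmax())               # most "urgent" state first
    new_v = max(R[s, a] + gamma * P[s, a] @ V for a in range(2))
    change = abs(new_v - V[s])
    V[s] = new_v
    priority[s] = 0.0
    if change > theta:
        # a changed value makes predecessors' backups stale; in this dense
        # toy MDP every state is a predecessor of every other
        priority = np.maximum(priority, gamma * change)
        priority[s] = 0.0
```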


Target Sets: A Tool For Understanding And Predicting The Behavior Of Interacting Q-Learners, Nancy Fulda, Dan A. Ventura Sep 2003

Faculty Publications

Reinforcement learning agents that interact in a common environment frequently affect each other's perceived transition and reward distributions. This can result in convergence of the agents to a sub-optimal equilibrium or even to a solution that is not an equilibrium at all. Several modifications to the Q-learning algorithm have been proposed which enable agents to converge to optimal equilibria under specified conditions. This paper presents the concept of target sets as an aid to understanding why these modifications have been successful and as a tool to assist in the development of new modifications which are applicable in a wider range …


Dynamic Joint Action Perception For Q-Learning Agents, Nancy Fulda, Dan A. Ventura Jun 2003

Faculty Publications

Q-learning is a reinforcement learning algorithm that learns expected utilities for state-action transitions through successive interactions with the environment. Its simplicity and its convergence properties have made it a popular subject of study. However, its non-parametric representation of utilities limits its effectiveness in environments with large amounts of perceptual input. For example, in multiagent systems, each agent may need to consider the action selections of its counterparts in order to learn effective behaviors. This creates a joint action space which grows exponentially with the number of agents in the system. In such situations, the Q-learning algorithm quickly …
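
To make the exponential blowup concrete: a tabular Q-learner over joint actions must store and maximize over m**k entries per state for k agents with m actions each. A minimal sketch, with environment details assumed:

```python
from collections import defaultdict
import itertools

n_agents, n_actions = 3, 4
alpha, gamma = 0.1, 0.9
joint_actions = list(itertools.product(range(n_actions), repeat=n_agents))
print(len(joint_actions))        # 4**3 = 64 joint actions per state

Q = defaultdict(float)           # tabular Q[(state, joint_action)]

def q_update(s, a, r, s2):
    """Standard Q-learning backup, maximizing over the joint action space."""
    best_next = max(Q[(s2, a2)] for a2 in joint_actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

q_update(0, joint_actions[0], r=1.0, s2=1)
```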


Multiple Stochastic Learning Automata For Vehicle Path Control In An Automated Highway System, Cem Unsal, Pushkin Kachroo, John S. Bay Jan 1999

Electrical & Computer Engineering Faculty Research

This paper suggests an intelligent controller for an automated vehicle planning its own trajectory based on sensor and communication data. The intelligent controller is designed using stochastic learning automata theory. Using the data received from on-board sensors, two automata (one for lateral actions, one for longitudinal actions) can learn the best possible action to avoid collisions. The system has the advantage of being able to work in unmodeled stochastic environments, unlike adaptive control methods or expert systems. Simulations for simultaneous lateral and longitudinal control of a vehicle provide encouraging results.
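
For flavor, here is a sketch of one automaton under the classic linear reward-inaction (L_RI) update, a standard scheme from stochastic learning automata theory; the action sets and the simulated reward signal are assumptions, not the paper's environment model.

```python
import random

def choose(probs):
    return random.choices(range(len(probs)), weights=probs)[0]

def l_ri_update(probs, a, rewarded, rate=0.05):
    """On reward, move mass toward action a; on penalty, leave probs alone."""
    if rewarded:
        probs = [(p + rate * (1 - p)) if i == a else p * (1 - rate)
                 for i, p in enumerate(probs)]
    return probs

lateral      = [1/3, 1/3, 1/3]   # e.g. shift left / stay / shift right
longitudinal = [1/3, 1/3, 1/3]   # e.g. decelerate / hold / accelerate

a_lat, a_lon = choose(lateral), choose(longitudinal)
# the reward signal would come from sensor feedback; simulated here
lateral = l_ri_update(lateral, a_lat, rewarded=random.random() < 0.5)
```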


Scheduling Straight-Line Code Using Reinforcement Learning And Rollouts, Amy McGovern, Eliot Moss, Andrew G. Barto Jan 1999

Computer Science Department Faculty Publication Series

The execution order of a block of computer instructions on a pipelined machine can make a difference in its running time by a factor of two or more. In order to achieve the best possible speed, compilers use heuristic schedulers appropriate to each specific architecture implementation. However, these heuristic schedulers are time-consuming and expensive to build. We present empirical results using both rollouts and reinforcement learning to construct heuristics for scheduling basic blocks. In simulation, the rollout scheduler outperformed a commercial scheduler, and the reinforcement learning scheduler performed almost as well as the commercial scheduler.
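
A generic rollout sketch under toy assumptions (a made-up adjacency stall cost, no enforcement of dependency legality): each candidate next instruction is evaluated by completing the schedule with a base heuristic and simulating the resulting cost, and the cheapest candidate is committed.

```python
def stall_cost(schedule, deps):
    """Toy pipeline model: one cycle per instruction, plus a one-cycle
    stall whenever an instruction immediately follows one it depends on."""
    cost = len(schedule)
    for prev, cur in zip(schedule, schedule[1:]):
        if prev in deps.get(cur, ()):
            cost += 1
    return cost

def rollout_schedule(instrs, deps):
    schedule, pending = [], list(instrs)
    while pending:
        # roll out each candidate with the base heuristic (here: keep the
        # remaining instructions in their given order) and simulate the cost
        best = min(pending, key=lambda c: stall_cost(
            schedule + [c] + [x for x in pending if x != c], deps))
        schedule.append(best)
        pending.remove(best)
    return schedule

deps = {"c": {"a"}, "d": {"c"}}   # d depends on c, which depends on a
print(rollout_schedule(["a", "c", "b", "d"], deps))
```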


Linear Least-Squares Algorithms For Temporal Difference Learning, Steven J. Bradtke, Andrew G. Barto Jan 1996

Computer Science Department Faculty Publication Series

We introduce two new temporal difference (TD) algorithms based on the theory of linear least-squares function approximation. We define an algorithm we call Least-Squares TD (LS TD) for which we prove probability-one convergence when it is used with a function approximator linear in the adjustable parameters. We then define a recursive version of this algorithm, Recursive Least-Squares TD (RLS TD). Although these new TD algorithms require more computation per time-step than do Sutton's TD(λ) algorithms, they are more efficient in a statistical sense because they extract more information from training experiences. We describe a simulation experiment showing the substantial improvement …
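
A compact sketch of the batch LS TD solution, using assumed one-hot features and a small random walk: accumulate A = Σ φ_t(φ_t − γφ_{t+1})ᵀ and b = Σ φ_t r_t over experience, then solve Aθ = b. RLS TD, per the abstract, instead maintains this solution incrementally at each time step.

```python
import numpy as np

gamma, d = 0.9, 3
phi = lambda s: np.eye(d)[s]       # one-hot features (an assumption)
A, b = np.zeros((d, d)), np.zeros(d)

rng = np.random.default_rng(0)
s = 1
for _ in range(5000):              # assumed random-walk transitions
    s2 = int(max(0, min(d - 1, s + rng.choice([-1, 1]))))
    r = 1.0 if s2 == d - 1 else 0.0
    A += np.outer(phi(s), phi(s) - gamma * phi(s2))
    b += phi(s) * r
    s = s2

theta = np.linalg.solve(A, b)      # LS TD weights: solve A theta = b
print(theta)
```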