Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 2 of 2

Full-Text Articles in Physical Sciences and Mathematics

Limitations And Extensions Of The Wolf-Phc Algorithm, Philip R. Cook Sep 2007

Limitations And Extensions Of The Wolf-Phc Algorithm, Philip R. Cook

Theses and Dissertations

Policy Hill Climbing (PHC) is a reinforcement learning algorithm that extends Q-learning to learn probabilistic policies for multi-agent games. WoLF-PHC extends PHC with the "win or learn fast" principle. A proof that PHC will diverge in self-play when playing Shapley's game is given, and WoLF-PHC is shown empirically to diverge as well. Various WoLF-PHC based modifications were created, evaluated, and compared in an attempt to obtain convergence to the single shot Nash equilibrium when playing Shapley's game in self-play without using more information than WoLF-PHC uses. Partial Commitment WoLF-PHC (PCWoLF-PHC), which performs best on Shapley's game, is tested on other …


Reinforcement Learning Neural-Network-Based Controller For Nonlinear Discrete-Time Systems With Input Constraints, Pingan He, Jagannathan Sarangapani Jan 2007

Reinforcement Learning Neural-Network-Based Controller For Nonlinear Discrete-Time Systems With Input Constraints, Pingan He, Jagannathan Sarangapani

Electrical and Computer Engineering Faculty Research & Creative Works

A novel adaptive-critic-based neural network (NN) controller in discrete time is designed to deliver a desired tracking performance for a class of nonlinear systems in the presence of actuator constraints. The constraints of the actuator are treated in the controller design as the saturation nonlinearity. The adaptive critic NN controller architecture based on state feedback includes two NNs: the critic NN is used to approximate the "strategic" utility function, whereas the action NN is employed to minimize both the strategic utility function and the unknown nonlinear dynamic estimation errors. The critic and action NN weight updates are derived by minimizing …