Open Access. Powered by Scholars. Published by Universities.®

Reinforcement learning

Full-Text Articles in Physical Sciences and Mathematics

Articles 151 - 180 of 191

Creating Autonomous Adaptive Agents In A Real-Time First-Person Shooter Computer Game, Di Wang, Ah-Hwee Tan Jul 2014

Research Collection School Of Computing and Information Systems

Games are good test-beds for evaluating AI methodologies. In recent years, a vast amount of research has dealt with real-time computer games rather than traditional board or card games. This paper illustrates how we create agents to play the well-known first-person shooter Unreal Tournament by employing FALCON, a self-organizing neural network that performs reinforcement learning. Rewards used for learning are either obtained from the game environment or estimated using a temporal difference learning scheme. In this way, the agents are able to acquire proper strategies and discover the effectiveness of different weapons without …


Memory-Guided Exploration In Reinforcement Learning, James L. Carroll, Todd Peterson Jan 2014

Journal of Undergraduate Research

Traditional reinforcement learning techniques learn a single task by giving the agent positive and negative rewards. In one type of reinforcement learning, called Q-learning, the agent stores Q-values, which are the expected rewards for performing an action in a given state. Task transfer is a method of transferring information learned in one task to another related task. Most work in transfer has focused on classification techniques. The purpose of our research has been to extend these transfer techniques to reinforcement learning.
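
A minimal sketch of how source-task Q-values might guide a target task (the blending scheme and all names here are illustrative assumptions, not the authors' method):

    import numpy as np

    def memory_guided_action(q_target, q_source, state, epsilon, bias):
        """Choose an action for the target task, biased toward the source
        task's learned preferences. `bias` should decay toward 0 as the
        target task is learned. (Hypothetical scheme for illustration.)"""
        n_actions = q_target.shape[1]
        if np.random.rand() < epsilon:
            return np.random.randint(n_actions)          # random exploration
        combined = (1.0 - bias) * q_target[state] + bias * q_source[state]
        return int(np.argmax(combined))                  # exploit blended memory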


Reinforcement Learning Task Clustering, James Carroll, Todd Peterson Jan 2014

Journal of Undergraduate Research

Reinforcement learning is a process whereby actions are acquired using reinforcement signals. A signal is given to an autonomous agent indicating how well that agent is performing an action; the agent then attempts to maximize this reinforcement signal. One common method in reinforcement learning is Q-learning, where the agent attempts to learn Q(s,a), the expected temporally discounted value of performing action a in state s. This function is updated according to:
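
    Q(s,a) ← Q(s,a) + α [ r + γ max_a′ Q(s′,a′) − Q(s,a) ]

(This is the standard Q-learning update the truncated abstract refers to: α is the learning rate, γ the temporal discount factor, r the reinforcement signal, and s′ the state reached after taking action a.)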


Complementary Layered Learning, Sean Mondesire Jan 2014

Electronic Theses and Dissertations

Layered learning is a machine learning paradigm used to develop autonomous robotic-based agents by decomposing a complex task into simpler subtasks and learning each sequentially. Although the paradigm continues to have success in multiple domains, performance can be unexpectedly unsatisfactory. Using Boolean-logic problems and autonomous agent navigation, we show that poor performance occurs when the learner either forgets earlier learned subtasks too quickly (favoring plasticity) or has difficulty learning new ones (favoring stability). We demonstrate that this imbalance can hinder learning so that task performance is no better than that of a suboptimal learning technique, monolithic learning, which …


Adaptive Computer‐Generated Forces For Simulator‐Based Training (Expert Systems With Applications), Teck-Hou Teng, Ah-Hwee Tan, Loo-Nin Teow Dec 2013

Research Collection School Of Computing and Information Systems

Simulator-based training is in constant pursuit of increasing levels of realism. The transition from doctrine-driven computer-generated forces (CGF) to adaptive CGF represents one such effort. The use of doctrine-driven CGF is fraught with challenges, such as modeling complex expert knowledge and adapting to the trainees’ progress in real time. This paper reports on how adaptive CGF can overcome these challenges. Using a self-organizing neural network to implement the adaptive CGF, air combat maneuvering strategies are learned incrementally and generalized in real time. The state space and action space are extracted from the same hierarchical doctrine …


Reinforcement Learning With Motivations For Realistic Agents, Jacquelyne T. Forgette Sep 2013

Electronic Thesis and Dissertation Repository

Believable virtual humans have important applications in various fields, including computer-based video games. The challenge in programming video games is to produce a non-player character that is autonomous and capable of action selections that appear human. In this thesis, motivations are used as a basis for reinforcement learning. With motives driving the decisions of the agents, their actions will appear less structured and repetitious, and more human in nature. This will also allow developers to easily create game agents with specific motivations, based mostly on their narrative purposes. With minimum and maximum desirable motive values, the agents …


Actor-Critic-Based Ink Drop Spread As An Intelligent Controller, Hesam Sagha, Iman Esmaili Paeen Afrakoti, Saeed Bagherishouraki Jan 2013

Turkish Journal of Electrical Engineering and Computer Sciences

This paper introduces an innovative adaptive controller based on the actor-critic method. The proposed approach employs the ink drop spread (IDS) method as its main engine. The IDS method is a new trend in soft-computing approaches; it is a universal fuzzy modeling technique and has also been used as a supervised controller. Its process is very similar to the processing system of the human brain. The proposed actor-critic method uses an IDS structure as an actor and a 2-dimensional plane, representing control variable states, as a critic that estimates the lifetime goodness of each state. This method is fast, simple, …
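
For reference, a generic tabular actor-critic update (not the paper's IDS-specific engine) looks roughly like this sketch:

    def actor_critic_step(V, prefs, s, a, r, s_next,
                          alpha_critic=0.1, alpha_actor=0.05, gamma=0.95):
        """One generic actor-critic update. V is a table of state values
        (the critic); prefs is a table of action preferences (the actor).
        The critic's TD error drives both updates. Illustrative only."""
        td_error = r + gamma * V[s_next] - V[s]   # critic's evaluation of the move
        V[s] += alpha_critic * td_error           # improve the value estimate
        prefs[s][a] += alpha_actor * td_error     # reinforce or penalize the action
        return td_error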


Self-Regulating Action Exploration In Reinforcement Learning, Teck-Hou Teng, Ah-Hwee Tan Oct 2012

Research Collection School Of Computing and Information Systems

The basic tenet of a learning process is for an agent to learn for only as much and as long as necessary. With reinforcement learning, the learning process is divided between exploration and exploitation. Given the complexity of the problem domain and the randomness of the learning process, the exact duration of the reinforcement learning process can never be known with certainty. Using an inaccurate number of training iterations leads either to non-convergence or to over-training of the learning agent. This work addresses such issues by proposing a technique to self-regulate the exploration rate and training duration …
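
One simple way to self-regulate exploration (an illustrative scheme, not the paper's technique) is to tie the exploration rate to the magnitude of recent TD errors, so the agent explores only while its value estimates are still changing:

    def self_regulating_epsilon(recent_td_errors, eps_min=0.01, eps_max=0.5, scale=1.0):
        """Large recent TD errors -> still learning -> explore more;
        small errors -> estimates have settled -> exploit.
        (Hypothetical scheme; `scale` normalizes typical error size.)"""
        mean_err = sum(abs(e) for e in recent_td_errors) / max(len(recent_td_errors), 1)
        return eps_min + (eps_max - eps_min) * min(mean_err / scale, 1.0)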


Efficient Reinforcement Learning In Multiple-Agent Systems And Its Application In Cognitive Radio Networks, Jing Zhang Apr 2012

Dissertations

The objective of reinforcement learning in multiple-agent systems is to find an efficient learning method for the agents to behave optimally. Finding a Nash equilibrium has become the common learning target for optimality. However, finding a Nash equilibrium is a PPAD (Polynomial Parity Arguments on Directed graphs)-complete problem, and conventional methods can find Nash equilibria only for some special types of Markov games.

This dissertation proposes a new reinforcement learning algorithm to improve the search efficiency and effectiveness for multiple-agent systems. This algorithm is based on the definition of Nash equilibrium and utilizes the greedy and rational features of the agents. When …
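
For context, the Nash-equilibrium condition that serves as the learning target: a joint policy (π*_1, …, π*_n) is a Nash equilibrium when no agent i can improve its own expected value by unilaterally deviating, i.e., for every agent i and every alternative policy π_i,

    V_i(π*_1, …, π*_i, …, π*_n) ≥ V_i(π*_1, …, π_i, …, π*_n)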


Motivated Learning For The Development Of Autonomous Agents, Janusz A. Starzyk, James T. Graham, Pawel Raif, Ah-Hwee Tan Apr 2012

Research Collection School Of Computing and Information Systems

A new machine learning approach known as motivated learning (ML) is presented in this work. Motivated learning drives a machine to develop abstract motivations and choose its own goals. ML also provides a self-organizing system that controls a machine’s behavior based on competition between dynamically changing pain signals. This provides an interplay of externally driven and internally generated control signals. It is demonstrated that ML not only yields a more sophisticated learning mechanism and system of values than reinforcement learning (RL), but is also more efficient in learning complex relations and delivers better performance than RL in dynamically changing environments. In …


Self‐Regulating Action Exploration In Reinforcement Learning, Teck-Hou Teng, Ah-Hwee Tan, Yuan-Sin Tan Jan 2012

Research Collection School Of Computing and Information Systems

The basic tenet of a learning process is for an agent to learn for only as much and as long as necessary. With reinforcement learning, the learning process is divided between exploration and exploitation. Given the complexity of the problem domain and the randomness of the learning process, the exact duration of the reinforcement learning process can never be known with certainty. Using an inaccurate number of training iterations leads either to non-convergence or to over-training of the learning agent. This work addresses such issues by proposing a technique to self-regulate the exploration rate and training duration …


Cooperative Reinforcement Learning In Topology-Based Multi-Agent Systems, Dan Xiao, Ah-Hwee Tan Oct 2011

Research Collection School Of Computing and Information Systems

Topology-based multi-agent systems (TMAS), wherein agents interact with one another according to their spatial relationship in a network, are well suited for problems with topological constraints. In a TMAS, however, each agent may have a different state space, which can be rather large. Consequently, traditional approaches to multi-agent cooperative learning may not be able to scale up with the complexity of the network topology. In this paper, we propose a cooperative learning strategy under which autonomous agents are assembled in a binary tree formation (BTF). By constraining the interaction between agents, we effectively unify the state space of individual …


A Hybrid Agent Architecture Integrating Desire, Intention And Reinforcement Learning, Ah-Hwee Tan, Yew-Soon Ong, Akejariyawong Tapanuj Jul 2011

Research Collection School Of Computing and Information Systems

This paper presents a hybrid agent architecture that integrates the behaviours of BDI agents, specifically desire and intention, with a neural-network-based reinforcement learner known as Temporal Difference Fusion Architecture for Learning and COgNition (TD-FALCON). With the explicit maintenance of goals, the agent performs reinforcement learning with awareness of its objectives instead of relying on external reinforcement signals. More importantly, the intention module equips the hybrid architecture with deliberative planning capabilities, enabling the agent to purposefully maintain an agenda of actions to perform and reducing the need to constantly sense the environment. Through reinforcement learning, plans can also be …


Higher-Level Application Of Adaptive Dynamic Programming/Reinforcement Learning – A Next Phase For Controls And System Identification?, George G. Lendaris Apr 2011

Systems Science Friday Noon Seminar Series

Humans have the ability to make use of experience while performing system identification and selecting control actions for changing situations. In contrast to current technological implementations, which slow down as more knowledge is stored, human processing speeds up and becomes more effective as more experience is gained. An emerging experience-based (“higher level”) approach promises to endow our technology with enhanced efficiency and effectiveness.

The notions of context and context discernment are important to understanding this human ability, and are defined here in a form appropriate to controls and system identification. Some general background is given on controls, Dynamic Programming, and Adaptive Critics, leading to Adaptive Dynamic …


Reinforcement Learning Of Competitive And Cooperative Skills In Soccer Agents, Jinsong Leng, Chee Lim Jan 2011

Research outputs 2011

The main aim of this paper is to provide a comprehensive numerical analysis of the efficiency of various reinforcement learning (RL) techniques in an agent-based soccer game. SoccerBots is employed as a simulation testbed to analyze the effectiveness of RL techniques under various scenarios. A hybrid agent teaming framework for investigating agent team architecture, learning abilities, and other specific behaviours is presented. Novel RL algorithms to verify the competitive and cooperative learning abilities of goal-oriented agents for decision-making are developed. In particular, the tile coding (TC) technique, a function approximation approach, is used to prevent the state space from growing exponentially, hence avoiding …
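
A minimal sketch of tile coding for a 2-D state (a generic version, not the paper's configuration): each tiling lays a coarse grid over the state space, the tilings are offset from one another, and a state activates exactly one tile per tiling, so the value estimate is a sum of a few weights rather than one entry per raw state.

    def tile_indices(x, y, num_tilings=8, tiles_per_dim=10, lo=0.0, hi=1.0):
        """Return one active tile index per tiling for a state (x, y)
        in [lo, hi]^2. Offset tilings give finer effective resolution
        than any single grid. (Illustrative parameters.)"""
        width = (hi - lo) / tiles_per_dim
        active = []
        for t in range(num_tilings):
            offset = t * width / num_tilings              # shift each tiling slightly
            ix = int((x - lo + offset) / width) % tiles_per_dim
            iy = int((y - lo + offset) / width) % tiles_per_dim
            active.append(t * tiles_per_dim ** 2 + iy * tiles_per_dim + ix)
        return active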


An Exploration Of Multi-Agent Learning Within The Game Of Sheephead, Brady Brau Jan 2011

All Graduate Theses, Dissertations, and Other Capstone Projects

In this paper, we examine a machine learning technique presented by Ishii et al. that allows for learning in a multi-agent environment, and apply an adaptation of this technique to the card game Sheephead. We then evaluate the effectiveness of our adaptation by running simulations against rule-based opponents. Multi-agent learning presents several layers of complexity on top of single-agent learning in a stationary environment. This added complexity and increased state space are just beginning to be addressed by researchers. We utilize techniques used by Ishii et al. to facilitate this multi-agent learning. We model the environment of …


Proto-Transfer Learning In Markov Decision Processes Using Spectral Methods, Kimberly Ferguson, Sridhar Mahadevan Dec 2010

Sridhar Mahadevan

In this paper we introduce proto-transfer learning, a new framework for transfer learning. We explore solutions to transfer learning within reinforcement learning through the use of spectral methods. Proto-value functions (PVFs) are basis functions computed from a spectral analysis of random walks on the state space graph. They naturally lead to the ability to transfer knowledge and representation between related tasks or domains. We investigate task transfer by using the same PVFs in Markov decision processes (MDPs) with different reward functions. Additionally, our experiments in domain transfer explore applying the Nyström method for interpolation of PVFs between MDPs of different …
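
Concretely, PVFs are the smoothest eigenvectors of the graph Laplacian of the state-space graph; a minimal sketch, assuming an adjacency matrix A has already been built from sampled random walks:

    import numpy as np

    def proto_value_functions(A, k):
        """Return k proto-value functions: eigenvectors of the combinatorial
        graph Laplacian L = D - A with the smallest eigenvalues. Rows index
        states; each returned column is one basis function over states."""
        D = np.diag(A.sum(axis=1))                # degree matrix
        L = D - A                                 # combinatorial Laplacian
        eigvals, eigvecs = np.linalg.eigh(L)      # eigh: L is symmetric
        return eigvecs[:, :k]                     # the k smoothest basis functions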


Scheduling Straight-Line Code Using Reinforcement Learning And Rollouts, Amy Mcgovern, Eliot Moss, Andrew G. Barto Dec 2010

Andrew G. Barto

The execution order of a block of computer instructions on a pipelined machine can make a difference in its running time by a factor of two or more. In order to achieve the best possible speed, compilers use heuristic schedulers appropriate to each specific architecture implementation. However, these heuristic schedulers are time-consuming and expensive to build. We present empirical results using both rollouts and reinforcement learning to construct heuristics for scheduling basic blocks. In simulation, the rollout scheduler outperformed a commercial scheduler, and the reinforcement learning scheduler performed almost as well as the commercial scheduler.
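
The rollout idea, in sketch form (generic, not the paper's simulator): for each instruction that could legally be scheduled next, simulate several random completions of the schedule and keep the candidate whose completions run fastest on average.

    def rollout_choice(ready, schedule, complete_randomly, simulate_cost, num_rollouts=10):
        """Pick the next instruction by Monte Carlo rollout.
        `complete_randomly(partial)` finishes a partial schedule with random
        legal choices and `simulate_cost(schedule)` returns its simulated
        running time; both are hypothetical helpers supplied by the caller."""
        def avg_cost(inst):
            partial = schedule + [inst]
            return sum(simulate_cost(complete_randomly(partial))
                       for _ in range(num_rollouts)) / num_rollouts
        return min(ready, key=avg_cost)           # fastest average completion wins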


A Biologically-Inspired Cognitive Agent Model Integrating Declarative Knowledge And Reinforcement Learning, Ah-Hwee Tan, Gee-Wah Ng Sep 2010

Research Collection School Of Computing and Information Systems

The paper proposes a biologically-inspired cognitive agent model, known as FALCON-X, based on an integration of the Adaptive Control of Thought (ACT-R) architecture and a class of self-organizing neural networks called fusion Adaptive Resonance Theory (fusion ART). By replacing the production system of ACT-R with a fusion ART model, FALCON-X integrates high-level deliberative cognitive behaviors and real-time learning abilities, based on biologically plausible neural pathways. We illustrate how FALCON-X, consisting of a core inference area interacting with the associated intentional, declarative, perceptual, motor and critic memory modules, can be used to build virtual robots for battles in a simulated RoboCode …


Global Optimization For Value Function Approximation, Marek Petrik, Shlomo Zilberstein Jun 2010

Shlomo Zilberstein

Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose a new approximate bilinear programming formulation of value function approximation, which employs global optimization. The formulation provides strong a priori guarantees on both robust and expected policy loss by minimizing specific norms of the Bellman residual. Solving a bilinear program optimally is NP-hard, but this is unavoidable because the Bellman-residual minimization itself is NP-hard. We describe and analyze both optimal and approximate algorithms for solving bilinear programs. The analysis shows that this algorithm offers a convergent generalization …
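
For context, the quantity whose norm the formulation bounds: the Bellman residual of an approximate value function v, where T is the Bellman optimality operator,

    (T v)(s) = max_a [ r(s,a) + γ Σ_s′ P(s′ | s,a) v(s′) ]
    Bellman residual: v − T v

Minimizing a suitable norm of v − T v yields the a priori policy-loss guarantees described above (the specific robust and expected-loss norms are detailed in the paper).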


A Self-Organizing Neural Architecture Integrating Desire, Intention And Reinforcement Learning, Ah-Hwee Tan, Yu-Hong Feng, Yew-Soon Ong Mar 2010

Research Collection School Of Computing and Information Systems

This paper presents a self-organizing neural architecture that integrates the features of belief, desire, and intention (BDI) systems with reinforcement learning. Based on fusion Adaptive Resonance Theory (fusion ART), the proposed architecture provides a unified treatment for both intentional and reactive cognitive functionalities. Operating with a sense-act-learn paradigm, the low-level reactive module is a fusion ART network that learns action and value policies across the sensory, motor, and feedback channels. During performance, the actions executed by the reactive module are tracked by a high-level intention module (also a fusion ART network) that learns to associate sequences of actions …


Motivated Learning As An Extension Of Reinforcement Learning, Janusz Starzyk, Pawel Raif, Ah-Hwee Tan Jan 2010

Research Collection School Of Computing and Information Systems

We have developed a unified framework to conduct computational experiments with both learning systems: motivated learning based on a goal creation system, and reinforcement learning using the Q-learning algorithm. Future work includes combining motivated learning, to set abstract motivations and manage goals, with reinforcement learning, to learn proper actions. This will allow testing of motivated learning on typical reinforcement learning benchmarks with large dimensionality of the state/action spaces.


Dynamic Coalition Formation Under Uncertainty, Daylon J. Hooper, Gilbert L. Peterson, Brett J. Borghetti Oct 2009

Faculty Publications

Coalition formation algorithms are generally not applicable to real-world robotic collectives since they lack mechanisms to handle uncertainty. Those mechanisms that do address uncertainty either deflect it by soliciting information from others or apply reinforcement learning to select an agent type from within a set. This paper presents a coalition formation mechanism that directly addresses uncertainty while allowing the agent types to fall outside of a known set. The agent types are captured through a novel agent modeling technique that handles uncertainty through a belief-based evaluation mechanism. This technique allows for uncertainty in environmental data, agent type, coalition value, and …


A Survey Of Transfer Learning Methods For Reinforcement Learning, Nicholas Bone Dec 2008

Computer Science Graduate and Undergraduate Student Scholarship

Transfer Learning (TL) is the branch of Machine Learning concerned with improving performance on a target task by leveraging knowledge from a related (and usually already learned) source task. TL is potentially applicable to any learning task, but in this survey we consider TL in a Reinforcement Learning (RL) context. TL is inspired by psychology; humans constantly apply previous knowledge to new tasks, but such transfer has traditionally been very difficult for—or ignored by—machine learning applications. The goals of TL are to facilitate faster and better learning of new tasks by applying past experience where appropriate, and to enable autonomous …


Self-Organizing Neural Models Integrating Rules And Reinforcement Learning, Teck-Hou Teng, Zhong-Ming Tan, Ah-Hwee Tan Jun 2008

Research Collection School Of Computing and Information Systems

Traditional approaches to integrating knowledge into neural networks are concerned mainly with supervised learning. This paper presents how a family of self-organizing neural models known as fusion architecture for learning, cognition and navigation (FALCON) can incorporate a priori knowledge and perform knowledge refinement and expansion through reinforcement learning. Symbolic rules are formulated based on pre-existing know-how and inserted into FALCON as a priori knowledge. The availability of knowledge enables FALCON to start performing earlier in the initial learning trials. Through a temporal-difference (TD) learning method, the inserted rules can be refined and expanded according to the evaluative feedback signals received …


Improving Liquid State Machines Through Iterative Refinement Of The Reservoir, R David Norton Mar 2008

Theses and Dissertations

Liquid state machines (LSMs) exploit the power of recurrent spiking neural networks (SNNs) without training the SNN. Instead, a reservoir, or liquid, is randomly created and acts as a filter for a readout function. We develop three methods for iteratively refining a randomly generated liquid to create a more effective one. First, we apply Hebbian learning to LSMs by building the liquid with spike-timing-dependent plasticity (STDP) synapses. Second, we create an eligibility-based reinforcement learning algorithm for synaptic development. Third, we apply principles of Hebbian learning and reinforcement learning to create a new algorithm called separation driven synaptic modification …
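
For reference, the standard pairwise STDP rule such liquids are built with (the amplitudes A_+, A_− and time constants τ_+, τ_− are generic, not the thesis's settings): with Δt = t_post − t_pre,

    Δw = +A_+ · exp(−Δt / τ_+)   if Δt > 0   (pre fires before post: strengthen)
    Δw = −A_− · exp(+Δt / τ_−)   if Δt < 0   (post fires before pre: weaken)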


Integrating Temporal Difference Methods And Self‐Organizing Neural Networks For Reinforcement Learning With Delayed Evaluative Feedback, Ah-Hwee Tan, Ning Lu, Dan Xiao Feb 2008

Research Collection School Of Computing and Information Systems

This paper presents a neural architecture for learning category nodes encoding mappings across multimodal patterns involving sensory inputs, actions, and rewards. By integrating adaptive resonance theory (ART) and temporal difference (TD) methods, the proposed neural model, called TD fusion architecture for learning, cognition, and navigation (TD-FALCON), enables an autonomous agent to adapt and function in a dynamic environment with immediate as well as delayed evaluative feedback (reinforcement) signals. TD-FALCON learns the value functions of the state-action space estimated through on-policy and off-policy TD learning methods, specifically state-action-reward-state-action (SARSA) and Q-learning. The learned value functions are then used to determine the …
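
The two TD methods named here differ only in the bootstrap target: SARSA (on-policy) bootstraps from the action a′ actually selected in the next state, while Q-learning (off-policy) bootstraps from the greedy action:

    SARSA:       Q(s,a) ← Q(s,a) + α [ r + γ Q(s′,a′) − Q(s,a) ]
    Q-learning:  Q(s,a) ← Q(s,a) + α [ r + γ max_a′ Q(s′,a′) − Q(s,a) ]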


Implementation Of Reinforcement Learning In Game Strategy Design, Chien-Yu Lin Jan 2008

Theses Digitization Project

The purpose of this study is to apply reinforcement learning to the design of game strategy. In the gaming industry, the strategy used by a computer to win a game is usually pre-programmed by game designers according to game patterns or a set of rules.


Limitations And Extensions Of The Wolf-Phc Algorithm, Philip R. Cook Sep 2007

Theses and Dissertations

Policy Hill Climbing (PHC) is a reinforcement learning algorithm that extends Q-learning to learn probabilistic policies for multi-agent games. WoLF-PHC extends PHC with the "win or learn fast" principle. A proof is given that PHC will diverge in self-play when playing Shapley's game, and WoLF-PHC is shown empirically to diverge as well. Various WoLF-PHC-based modifications were created, evaluated, and compared in an attempt to obtain convergence to the single-shot Nash equilibrium when playing Shapley's game in self-play, without using more information than WoLF-PHC uses. Partial Commitment WoLF-PHC (PCWoLF-PHC), which performs best on Shapley's game, is tested on other …
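
In outline, PHC follows a standard Q-learning update with a hill-climbing step on a mixed policy, and WoLF switches between a small learning rate when "winning" and a larger one when "losing". A minimal sketch of the policy step (following Bowling and Veloso's published rule; the average-policy and Q-value bookkeeping is omitted):

    import numpy as np

    def wolf_phc_policy_step(pi, pi_avg, Q, s, delta_win=0.01, delta_lose=0.04):
        """Move probability mass in state s toward the greedy action:
        slowly when the current policy beats the average policy (winning),
        quickly otherwise (losing)."""
        winning = np.dot(pi[s], Q[s]) > np.dot(pi_avg[s], Q[s])
        delta = delta_win if winning else delta_lose
        greedy = int(np.argmax(Q[s]))
        n = len(pi[s])
        for a in range(n):
            if a == greedy:
                continue
            step = min(pi[s][a], delta / (n - 1))  # constrained step size
            pi[s][a] -= step                       # take mass from non-greedy actions
            pi[s][greedy] += step                  # give it to the greedy action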


Reinforcement Learning Neural-Network-Based Controller For Nonlinear Discrete-Time Systems With Input Constraints, Pingan He, Jagannathan Sarangapani Jan 2007

Electrical and Computer Engineering Faculty Research & Creative Works

A novel adaptive-critic-based neural network (NN) controller in discrete time is designed to deliver a desired tracking performance for a class of nonlinear systems in the presence of actuator constraints. The constraints of the actuator are treated in the controller design as the saturation nonlinearity. The adaptive critic NN controller architecture based on state feedback includes two NNs: the critic NN is used to approximate the "strategic" utility function, whereas the action NN is employed to minimize both the strategic utility function and the unknown nonlinear dynamic estimation errors. The critic and action NN weight updates are derived by minimizing …