Physical Sciences and Mathematics Commons

Articles 1 - 12 of 12

Full-Text Articles in Physical Sciences and Mathematics

Task Distillation: Transforming Reinforcement Learning Into Supervised Learning, Connor Wilhelm Oct 2023

Theses and Dissertations

Recent work in dataset distillation focuses on distilling supervised classification datasets into smaller, synthetic supervised datasets in order to reduce per-model costs of training, to provide interpretability, and to anonymize data. Distillation and its benefits can be extended to a wider array of tasks. We propose a generalization of dataset distillation, which we call task distillation. Using techniques similar to those used in dataset distillation, any learning task can be distilled into a compressed synthetic task. Task distillation allows for transmodal distillations, where a task of one modality is distilled into a synthetic task of another modality, allowing a more …
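
As a rough sketch of the dataset-distillation idea this thesis generalizes (a toy bi-level example with assumed settings, using PyTorch and a linear-regression task; not the author's task-distillation method): a handful of synthetic points are optimized so that a single gradient step on them fits the real data.

```python
# Toy dataset-distillation sketch (illustrative assumptions, not the thesis's method):
# learn 4 synthetic points such that one gradient step on them fits the real data.
import torch

torch.manual_seed(0)

# "Real" task: noisy linear regression with 2 features.
true_w = torch.tensor([[2.0], [-3.0]])
X_real = torch.randn(256, 2)
y_real = X_real @ true_w + 0.1 * torch.randn(256, 1)

# The distilled data: 4 synthetic points, themselves the learnable parameters.
X_syn = torch.randn(4, 2, requires_grad=True)
y_syn = torch.randn(4, 1, requires_grad=True)
inner_lr = 0.5
outer_opt = torch.optim.Adam([X_syn, y_syn], lr=0.05)

for step in range(500):
    # Inner loop: train a fresh model with one gradient step on the synthetic data.
    w = torch.zeros(2, 1, requires_grad=True)
    inner_loss = ((X_syn @ w - y_syn) ** 2).mean()
    (grad_w,) = torch.autograd.grad(inner_loss, w, create_graph=True)
    w_trained = w - inner_lr * grad_w

    # Outer loop: that briefly trained model should do well on the real data.
    outer_loss = ((X_real @ w_trained - y_real) ** 2).mean()
    outer_opt.zero_grad()
    outer_loss.backward()
    outer_opt.step()

print("real-data loss after one step on the distilled data:", outer_loss.item())
```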


Reinforcement Learning With Auxiliary Memory, Sterling Suggs Jun 2021

Theses and Dissertations

Deep reinforcement learning algorithms typically require vast amounts of data to train to a useful level of performance. Each time new data is encountered, the network must inefficiently update all of its parameters. Auxiliary memory units can help deep neural networks train more efficiently by separating computation from storage, and providing a means to rapidly store and retrieve precise information. We present four deep reinforcement learning models augmented with external memory, and benchmark their performance on ten tasks from the Arcade Learning Environment. Our discussion and insights will be helpful for future RL researchers developing their own memory agents.
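
A loose illustration of separating computation from storage (a hypothetical key-value memory with content-based reads; not one of the four models benchmarked in the thesis):

```python
# Hypothetical external key-value memory for an RL agent (illustrative only).
import numpy as np

class ExternalMemory:
    def __init__(self, slots: int, width: int, sharpness: float = 10.0):
        self.keys = np.zeros((slots, width))
        self.values = np.zeros((slots, width))
        self.sharpness = sharpness
        self.next_slot = 0

    def write(self, key: np.ndarray, value: np.ndarray) -> None:
        # Rapid, precise storage: overwrite the oldest slot.
        i = self.next_slot % len(self.keys)
        self.keys[i], self.values[i] = key, value
        self.next_slot += 1

    def read(self, query: np.ndarray) -> np.ndarray:
        # Content-based addressing: softmax over cosine similarity to stored keys.
        norms = np.linalg.norm(self.keys, axis=1) * np.linalg.norm(query) + 1e-8
        sims = self.keys @ query / norms
        weights = np.exp(self.sharpness * (sims - sims.max()))
        weights /= weights.sum()
        return weights @ self.values   # weighted blend of stored values

# Usage: store an observation embedding, then recall it from a noisy query.
mem = ExternalMemory(slots=16, width=8)
obs = np.random.randn(8)
mem.write(obs, obs)
recalled = mem.read(obs + 0.01 * np.random.randn(8))
print("recall error:", np.abs(recalled - obs).max())
```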


Using Logical Specifications For Multi-Objective Reinforcement Learning, Kolby Nottingham Mar 2020

Undergraduate Honors Theses

In the multi-objective reinforcement learning (MORL) paradigm, the relative importance of environment objectives is often unknown prior to training, so agents must learn to specialize their behavior to optimize different combinations of environment objectives that are specified post-training. These are typically linear combinations, so the agent is effectively parameterized by a weight vector that describes how to balance competing environment objectives. However, we show that behaviors can be successfully specified and learned by much more expressive non-linear logical specifications. We test our agent in several environments with various objectives and show that it can generalize to many never-before-seen specifications.
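
For context on the linear baseline this thesis moves beyond, here is a minimal sketch of weight-vector scalarization over toy objective returns (assumed numbers; not the thesis's environments or its logical specifications):

```python
# Minimal sketch of linear scalarization in MORL (toy numbers, not the thesis's setup).
import numpy as np

# Expected return of each candidate policy on three objectives,
# e.g. (speed, safety, energy). Rows are policies, columns are objectives.
returns = np.array([
    [10.0, 2.0, 5.0],
    [ 6.0, 8.0, 4.0],
    [ 4.0, 5.0, 9.0],
])

def best_policy(weights: np.ndarray) -> int:
    """Pick the policy maximizing the weighted sum of objective returns."""
    return int(np.argmax(returns @ weights))

# The weight vector is supplied post-training to balance the objectives.
print(best_policy(np.array([0.8, 0.1, 0.1])))  # favors the first objective -> policy 0
print(best_policy(np.array([0.1, 0.8, 0.1])))  # favors the second objective -> policy 1
```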


Improving Liquid State Machines Through Iterative Refinement Of The Reservoir, R David Norton Mar 2008

Theses and Dissertations

Liquid State Machines (LSMs) exploit the power of recurrent spiking neural networks (SNNs) without training the SNN. Instead, a reservoir, or liquid, is created randomly and acts as a filter for a readout function. We develop three methods for iteratively refining a randomly generated liquid to create a more effective one. First, we apply Hebbian learning to LSMs by building the liquid with spike-timing-dependent plasticity (STDP) synapses. Second, we create an eligibility-based reinforcement learning algorithm for synaptic development. Third, we apply principles of Hebbian learning and reinforcement learning to create a new algorithm called separation-driven synaptic modification …
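
A minimal sketch of a pair-based STDP rule of the kind used to grow the liquid (textbook exponential form with assumed constants; not the thesis's implementation):

```python
# Pair-based STDP weight update (textbook form; assumed constants).
import math

def stdp_delta_w(t_pre: float, t_post: float,
                 a_plus: float = 0.01, a_minus: float = 0.012,
                 tau: float = 20.0) -> float:
    """Weight change for one pre/post spike pair (spike times in ms)."""
    dt = t_post - t_pre
    if dt > 0:   # presynaptic spike preceded postsynaptic spike: potentiate
        return a_plus * math.exp(-dt / tau)
    else:        # postsynaptic spike came first (or together): depress
        return -a_minus * math.exp(dt / tau)

print(stdp_delta_w(t_pre=10.0, t_post=15.0))  # positive change (potentiation)
print(stdp_delta_w(t_pre=15.0, t_post=10.0))  # negative change (depression)
```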


Limitations And Extensions Of The WoLF-PHC Algorithm, Philip R. Cook Sep 2007

Theses and Dissertations

Policy Hill Climbing (PHC) is a reinforcement learning algorithm that extends Q-learning to learn probabilistic policies for multi-agent games. WoLF-PHC extends PHC with the "win or learn fast" principle. A proof that PHC will diverge in self-play when playing Shapley's game is given, and WoLF-PHC is shown empirically to diverge as well. Various WoLF-PHC-based modifications were created, evaluated, and compared in an attempt to obtain convergence to the single-shot Nash equilibrium when playing Shapley's game in self-play without using more information than WoLF-PHC uses. Partial Commitment WoLF-PHC (PCWoLF-PHC), which performs best on Shapley's game, is tested on other …
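
For readers new to the algorithm under study, the sketch below shows the core PHC policy update with the WoLF "win or learn fast" switch between two step sizes, in a single-state matrix game (assumed toy parameters; not the thesis's experimental code or its proposed modifications):

```python
# Core PHC update with the WoLF step-size switch, single-state matrix game
# (toy sketch with assumed parameters).
import numpy as np

n_actions = 3
Q = np.zeros(n_actions)              # action-value estimates
pi = np.ones(n_actions) / n_actions  # current mixed policy
pi_avg = pi.copy()                   # running average policy (for the "win" test)
alpha, delta_win, delta_lose = 0.1, 0.01, 0.04

def wolf_phc_step(action: int, reward: float, t: int) -> None:
    """One update after taking `action` and observing `reward` at time step t."""
    global pi
    # Q-learning update (single state, so there is no bootstrapped next-state term).
    Q[action] += alpha * (reward - Q[action])
    # Move the average policy toward the current policy.
    pi_avg += (pi - pi_avg) / (t + 1)
    # "Win or learn fast": small step while winning, large step while losing.
    delta = delta_win if pi @ Q > pi_avg @ Q else delta_lose
    # Hill-climb: shift probability mass toward the greedy action.
    greedy = int(np.argmax(Q))
    for a in range(n_actions):
        pi[a] += delta if a == greedy else -delta / (n_actions - 1)
    # Simplified projection back onto the probability simplex.
    pi = np.clip(pi, 0.0, None)
    pi /= pi.sum()

wolf_phc_step(action=1, reward=1.0, t=5)
print(pi)
```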


Learning Successful Strategies In Repeated General-Sum Games, Jacob W. Crandall Dec 2005

Theses and Dissertations

Many environments in which an agent can use reinforcement learning techniques to learn profitable strategies are affected by other learning agents. These situations can be modeled as general-sum games. When playing repeated general-sum games with other learning agents, the goal of a self-interested learning agent is to maximize its own payoffs over time. Traditional reinforcement learning algorithms learn myopic strategies in these games. As a result, they learn strategies that produce undesirable results in many games. In this dissertation, we develop and analyze algorithms that learn non-myopic strategies when playing many important infinitely repeated general-sum games. We show that, in …


Improving And Extending Behavioral Animation Through Machine Learning, Jonathan J. Dinerstein Apr 2005

Theses and Dissertations

Behavioral animation has become popular for creating virtual characters that are autonomous agents and thus self-animating. This is useful for lessening the workload of human animators, populating virtual environments with interactive agents, etc. Unfortunately, current behavioral animation techniques suffer from three key problems: (1) deliberative behavioral models (i.e., cognitive models) are slow to execute; (2) interactive virtual characters cannot adapt online through interaction with a human user; (3) programming of behavioral models is a difficult and time-intensive process. This dissertation presents a collection of papers that seek to overcome each of these problems. Specifically, these issues are alleviated …


Variable Resolution Discretization In The Joint Space, Christopher K. Monson, Kevin Seppi, David Wingate, Todd S. Peterson Dec 2004

Faculty Publications

We present JoSTLe, an algorithm that performs value iteration on control problems with continuous actions, allowing this useful reinforcement learning technique to be applied to problems where a priori action discretization is inadequate. The algorithm is an extension of a variable resolution technique that works for problems with continuous states and discrete actions. Results are given that indicate that JoSTLe is a promising step toward reinforcement learning in a fully continuous domain.
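
For reference, the discrete value-iteration backup that JoSTLe extends to variable-resolution cells over continuous states and actions (toy random MDP; not the paper's formulation):

```python
# Standard value iteration on a small discrete MDP (toy background example).
import numpy as np

n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(0, 1, size=(n_states, n_actions))                 # R[s, a]

V = np.zeros(n_states)
for _ in range(1000):
    Q = R + gamma * P @ V       # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] V[s']
    V_new = Q.max(axis=1)       # Bellman optimality backup
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

print("optimal state values:", np.round(V, 3))
```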


Incremental Policy Learning: An Equilibrium Selection Algorithm For Reinforcement Learning Agents With Common Interests, Nancy Fulda, Dan A. Ventura Jul 2004

Faculty Publications

We present an equilibrium selection algorithm for reinforcement learning agents that incrementally adjusts the probability of executing each action based on the desirability of the outcome obtained in the last time step. The algorithm assumes that at least one coordination equilibrium exists and requires that the agents have a heuristic for determining whether or not the equilibrium was obtained. In deterministic environments with one or more strict coordination equilibria, the algorithm will learn to play an optimal equilibrium as long as the heuristic is accurate. Empirical data demonstrate that the algorithm is also effective in stochastic environments and is able …
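
A loose sketch of the incremental adjustment described above (assumed update form and parameters; not the paper's exact algorithm or its coordination-equilibrium heuristic):

```python
# Sketch: raise or lower the last action's execution probability according to
# whether its outcome was desirable (assumed update form).
import numpy as np

n_actions = 3
pi = np.ones(n_actions) / n_actions   # probability of executing each action
alpha = 0.2                           # adjustment rate

def adjust(last_action: int, desirable: bool) -> None:
    """Shift probability toward the last action after a desirable outcome, away otherwise."""
    global pi
    if desirable:
        pi[last_action] += alpha * (1.0 - pi[last_action])
    else:
        pi[last_action] -= alpha * pi[last_action]
    pi /= pi.sum()   # keep a valid probability distribution

# Example: action 2 produced the coordinated (desirable) outcome.
adjust(last_action=2, desirable=True)
print(pi)
```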


Solving Large MDPs Quickly With Partitioned Value Iteration, David Wingate Jun 2004

Theses and Dissertations

Value iteration is not typically considered a viable algorithm for solving large-scale MDPs because it converges too slowly. However, its performance can be dramatically improved by eliminating redundant or useless backups, and by backing up states in the right order. We present several methods designed to help structure value dependency, and present a systematic study of companion prioritization techniques which focus computation in useful regions of the state space. In order to scale to solve ever larger problems, we evaluate all enhancements and methods in the context of parallelizability. Using the enhancements, we discover that in many instances the limiting …
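
As a small illustration of "backing up states in the right order", the sketch below drives Bellman backups with a priority queue keyed on how much each state's value just changed (a generic prioritized flavor; not the thesis's partitioned algorithm or its parallel evaluation):

```python
# Prioritized Bellman backups on a toy MDP (generic sketch, assumed setup).
import heapq
import numpy as np

n_states, n_actions, gamma = 50, 2, 0.95
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(0, 1, size=(n_states, n_actions))                 # R[s, a]
V = np.zeros(n_states)

def backup(s: int) -> float:
    """Bellman optimality backup for state s under the current value estimates."""
    return float(np.max(R[s] + gamma * P[s] @ V))

# Max-heap (negated priorities) of states ordered by current Bellman error.
heap = [(-abs(backup(s) - V[s]), s) for s in range(n_states)]
heapq.heapify(heap)

for _ in range(5000):
    _, s = heapq.heappop(heap)   # back up the highest-priority state first
    new_v = backup(s)
    change = abs(new_v - V[s])
    V[s] = new_v
    # Simplified re-prioritization; a fuller version would instead raise the
    # priority of s's predecessors in proportion to this change.
    heapq.heappush(heap, (-change, s))

print("first few state values:", np.round(V[:5], 3))
```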


Target Sets: A Tool For Understanding And Predicting The Behavior Of Interacting Q-Learners, Nancy Fulda, Dan A. Ventura Sep 2003

Faculty Publications

Reinforcement learning agents that interact in a common environment frequently affect each other's perceived transition and reward distributions. This can result in convergence of the agents to a sub-optimal equilibrium or even to a solution that is not an equilibrium at all. Several modifications to the Q-learning algorithm have been proposed which enable agents to converge to optimal equilibria under specified conditions. This paper presents the concept of target sets as an aid to understanding why these modifications have been successful and as a tool to assist in the development of new modifications which are applicable in a wider range …


Dynamic Joint Action Perception For Q-Learning Agents, Nancy Fulda, Dan A. Ventura Jun 2003

Faculty Publications

Q-learning is a reinforcement learning algorithm that learns expected utilities for state-action transitions through successive interactions with the environment. The algorithm's simplicity as well as its convergence properties have made it a popular algorithm for study. However, its non-parametric representation of utilities limits its effectiveness in environments with large amounts of perceptual input. For example, in multiagent systems, each agent may need to consider the action selections of its counterparts in order to learn effective behaviors. This creates a joint action space which grows exponentially with the number of agents in the system. In such situations, the Q-learning algorithm quickly …
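
For reference, the tabular Q-learning update the paper builds on, and the exponential joint-action growth it describes (toy numbers; not the paper's experiments):

```python
# Tabular Q-learning update, plus the joint-action-space blow-up described above
# (toy numbers).
import numpy as np

n_states, n_actions = 10, 4
alpha, gamma = 0.1, 0.95
Q = np.zeros((n_states, n_actions))

def q_update(s: int, a: int, r: float, s_next: int) -> None:
    """One-step Q-learning update for the transition (s, a, r, s')."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

q_update(s=0, a=2, r=1.0, s_next=3)

# If each agent must model the joint action of all agents, the action dimension
# of its table grows as n_actions ** n_agents.
for n_agents in (1, 2, 4, 8):
    print(n_agents, "agents ->", n_actions ** n_agents, "joint actions per state")
```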