Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network


Brigham Young University

Theses/Dissertations

Reinforcement learning

Articles 1 - 8 of 8

Full-Text Articles in Entire DC Network

Task Distillation: Transforming Reinforcement Learning Into Supervised Learning, Connor Wilhelm Oct 2023

Theses and Dissertations

Recent work in dataset distillation focuses on distilling supervised classification datasets into smaller, synthetic supervised datasets in order to reduce per-model costs of training, to provide interpretability, and to anonymize data. Distillation and its benefits can be extended to a wider array of tasks. We propose a generalization of dataset distillation, which we call task distillation. Using techniques similar to those used in dataset distillation, any learning task can be distilled into a compressed synthetic task. Task distillation allows for transmodal distillations, where a task of one modality is distilled into a synthetic task of another modality, allowing a more …
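
To make the dataset-distillation idea concrete, here is a minimal sketch of the bilevel setup it builds on: a small synthetic dataset is adjusted so that a model trained only on it also fits the real data. The toy regression task, the finite-difference outer gradient, and all parameter values below are illustrative assumptions, not details taken from the thesis.

```python
# Minimal dataset-distillation sketch (illustrative, not the thesis's method).
# Goal: learn a few synthetic points so that a linear model trained on them
# also fits a larger "real" dataset.
import numpy as np

rng = np.random.default_rng(0)

# "Real" task: y = 3x + noise.
X_real = rng.uniform(-1, 1, size=(200, 1))
y_real = 3.0 * X_real[:, 0] + 0.1 * rng.normal(size=200)

def train_on(X, y, steps=50, lr=0.5):
    """Inner loop: fit a 1-D linear model w by gradient descent on (X, y)."""
    w = 0.0
    for _ in range(steps):
        grad = 2.0 * np.mean((w * X[:, 0] - y) * X[:, 0])
        w -= lr * grad
    return w

def real_loss(w):
    """Outer objective: how well the inner-trained model fits the real data."""
    return np.mean((w * X_real[:, 0] - y_real) ** 2)

# Synthetic dataset: 4 points with learnable targets (inputs kept fixed here).
X_syn = np.linspace(-1, 1, 4).reshape(-1, 1)
y_syn = np.zeros(4)

# Outer loop: adjust the synthetic targets by finite-difference gradient descent.
eps, outer_lr = 1e-4, 0.5
for _ in range(100):
    grads = np.zeros_like(y_syn)
    for i in range(len(y_syn)):
        y_plus, y_minus = y_syn.copy(), y_syn.copy()
        y_plus[i] += eps
        y_minus[i] -= eps
        grads[i] = (real_loss(train_on(X_syn, y_plus))
                    - real_loss(train_on(X_syn, y_minus))) / (2 * eps)
    y_syn -= outer_lr * grads

print("distilled targets:", np.round(y_syn, 2))
print("real-data loss after training on 4 synthetic points:",
      round(real_loss(train_on(X_syn, y_syn)), 4))
```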


Reinforcement Learning With Auxiliary Memory, Sterling Suggs Jun 2021

Theses and Dissertations

Deep reinforcement learning algorithms typically require vast amounts of data to train to a useful level of performance. Each time new data is encountered, the network must inefficiently update all of its parameters. Auxiliary memory units can help deep neural networks train more efficiently by separating computation from storage, and providing a means to rapidly store and retrieve precise information. We present four deep reinforcement learning models augmented with external memory, and benchmark their performance on ten tasks from the Arcade Learning Environment. Our discussion and insights will be helpful for future RL researchers developing their own memory agents.
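
As a rough illustration of the "separating computation from storage" idea, the sketch below implements a generic key-value external memory with content-based (cosine-similarity) reads. The class name, slot policy, and dimensions are assumptions for illustration; it does not reproduce the four agents or the Arcade Learning Environment benchmarks from the thesis.

```python
# Minimal external key-value memory sketch (illustrative). An agent can write
# (key, value) pairs and later retrieve values by content-based lookup.
import numpy as np

class ExternalMemory:
    def __init__(self, slots, key_dim, value_dim):
        self.keys = np.zeros((slots, key_dim))
        self.values = np.zeros((slots, value_dim))
        self.ptr = 0          # next slot to overwrite (simple FIFO policy)
        self.slots = slots

    def write(self, key, value):
        """Store one (key, value) pair, overwriting the oldest slot."""
        self.keys[self.ptr] = key
        self.values[self.ptr] = value
        self.ptr = (self.ptr + 1) % self.slots

    def read(self, query, temperature=0.1):
        """Soft content-based read: attention over keys, weighted value sum."""
        norms = np.linalg.norm(self.keys, axis=1) * np.linalg.norm(query) + 1e-8
        sims = self.keys @ query / norms          # cosine similarities
        weights = np.exp(sims / temperature)
        weights /= weights.sum()
        return weights @ self.values

# Usage: store a state embedding with an associated value estimate, then recall it.
mem = ExternalMemory(slots=16, key_dim=4, value_dim=1)
mem.write(np.array([1.0, 0.0, 0.0, 0.0]), np.array([5.0]))
mem.write(np.array([0.0, 1.0, 0.0, 0.0]), np.array([-2.0]))
print(mem.read(np.array([0.9, 0.1, 0.0, 0.0])))   # close to 5.0
```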


Using Logical Specifications For Multi-Objective Reinforcement Learning, Kolby Nottingham Mar 2020

Undergraduate Honors Theses

In the multi-objective reinforcement learning (MORL) paradigm, the relative importance of environment objectives is often unknown prior to training, so agents must learn to specialize their behavior to optimize different combinations of environment objectives that are specified post-training. These are typically linear combinations, so the agent is effectively parameterized by a weight vector that describes how to balance competing environment objectives. However, we show that behaviors can be successfully specified and learned by much more expressive non-linear logical specifications. We test our agent in several environments with various objectives and show that it can generalize to many never-before-seen specifications.
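
For context, the linear-combination parameterization the abstract contrasts against can be sketched in a few lines: a vector-valued reward is scalarized with a weight vector before an ordinary Q-learning update. The two-objective bandit, weights, and hyperparameters below are illustrative assumptions; the thesis's logical-specification approach itself is not shown.

```python
# Linear scalarization for multi-objective RL (toy sketch): collapse the
# reward vector to a scalar with weight vector w, then do standard Q-learning.
import numpy as np

rng = np.random.default_rng(0)

# Two-objective bandit: each arm returns a reward vector (speed, safety).
arm_rewards = {0: np.array([1.0, 0.0]),   # fast but unsafe
               1: np.array([0.2, 1.0])}   # slow but safe

def run(w, episodes=2000, lr=0.1, eps=0.1):
    """Q-learning on the scalarized reward r_scalar = w . r_vector."""
    q = np.zeros(2)
    for _ in range(episodes):
        a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(q))
        r_vec = arm_rewards[a] + 0.05 * rng.normal(size=2)
        r_scalar = w @ r_vec              # linear combination of objectives
        q[a] += lr * (r_scalar - q[a])
    return int(np.argmax(q))

# Different post-training preferences pick out different behaviors.
print(run(np.array([0.9, 0.1])))   # weights favor speed  -> expect arm 0
print(run(np.array([0.1, 0.9])))   # weights favor safety -> expect arm 1
```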


Improving Liquid State Machines Through Iterative Refinement Of The Reservoir, R David Norton Mar 2008

Theses and Dissertations

Liquid State Machines (LSMs) exploit the power of recurrent spiking neural networks (SNNs) without training the SNN. Instead, a reservoir, or liquid, is randomly created which acts as a filter for a readout function. We develop three methods for iteratively refining a randomly generated liquid to create a more effective one. First, we apply Hebbian learning to LSMs by building the liquid with spike-timing-dependent plasticity (STDP) synapses. Second, we create an eligibility-based reinforcement learning algorithm for synaptic development. Third, we apply principles of Hebbian learning and reinforcement learning to create a new algorithm called separation driven synaptic modification …
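
A minimal sketch of the pairwise STDP rule referenced above (constants and spike times are illustrative assumptions, not values from the thesis): a synapse is potentiated when the presynaptic spike precedes the postsynaptic one and depressed otherwise, with a magnitude that decays exponentially in the spike-time gap.

```python
# Pairwise STDP weight update (toy sketch).
import math

def stdp_dw(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Weight change for one pre/post spike pair (times in ms)."""
    dt = t_post - t_pre
    if dt > 0:    # pre before post: causal pairing, potentiate
        return a_plus * math.exp(-dt / tau)
    elif dt < 0:  # post before pre: anti-causal pairing, depress
        return -a_minus * math.exp(dt / tau)
    return 0.0

# Apply the rule to a few spike pairings on a single synapse.
w = 0.5
for t_pre, t_post in [(10.0, 15.0), (40.0, 38.0), (60.0, 61.0)]:
    w += stdp_dw(t_pre, t_post)
print(round(w, 4))
```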


Limitations And Extensions Of The WoLF-PHC Algorithm, Philip R. Cook Sep 2007

Theses and Dissertations

Policy Hill Climbing (PHC) is a reinforcement learning algorithm that extends Q-learning to learn probabilistic policies for multi-agent games. WoLF-PHC extends PHC with the "win or learn fast" principle. A proof that PHC will diverge in self-play when playing Shapley's game is given, and WoLF-PHC is shown empirically to diverge as well. Various WoLF-PHC-based modifications were created, evaluated, and compared in an attempt to obtain convergence to the single-shot Nash equilibrium when playing Shapley's game in self-play without using more information than WoLF-PHC uses. Partial Commitment WoLF-PHC (PCWoLF-PHC), which performs best on Shapley's game, is tested on other …
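
A compact single-state sketch of the standard WoLF-PHC idea described above: Q-learning plus a hill-climbing policy step whose size is small when the current policy is "winning" against the average policy and large when it is losing. The matching-pennies self-play demo and all hyperparameters are illustrative assumptions; this is not the thesis's modified variants.

```python
# Single-state WoLF-PHC learner (toy sketch).
import numpy as np

class WoLFPHC:
    def __init__(self, n_actions, alpha=0.1, delta_win=0.01, delta_lose=0.04):
        self.q = np.zeros(n_actions)
        self.pi = np.full(n_actions, 1.0 / n_actions)       # current policy
        self.avg_pi = np.full(n_actions, 1.0 / n_actions)   # average policy
        self.count = 0
        self.alpha, self.dw, self.dl = alpha, delta_win, delta_lose

    def act(self, rng):
        return int(rng.choice(len(self.pi), p=self.pi))

    def update(self, action, reward):
        n = len(self.pi)
        self.q[action] += self.alpha * (reward - self.q[action])
        # Track the average policy played so far.
        self.count += 1
        self.avg_pi += (self.pi - self.avg_pi) / self.count
        # "Win or learn fast": compare expected value of current vs. average policy.
        winning = self.pi @ self.q >= self.avg_pi @ self.q
        delta = self.dw if winning else self.dl
        # Hill-climb toward the greedy action while staying on the simplex.
        best = int(np.argmax(self.q))
        for a in range(n):
            if a != best:
                step = min(self.pi[a], delta / (n - 1))
                self.pi[a] -= step
                self.pi[best] += step

# Self-play on matching pennies: +1 for matching, -1 for mismatching.
rng = np.random.default_rng(0)
p1, p2 = WoLFPHC(2), WoLFPHC(2)
for _ in range(20000):
    a1, a2 = p1.act(rng), p2.act(rng)
    r1 = 1.0 if a1 == a2 else -1.0
    p1.update(a1, r1)
    p2.update(a2, -r1)
print(np.round(p1.pi, 2), np.round(p2.pi, 2))  # should hover near the mixed equilibrium [0.5, 0.5]
```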


Learning Successful Strategies In Repeated General-Sum Games, Jacob W. Crandall Dec 2005

Theses and Dissertations

Many environments in which an agent can use reinforcement learning techniques to learn profitable strategies are affected by other learning agents. These situations can be modeled as general-sum games. When playing repeated general-sum games with other learning agents, the goal of a self-interested learning agent is to maximize its own payoffs over time. Traditional reinforcement learning algorithms learn myopic strategies in these games. As a result, they learn strategies that produce undesirable results in many games. In this dissertation, we develop and analyze algorithms that learn non-myopic strategies when playing many important infinitely repeated general-sum games. We show that, in …
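
To illustrate the "myopic strategies produce undesirable results" point, a toy sketch: two independent Q-learners playing the repeated prisoner's dilemma ignore the game's history and typically settle into mutual defection, the low-payoff outcome. The payoff matrix and hyperparameters are standard textbook choices, not values taken from the dissertation.

```python
# Two myopic Q-learners in the repeated prisoner's dilemma (toy sketch).
import numpy as np

# Payoffs (row player, column player); actions: 0 = cooperate, 1 = defect.
PAYOFFS = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}

rng = np.random.default_rng(0)
q1, q2 = np.zeros(2), np.zeros(2)
alpha, eps = 0.1, 0.1
for _ in range(20000):
    a1 = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(q1))
    a2 = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(q2))
    r1, r2 = PAYOFFS[(a1, a2)]
    q1[a1] += alpha * (r1 - q1[a1])   # myopic: no memory of the opponent or history
    q2[a2] += alpha * (r2 - q2[a2])
print(np.argmax(q1), np.argmax(q2))   # typically 1 1 (mutual defection)
```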


Improving And Extending Behavioral Animation Through Machine Learning, Jonathan J. Dinerstein Apr 2005

Theses and Dissertations

Behavioral animation has become popular for creating virtual characters that are autonomous agents and thus self-animating. This is useful for lessening the workload of human animators, populating virtual environments with interactive agents, etc. Unfortunately, current behavioral animation techniques suffer from three key problems: (1) deliberative behavioral models (i.e., cognitive models) are slow to execute; (2) interactive virtual characters cannot adapt online through interaction with a human user; (3) programming of behavioral models is a difficult and time-intensive process. This dissertation presents a collection of papers that seek to overcome each of these problems. Specifically, these issues are alleviated …


Solving Large MDPs Quickly With Partitioned Value Iteration, David Wingate Jun 2004

Theses and Dissertations

Value iteration is not typically considered a viable algorithm for solving large-scale MDPs because it converges too slowly. However, its performance can be dramatically improved by eliminating redundant or useless backups, and by backing up states in the right order. We present several methods designed to help structure value dependency, and present a systematic study of companion prioritization techniques which focus computation in useful regions of the state space. In order to scale to solve ever larger problems, we evaluate all enhancements and methods in the context of parallelizability. Using the enhancements, we discover that in many instances the limiting …
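
To make the "backing up states in the right order" idea concrete, here is a minimal prioritized value-iteration sketch on a toy chain MDP: states are backed up in order of Bellman error, so value changes propagate in one useful pass where plain value iteration would re-sweep every state repeatedly. The MDP, thresholds, and queue policy are illustrative assumptions, not the partitioning scheme from the thesis.

```python
# Prioritized value iteration on a tiny chain MDP (toy sketch).
import heapq

# Deterministic chain: states 0..9, action "right" moves toward the goal
# (state 9), which yields reward 1; every other transition yields 0.
N, GAMMA = 10, 0.9

def successors(s):
    """Returns [(action, next_state, reward)] for state s."""
    if s == N - 1:
        return [("stay", s, 0.0)]
    return [("right", s + 1, 1.0 if s + 1 == N - 1 else 0.0)]

def bellman_backup(s, V):
    return max(r + GAMMA * V[s2] for _, s2, r in successors(s))

V = [0.0] * N
predecessors = {s: [] for s in range(N)}
for s in range(N):
    for _, s2, _ in successors(s):
        predecessors[s2].append(s)

# Seed the queue with every state's initial Bellman error (max-heap via negation).
queue = [(-abs(bellman_backup(s, V) - V[s]), s) for s in range(N)]
heapq.heapify(queue)
backups = 0
while queue:
    neg_err, s = heapq.heappop(queue)
    if -neg_err < 1e-6:
        continue                      # error already negligible; skip the backup
    V[s] = bellman_backup(s, V)
    backups += 1
    # A change at s can only affect the Bellman error of its predecessors.
    for p in predecessors[s]:
        err = abs(bellman_backup(p, V) - V[p])
        if err > 1e-6:
            heapq.heappush(queue, (-err, p))

print("backups:", backups)
print("V:", [round(v, 3) for v in V])
```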