Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Full-Text Articles in Physical Sciences and Mathematics
The Basil Technique: Bias Adaptive Statistical Inference Learning Agents For Learning From Human Feedback, Jonathan Indigo Watson
Theses and Dissertations--Computer Science
We introduce a novel approach for learning behaviors using human-provided feedback that is subject to systematic bias. Our method, known as BASIL, models the feedback signal as a combination of a heuristic evaluation of an action's utility and a probabilistically-drawn bias value, characterized by unknown parameters. We present both the general framework for our technique and specific algorithms for biases drawn from a normal distribution. We evaluate our approach across various environments and tasks, comparing it to interactive and non-interactive machine learning methods, including deep learning techniques, using human trainers and a synthetic oracle with feedback distorted to varying degrees. …
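The abstract describes feedback as a heuristic utility plus a normally distributed bias with unknown parameters. Below is a minimal sketch of that feedback model, not the BASIL algorithm itself. It assumes a calibration setting where the true utilities behind each feedback value are known, so the bias mean and standard deviation can be recovered by maximum likelihood from the residuals. The function names and parameter values are hypothetical.

```python
import random
import statistics


def simulate_feedback(utilities, mu, sigma, rng):
    """Hypothetical trainer: each feedback value is the true utility
    plus a bias drawn from N(mu, sigma^2)."""
    return [u + rng.gauss(mu, sigma) for u in utilities]


def estimate_bias(utilities, feedback):
    """Maximum-likelihood estimates of the normal bias parameters,
    ASSUMING the utility behind each feedback value is known."""
    residuals = [f - u for f, u in zip(feedback, utilities)]
    mu_hat = statistics.fmean(residuals)
    # The MLE of a normal's standard deviation uses the population form.
    sigma_hat = statistics.pstdev(residuals)
    return mu_hat, sigma_hat


def debias(feedback, mu_hat):
    """Remove the estimated systematic offset from raw feedback."""
    return [f - mu_hat for f in feedback]


rng = random.Random(0)
utilities = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
feedback = simulate_feedback(utilities, mu=0.5, sigma=0.2, rng=rng)
mu_hat, sigma_hat = estimate_bias(utilities, feedback)
corrected = debias(feedback, mu_hat)
print(f"estimated bias mean={mu_hat:.3f}, std={sigma_hat:.3f}")
```

In the setting the abstract describes, the utilities are not known in advance, so a learning agent would have to estimate them jointly with the bias parameters; this sketch isolates only the bias-estimation step.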
Markov Decision Processes With Embedded Agents, Luke Harold Miles
Theses and Dissertations--Computer Science
We present Markov Decision Processes with Embedded Agents (MDPEAs), an extension of multi-agent POMDPs that allows the modeling of environments that can change the agent's actuators, sensors, and learning function, e.g., a household robot that could gain and lose hardware from its frame, or a sovereign software agent that could encounter viruses on computers that modify its code. We show several toy problems for which standard reinforcement-learning methods fail to converge, and give an algorithm, `just-copy-it`, which learns some of them. Unlike MDPs, MDPEAs are closed systems and hence their evolution over time can be treated as …
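The key idea above is that the agent's own policy is part of the environment state, so a transition can rewrite it, as in the software-agent-and-virus example. The following toy sketch illustrates only that idea. It is not the MDPEA formalism or the `just-copy-it` algorithm, and every name in it (`World`, `step`, `virus_at`, the two policies) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# A policy maps an observation (here just the step index) to an action.
Policy = Callable[[int], str]


def honest_policy(obs: int) -> str:
    return "work"


def infected_policy(obs: int) -> str:
    # Hypothetical "virus-modified" code: the agent stops doing useful work.
    return "idle"


@dataclass
class World:
    # The agent's policy lives INSIDE the state, so the environment's
    # transition can overwrite it, unlike in a standard MDP.
    agent_policy: Policy
    step_count: int = 0
    reward_total: int = 0


def step(world: World, virus_at: int) -> World:
    """One transition: the agent acts with its current code, then the
    environment may replace that code (a virus at step `virus_at`)."""
    action = world.agent_policy(world.step_count)
    reward = 1 if action == "work" else 0
    new_policy = infected_policy if world.step_count == virus_at else world.agent_policy
    return World(new_policy, world.step_count + 1, world.reward_total + reward)


world = World(honest_policy)
for _ in range(10):
    world = step(world, virus_at=3)
print(world.reward_total, world.agent_policy.__name__)
```

Because the infection happens after the agent acts at step 3, the agent earns reward on steps 0 through 3 and none afterward. An agent that only learns a policy for the original code has no way to reason about this kind of self-modification, which is the gap the abstract attributes to standard reinforcement-learning methods.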