Physical Sciences and Mathematics | Open Access Articles

Proto-Transfer Learning In Markov Decision Processes Using Spectral Methods, Kimberly Ferguson, Sridhar Mahadevan

Sridhar Mahadevan

In this paper we introduce proto-transfer leaning, a new framework for transfer learning. We explore solutions to transfer learning within reinforcement learning through the use of spectral methods. Proto-value functions (PVFs) are basis functions computed from a spectral analysis of random walks on the state space graph. They naturally lead to the ability to transfer knowledge and representation between related tasks or domains. We investigate task transfer by using the same PVFs in Markov decision processes (MDPs) with different rewards functions. Additionally, our experiments in domain transfer explore applying the Nyström method for interpolation of PVFs between MDPs of different …

Go to article

Scheduling Straight-Line Code Using Reinforcement Learning And Rollouts, Amy Mcgovern, Eliot Moss, Andrew G. Barto

Andrew G. Barto

The execution order of a block of computer instructions on a pipelined machine can make a difference in its running time by a factor of two or more. In order to achieve the best possible speed, compilers use heuristic schedulers appropriate to each specific architecture implementation. However, these heuristic schedulers are time-consuming and expensive to build. We present empirical results using both rollouts and reinforcement learning to construct heuristics for scheduling basic blocks. In simulation, the rollout scheduler outperformed a commercial scheduler, and the reinforcement learning scheduler performed almost as well as the commercial scheduler.

Go to article

Global Optimization For Value Function Approximation, Marek Petrik, Shlomo Zilberstein

Shlomo Zilberstein

Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose a new approximate bilinear programming formulation of value function approximation, which employs global optimization. The formulation provides strong a priori guarantees on both robust and expected policy loss by minimizing specific norms of the Bellman residual. Solving a bilinear program optimally is NP-hard, but this is unavoidable because the Bellman-residual minimization itself is NP-hard. We describe and analyze both optimal and approximate algorithms for solving bilinear programs. The analysis shows that this algorithm offers a convergent generalization …

Go to article

Physical Sciences and Mathematics Commons^™

Full-Text Articles in Physical Sciences and Mathematics

Proto-Transfer Learning In Markov Decision Processes Using Spectral Methods, Kimberly Ferguson, Sridhar Mahadevan

Sridhar Mahadevan

Scheduling Straight-Line Code Using Reinforcement Learning And Rollouts, Amy Mcgovern, Eliot Moss, Andrew G. Barto

Andrew G. Barto

Global Optimization For Value Function Approximation, Marek Petrik, Shlomo Zilberstein

Shlomo Zilberstein