This document discusses zero-shot reinforcement learning: learning a compact representation of a reward-free environment so that, once a reward function is specified, a good policy can be produced immediately, without further learning. Two families of such representations are considered for this purpose: successor features and forward-backward representations. Algorithms are proposed to learn both by minimizing Bellman residuals, and several methods for learning the basic feature map underlying successor features are also discussed.
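The zero-shot step with successor features can be sketched concretely. The following is a minimal illustration, not the document's algorithm: it assumes a small tabular setting with hypothetical pretrained features `phi(s)` and successor features `psi(s, a)`; at test time a reward function is regressed onto the features and a greedy policy is read off with no further learning.

```python
import numpy as np

# Hypothetical tabular setup: S states, A actions, d-dimensional features.
S, A, d = 5, 3, 4
rng = np.random.default_rng(0)

# Basic features phi(s) and pretrained successor features
# psi(s, a) ~= E[ sum_t gamma^t phi(s_t) | s_0 = s, a_0 = a ],
# both assumed already learned during the reward-free phase.
phi = rng.normal(size=(S, d))
psi = rng.normal(size=(S, A, d))

# Zero-shot inference: given a reward specified at test time,
# fit w so that r(s) ~= phi(s) @ w via least squares ...
rewards = rng.normal(size=S)
w, *_ = np.linalg.lstsq(phi, rewards, rcond=None)

# ... then Q(s, a) = psi(s, a) @ w, and the policy is greedy in Q:
Q = psi @ w                 # shape (S, A)
policy = Q.argmax(axis=1)   # one greedy action per state
```

Everything except the final two lines is the pretrained, reward-free representation; specifying a new reward only costs one regression and an argmax.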