Learning Latent Landmarks for Generalizable Planning
Planning - the ability to analyze the structure of a problem in the large and decompose it into interrelated subproblems - is a hallmark of human intelligence. While deep reinforcement learning (RL) has shown great promise for solving relatively straightforward control tasks, it remains an open problem how best to incorporate planning into existing deep RL paradigms to handle increasingly complex environments. This problem is difficult because planning requires an agent to reason over temporally extended horizons. In principle, however, planning can take advantage of the higher-level structure of a complex problem - freeing the lower-level policy to focus on learning simple behaviors. In this work, we leverage the graph structure inherent to MDPs to address the problem of planning in RL. Rather than learning a graph over a collection of visited states, we learn latent landmarks that are scattered - in terms of reachability - across the goal space to provide state and temporal abstraction. On a variety of high-dimensional continuous control tasks, we demonstrate that our method outperforms prior work, and is oftentimes the only method capable of leveraging both the robustness of model-free RL and the generalization of graph-search algorithms. We believe our work is an important step towards scalable planning in the reinforcement learning setting.
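To make the idea concrete, the following is a minimal illustrative sketch of planning over latent landmarks, not the paper's actual algorithm: k-means centroids over goal embeddings stand in for the learned latent landmarks, Euclidean distance stands in for a learned reachability metric, and Dijkstra search over the resulting landmark graph produces a sequence of subgoals for a lower-level policy. All function names and parameters here are hypothetical.

```python
import numpy as np

def kmeans_landmarks(goals, k, iters=50, seed=0):
    """Cluster goal embeddings; the centroids act as 'latent landmarks'
    spread across the goal space (a stand-in for the learned landmarks)."""
    rng = np.random.default_rng(seed)
    centers = goals[rng.choice(len(goals), k, replace=False)].copy()
    for _ in range(iters):
        # assign each goal embedding to its nearest landmark
        d = np.linalg.norm(goals[:, None] - centers[None], axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = goals[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def build_graph(landmarks, radius):
    """Connect landmarks whose proxy distance is within a 'reachability'
    radius; in the real method a learned metric would play this role."""
    d = np.linalg.norm(landmarks[:, None] - landmarks[None], axis=-1)
    adj = np.where(d <= radius, d, np.inf)  # inf = not directly reachable
    np.fill_diagonal(adj, 0.0)
    return adj

def shortest_path(adj, src, dst):
    """Dijkstra over the landmark graph: the returned landmark indices
    form a subgoal sequence a low-level policy could follow."""
    n = len(adj)
    dist = np.full(n, np.inf)
    prev = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    dist[src] = 0.0
    for _ in range(n):
        u = int(np.where(visited, np.inf, dist).argmin())
        if not np.isfinite(dist[u]) or u == dst:
            break
        visited[u] = True
        for v in range(n):
            alt = dist[u] + adj[u, v]
            if alt < dist[v]:
                dist[v] = alt
                prev[v] = u
    path = [dst]
    while path[-1] != src and prev[path[-1]] != -1:
        path.append(int(prev[path[-1]]))
    return path[::-1]
```

A typical use would be to cluster a buffer of achieved goals into landmarks once, then replan over the small landmark graph each episode, which is the sense in which the graph provides temporal abstraction: search happens over a handful of landmarks rather than every visited state.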