Meta Reinforcement Learning for Fast Adaptation of Hierarchical Policies
Hierarchical methods have the potential to allow reinforcement learning to scale to larger environments. Decomposing a task into transferable components, however, remains a challenging problem. In this paper, we propose a meta-learning approach for learning such a decomposition within the options framework. We formulate the objective as a bi-level optimization problem in which the sub-policies and their termination conditions are trained to facilitate fast learning on a family of tasks. Once such a set of options is obtained, it can be reused in new tasks, where only the sequencing of options needs to be learned. Our formulation tends to yield options under which fewer decisions are needed to solve new tasks. Experimentally, we show that our method learns transferable components that accelerate learning, and that it outperforms existing methods developed for this setting on the challenging ant-maze locomotion task.
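To make the bi-level structure concrete, here is a minimal first-order sketch on a toy quadratic task family: shared parameters (standing in for the option sub-policies, outer level) are meta-trained so that a few cheap inner-loop gradient steps on task-specific parameters (standing in for option sequencing) solve each task quickly. The quadratic tasks, step sizes, and first-order approximation are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

# Bi-level meta-learning sketch (first-order). Outer level: shared theta,
# analogous to the option parameters. Inner level: task-specific w,
# analogous to choosing the sequencing of options for one task.
# The quadratic task family and all hyperparameters are toy assumptions.

rng = np.random.default_rng(0)
targets = rng.uniform(-2.0, 2.0, size=8)   # a family of related tasks

def task_loss(theta, w, target):
    return (theta + w - target) ** 2

def adapt(theta, target, inner_steps=3, inner_lr=0.2):
    """Inner level: fast adaptation of task-specific w with theta held fixed."""
    w = 0.0
    for _ in range(inner_steps):
        grad_w = 2.0 * (theta + w - target)
        w -= inner_lr * grad_w
    return w

def meta_step(theta, outer_lr=0.05):
    """Outer level: first-order meta-gradient of the post-adaptation loss."""
    grad = 0.0
    for target in targets:
        w = adapt(theta, target)
        grad += 2.0 * (theta + w - target)   # d loss / d theta, w treated as fixed
    return theta - outer_lr * grad / len(targets)

theta = 5.0
for _ in range(100):
    theta = meta_step(theta)

post = np.mean([task_loss(theta, adapt(theta, t), t) for t in targets])
print(round(post, 4))
```

After meta-training, the average post-adaptation loss across the task family is much lower than for an untrained theta: the shared parameters have been shaped so that a handful of inner-loop decisions suffice on each new task, mirroring the paper's goal of options that shorten per-task learning.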