Learning powerful policies and better dynamics models by encouraging consistency

27 Sep 2018 · Shagun Sodhani, Anirudh Goyal, Tristan Deleu, Yoshua Bengio, Jian Tang

Model-based reinforcement learning approaches promise to be sample efficient. Much of the progress in learning dynamics models for RL has come from training models via supervised learning. There is ample evidence that humans build a model of the environment not only by observing it but also by interacting with it. Interaction lets humans carry out "experiments": taking actions that help uncover true causal relationships, which can then be used to build better dynamics models. Analogously, we would expect such interaction to help a learning agent model the environment dynamics. In this paper, we build on this intuition by using an auxiliary cost function to ensure consistency between what the agent observes (by acting in the real world) and what it imagines (by acting in the "learned" world). Our empirical analysis shows that the proposed approach helps to train powerful policies as well as better dynamics models.
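As a rough illustration of the kind of auxiliary consistency term the abstract describes, the sketch below penalizes the gap between the latent state imagined by a learned dynamics model and the latent encoding of the state actually observed after taking the same action in the real environment. This is a minimal sketch under stated assumptions, not the paper's implementation: the module names (`ConsistencyAuxLoss`, `dynamics_model`, `encoder`), the use of latent encodings, and the squared-error form of the penalty are all illustrative choices.

```python
import torch
import torch.nn as nn


class ConsistencyAuxLoss(nn.Module):
    """Hypothetical auxiliary consistency cost: encourage agreement between
    what the agent imagines (rolling the learned dynamics model forward) and
    what it actually observes after acting in the real environment."""

    def __init__(self, dynamics_model: nn.Module, encoder: nn.Module):
        super().__init__()
        self.dynamics_model = dynamics_model  # predicts next latent state from (latent, action)
        self.encoder = encoder                # maps raw observations to latent states

    def forward(self, obs: torch.Tensor, actions: torch.Tensor,
                next_obs: torch.Tensor) -> torch.Tensor:
        # Encode the observed transition endpoints.
        z_t = self.encoder(obs)                       # (batch, latent_dim)
        z_next_real = self.encoder(next_obs)          # (batch, latent_dim)

        # "Imagine" the next latent state with the learned dynamics model.
        z_next_imagined = self.dynamics_model(z_t, actions)

        # Consistency penalty: imagined rollout should match observed outcome.
        # Detaching the real target keeps the gradient focused on the model.
        return ((z_next_imagined - z_next_real.detach()) ** 2).mean()
```

In practice such a term would be added to the usual policy and model-learning objectives with a weighting coefficient, so the agent is trained jointly to act well and to keep its imagined trajectories consistent with real ones.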
