Unsupervised Active Pre-Training for Reinforcement Learning

1 Jan 2021  ·  Hao Liu, Pieter Abbeel ·

We introduce a new unsupervised pre-training method for reinforcement learning called $\textbf{APT}$, which stands for $\textbf{A}\text{ctive }\textbf{P}\text{re-}\textbf{T}\text{raining}$. APT learns a representation and a policy initialization by actively searching for novel states in reward-free environments. We use the contrastive learning framework to learn the representation from collected transitions. The key novel idea is to collect data during pre-training by maximizing a particle-based entropy computed in the learned latent representation space. By maximizing this particle-based entropy, we alleviate the need for challenging density modeling and are thus able to scale our approach to image observations. APT successfully learns meaningful representations as well as policy initializations without using any reward. APT is conceptually simple to implement, scalable, and empirically powerful. We empirically evaluate APT on the Atari game suite and the DMControl suite by exposing task-specific rewards to the agent after a long unsupervised pre-training phase. On the DMControl suite, APT beats all baselines in terms of asymptotic performance and data efficiency and dramatically improves performance on tasks that are extremely difficult to train from scratch. On Atari games, APT achieves human-level performance on $12$ games and obtains highly competitive performance compared to canonical RL algorithms.
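
The particle-based entropy objective described above amounts to rewarding states whose latent representations lie far from their nearest neighbours in a batch, which is typically realized with a k-nearest-neighbour entropy estimator. Below is a minimal sketch of such an intrinsic reward; the function name, the hyperparameters `k` and `c`, and the NumPy implementation are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def particle_entropy_reward(latents, k=10, c=1.0):
    """Sketch of a particle-based (k-NN) entropy intrinsic reward.

    For each latent z_i, the reward is the log of (c + mean distance to its
    k nearest neighbours in the batch), so states in sparsely visited regions
    of the learned latent space receive higher reward.

    latents: array of shape (batch, dim), e.g. outputs of a contrastive encoder.
    """
    # Pairwise Euclidean distances within the batch.
    diffs = latents[:, None, :] - latents[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)        # (batch, batch)
    # Exclude self-distances before taking nearest neighbours.
    np.fill_diagonal(dists, np.inf)
    # Distances to each particle's k nearest neighbours.
    knn = np.sort(dists, axis=-1)[:, :k]          # (batch, k)
    # Log of the mean k-NN distance, shifted by c for numerical stability.
    return np.log(c + knn.mean(axis=-1))

if __name__ == "__main__":
    z = np.random.randn(256, 64)                  # a batch of latent vectors
    r = particle_entropy_reward(z, k=10)
    print(r.shape)                                # (256,)
```

In practice this reward would be computed on latents from the contrastive encoder and handed to an off-the-shelf RL algorithm during the reward-free pre-training phase.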
