ReaPER: Improving Sample Efficiency in Model-Based Latent Imagination

1 Jan 2021 · Martin A. Bertran, Guillermo Sapiro, Mariano Phielipp

Deep Reinforcement Learning (DRL) can distill behavioural policies from sensory input to solve complex tasks; however, these policies tend to be task-specific and sample-inefficient, requiring a large number of environment interactions that may be costly or impractical for many real-world applications. Model-based DRL (MBRL) allows learned behaviours and dynamics from one task to be transferred to a new task in a related environment, but it still suffers from low sample efficiency. In this work we introduce ReaPER, an algorithm that addresses the sample-efficiency challenge in model-based DRL, and we illustrate its effectiveness on the DeepMind Control benchmark. Our improvements are driven by sparse, self-supervised, contrastive model representations and by efficient use of past experience. We empirically analyze each novel component of ReaPER and how it contributes to sample efficiency. We also show how other standard alternatives fail to improve upon previous methods. Code will be made available.
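The abstract mentions self-supervised, contrastive model representations. As an illustration only, the sketch below shows a common InfoNCE-style contrastive loss; this is a generic formulation, not necessarily the exact objective used by ReaPER, and all names (info_nce_loss, z_obs, z_pred) are hypothetical placeholders.

```python
# Minimal InfoNCE-style contrastive loss sketch (illustrative, not the paper's exact loss).
import torch
import torch.nn.functional as F

def info_nce_loss(z_anchor: torch.Tensor, z_positive: torch.Tensor,
                  temperature: float = 0.1) -> torch.Tensor:
    """Contrast each anchor embedding with its matching positive;
    the other positives in the batch serve as negatives."""
    z_anchor = F.normalize(z_anchor, dim=-1)       # (B, D) unit-norm embeddings
    z_positive = F.normalize(z_positive, dim=-1)   # (B, D)
    logits = z_anchor @ z_positive.t() / temperature  # (B, B) similarity matrix
    labels = torch.arange(z_anchor.size(0), device=z_anchor.device)  # diagonal = positives
    return F.cross_entropy(logits, labels)

# Hypothetical usage: embeddings of an observation and a model-predicted latent.
z_obs = torch.randn(32, 64)   # placeholder encoder output
z_pred = torch.randn(32, 64)  # placeholder latent prediction
loss = info_nce_loss(z_obs, z_pred)
```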
