Learning State Representations via Temporal Cycle-Consistency Constraint in Model-Based Reinforcement Learning

Representation learning is a popular approach for reinforcement learning (RL) tasks with partially observable Markov decision processes. Existing works on learning representations utilise the dynamics model in model-based RL to perform training through model-predictive reconstruction in a temporally forward fashion. However, temporally backward state predictions also yield useful supervision signals, as they convey complementary information about state transitions given the action choices. We argue that combining backward predictions with forward passes facilitates stronger representation learning and improves the sample efficiency of RL. Here we propose a general framework for learning state representations for RL tasks that utilises both forward and backward passes by imposing temporal cycle-consistency constraints, and which can be integrated with any model-based RL algorithm that leverages a latent dynamics model. We show improved empirical performance in terms of sample efficiency and convergence score over several baselines on continuous control benchmarks.
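
To make the idea concrete, below is a minimal sketch of how a temporal cycle-consistency loss over a latent dynamics model could look. It is not the paper's implementation: the module names (`encoder`, `forward_model`, `backward_model`), network sizes, and the exact combination of forward and cycle terms are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LatentDynamics(nn.Module):
    """Hypothetical latent dynamics model with forward and backward predictors."""

    def __init__(self, obs_dim, act_dim, latent_dim=32):
        super().__init__()
        # Encoder: maps observations to latent states z_t.
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim)
        )
        # Forward model: predicts z_{t+1} from (z_t, a_t).
        self.forward_model = nn.Sequential(
            nn.Linear(latent_dim + act_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim)
        )
        # Backward model: predicts z_t from (z_{t+1}, a_t).
        self.backward_model = nn.Sequential(
            nn.Linear(latent_dim + act_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim)
        )

    def cycle_consistency_loss(self, obs, action, next_obs):
        z_t = self.encoder(obs)
        z_next = self.encoder(next_obs)

        # Forward pass: predict the next latent state from the current one.
        z_next_hat = self.forward_model(torch.cat([z_t, action], dim=-1))
        forward_loss = F.mse_loss(z_next_hat, z_next.detach())

        # Backward pass through the predicted next state; cycle-consistency
        # requires that going forward then backward returns to z_t.
        z_t_hat = self.backward_model(torch.cat([z_next_hat, action], dim=-1))
        cycle_loss = F.mse_loss(z_t_hat, z_t.detach())

        return forward_loss + cycle_loss
```

In a setup like this, the cycle-consistency term would be added as an auxiliary loss alongside the usual model-based RL objectives (e.g. reward prediction or reconstruction), so the encoder is shaped by both forward and backward prediction errors.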
