Attention mechanisms are generic inductive biases that have played a critical role in improving the state-of-the-art in supervised learning, unsupervised pre-training and generative modeling for multiple domains including vision, language and speech.
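As a concrete reference point for what such a mechanism computes, here is a minimal NumPy sketch of scaled dot-product attention; the function name and the omission of masking and multi-head logic are simplifications for illustration, not something stated in the original text.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Each query attends over all keys and returns a weighted sum of values;
    # the same mechanism applies across vision, language, and speech inputs.
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)           # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                                       # attention-weighted values
```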
Model-based reinforcement learning (RL) has shown great potential in various control tasks in terms of both sample-efficiency and final performance.
The latter, while exhibiting low sample complexity, learn latent spaces that must reconstruct every single detail of the scene.
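To make the reconstruction issue concrete, the sketch below shows a minimal version of such an objective; `encoder` and `decoder` are hypothetical callables, and the per-pixel squared error is one common instantiation rather than the specific loss used here.

```python
import numpy as np

def reconstruction_loss(encoder, decoder, observations):
    # The latent code z must retain enough information to rebuild the
    # observation pixel by pixel, whether or not a detail is task-relevant.
    z = encoder(observations)            # compress to latent space
    reconstruction = decoder(z)          # decode back to observation space
    return np.mean((reconstruction - observations) ** 2)  # per-pixel squared error
```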
Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator to augment the data for policy optimization or value function learning.
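A minimal sketch of this black-box usage is given below, in the style of Dyna-like data augmentation; `dynamics_model`, `reward_model`, and `policy` are assumed vectorized callables and all names are illustrative, not the notation of any specific method.

```python
import numpy as np

def generate_model_rollouts(dynamics_model, reward_model, policy,
                            start_states, horizon=5):
    # Treat the learned model as a black-box simulator: branch short
    # imagined rollouts from real states and hand the synthetic
    # transitions to the policy/value learner as extra training data.
    transitions = []
    states = np.asarray(start_states)
    for _ in range(horizon):
        actions = policy(states)                       # query current policy
        next_states = dynamics_model(states, actions)  # one-step model prediction
        rewards = reward_model(states, actions)
        transitions.extend(zip(states, actions, rewards, next_states))
        states = next_states
    return transitions
```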
In this work, we propose an asynchronous framework for model-based reinforcement learning methods that reduces the run time of these algorithms to just the data collection time.
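One way such a framework could be structured is sketched below, with a collector thread gathering experience while a learner thread fits the model and optimizes the policy in the background; the `env`, `policy`, and `model` interfaces (`act`, `step`, `fit`, `improve`) are assumed placeholders, not the paper's actual API.

```python
import threading
import queue

transitions = queue.Queue()

def collect(env, policy, steps):
    # Actor loop: pure environment interaction; never blocks on learning.
    obs = env.reset()
    for _ in range(steps):
        act = policy.act(obs)
        nxt, rew, done, _ = env.step(act)
        transitions.put((obs, act, rew, nxt, done))
        obs = env.reset() if done else nxt

def learn(model, policy, buffer):
    # Learner loop: drains fresh transitions, refits the model, and
    # improves the policy concurrently with data collection.
    while True:
        while not transitions.empty():
            buffer.append(transitions.get())
        model.fit(buffer)       # refit the dynamics model on all data so far
        policy.improve(model)   # e.g., plan or run policy gradients in the model

def run_async(env, policy, model, steps=100_000):
    buffer = []
    threading.Thread(target=learn, args=(model, policy, buffer),
                     daemon=True).start()
    collect(env, policy, steps)  # wall-clock time ~= data collection time
```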
Model-based reinforcement learning (MBRL) is widely seen as having the potential to be significantly more sample efficient than model-free RL.
Hierarchical reinforcement learning is a promising approach to tackle long-horizon decision-making problems with sparse rewards.
Finally, we demonstrate that our approach is able to match the asymptotic performance of model-free methods while requiring significantly less experience.
Although reinforcement learning methods can achieve impressive results in simulation, the real world presents two major challenges: generating samples is exceedingly expensive, and unexpected perturbations or unseen situations cause proficient but specialized policies to fail at test time.
In this paper, we analyze the behavior of vanilla model-based reinforcement learning methods when deep neural networks are used to learn both the model and the policy, and show that the learned policy tends to exploit regions where insufficient data is available for the model to be learned, causing instability in training.
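A common diagnostic for such under-modeled regions is disagreement across an ensemble of independently trained dynamics models; the sketch below (all names hypothetical) truncates imagined rollouts once disagreement grows, which is one way to keep the policy from accumulating return where the model is unreliable.

```python
import numpy as np

def ensemble_disagreement(models, states, actions):
    # Spread of next-state predictions across K dynamics models;
    # large values flag regions with insufficient training data.
    preds = np.stack([m(states, actions) for m in models])  # (K, N, d_s)
    return preds.std(axis=0).mean(axis=-1)                  # (N,)

def imagined_rollout(models, policy, state, max_horizon=20, threshold=0.1):
    # Stop the rollout as soon as the ensemble disagrees, so the policy
    # cannot exploit model error in poorly supported regions.
    trajectory = []
    for _ in range(max_horizon):
        action = policy(state)
        if ensemble_disagreement(models, state[None], action[None])[0] > threshold:
            break  # model predictions unreliable here: likely exploitation
        next_state = models[0](state[None], action[None])[0]  # any member steps
        trajectory.append((state, action, next_state))
        state = next_state
    return trajectory
```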