Model-based Reinforcement Learning with Ensembled Model-value Expansion

29 Sep 2021 · Gaurav Manek, J Zico Kolter

Model-based reinforcement learning (MBRL) methods are often more data-efficient and quicker to converge than their model-free counterparts, but they typically depend on accurate modeling of the environment dynamics and the associated uncertainty in order to perform well. Recent approaches have used ensembles of dynamics models within MBRL to separately capture aleatoric and epistemic uncertainty of the learned dynamics, but many MBRL algorithms are still limited because they treat these dynamics models as a "black box" without fully exploiting the uncertainty modeling. In this paper, we propose a simple but effective approach to improving the performance of MBRL by directly incorporating the ensemble prediction into the RL method itself: we construct multiple value roll-outs using different members of the dynamics ensemble, and aggregate the separate estimates to form a joint estimate of the state value. Despite its simplicity, we show that this method substantially improves the performance of MBRL methods: we comprehensively evaluate this technique on common locomotion benchmarks, with ablation experiments to show the added value of each proposed component.
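
The core idea, one value roll-out per ensemble member followed by aggregation, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`rollout_value`, `ensembled_value`), the deterministic callable interface for the models, the fixed horizon, and the choice of the mean as the aggregation rule are all assumptions made here for illustration, since the abstract does not specify them.

```python
import numpy as np


def rollout_value(model, policy, reward_fn, value_fn, state, horizon, gamma):
    """Value estimate from an H-step roll-out under a single dynamics model.

    Hypothetical interface: model(s, a) -> next state, policy(s) -> action,
    reward_fn(s, a, s_next) -> float, value_fn(s) -> float.
    """
    total, discount = 0.0, 1.0
    s = state
    for _ in range(horizon):
        a = policy(s)
        s_next = model(s, a)
        total += discount * reward_fn(s, a, s_next)
        discount *= gamma
        s = s_next
    # Bootstrap the tail of the return with the learned value function.
    return total + discount * value_fn(s)


def ensembled_value(models, policy, reward_fn, value_fn, state,
                    horizon=3, gamma=0.99):
    """Roll out each ensemble member separately, then aggregate.

    Assumption: aggregation is a plain mean. The spread across members is
    returned as a rough indicator of disagreement between the models.
    """
    estimates = np.array([
        rollout_value(m, policy, reward_fn, value_fn, state, horizon, gamma)
        for m in models
    ])
    return float(estimates.mean()), float(estimates.std())
```

Here `models` would be the members of the learned dynamics ensemble. Returning the standard deviation alongside the mean is a design choice of this sketch; it exposes how much the ensemble members disagree about the value of a state, which other aggregation rules could use.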
