AUGMENTED POLICY GRADIENT METHODS FOR EFFICIENT REINFORCEMENT LEARNING

25 Sep 2019 · Kai Lagemann, Gregor Roering, Christoph Henke, Rene Vossen, Frank Hees

We propose a new mixture of model-based and model-free reinforcement learning (RL) algorithms that combines the strengths of both approaches. Our goal is to reduce the sample complexity of model-free policy gradient methods by utilizing fictitious trajectory rollouts performed on a learned dynamics model, while maintaining the same asymptotic behaviour. We suggest a special type of uncertainty quantification in which the next-state prediction is randomly drawn from the distribution predicted by a stochastic dynamics model. As a result, the negative effect of exploiting erroneously optimistic regions of the dynamics model is mitigated by basing next-state predictions on an uncertainty-aware ensemble of dynamics models. The influence of the ensemble on the policy update is controlled by adjusting the number of virtual rollouts performed in the next iteration according to the ratio of the real to the virtual total reward. Our approach, which we call Model-Based Policy Gradient Enrichment (MBPGE), is evaluated on a collection of benchmark tasks including simulated robotic locomotion. We compare it to plain model-free algorithms and a model-based one. The evaluation shows that MBPGE learns faster in the early training stage and achieves improved asymptotic behaviour.
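To make the two mechanisms named in the abstract concrete, the following is a minimal sketch, not the authors' implementation: a next state is drawn from the distribution predicted by a randomly chosen ensemble member, and the virtual rollout budget is rescaled by the ratio of real to virtual return. All class and function names, the toy linear Gaussian model, and the exact rescaling and clipping rule are illustrative assumptions.

# Illustrative sketch only; names and the toy linear model are hypothetical.
import numpy as np


class GaussianDynamicsModel:
    """Toy stochastic dynamics model predicting a Gaussian over next states."""

    def __init__(self, state_dim, action_dim, seed):
        rng = np.random.default_rng(seed)
        # Random linear map standing in for a learned neural network.
        self.W = rng.normal(scale=0.1, size=(state_dim, state_dim + action_dim))
        self.log_std = np.full(state_dim, -1.0)

    def predict(self, state, action):
        mean = self.W @ np.concatenate([state, action])
        return mean, np.exp(self.log_std)


class EnsembleDynamics:
    """Ensemble of stochastic models; the next state is sampled, not averaged."""

    def __init__(self, models, rng):
        self.models = models
        self.rng = rng

    def sample_next_state(self, state, action):
        # Pick a random ensemble member, then draw from its predicted
        # distribution; this propagates model uncertainty into the rollout.
        model = self.models[self.rng.integers(len(self.models))]
        mean, std = model.predict(state, action)
        return self.rng.normal(mean, std)


def adjust_num_rollouts(num_rollouts, real_return, virtual_return,
                        min_rollouts=1, max_rollouts=100):
    # Assumed rule: if virtual returns overestimate the real ones, trust the
    # model less and perform fewer fictitious rollouts in the next iteration.
    ratio = real_return / (abs(virtual_return) + 1e-8)
    return int(np.clip(num_rollouts * ratio, min_rollouts, max_rollouts))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ensemble = EnsembleDynamics(
        [GaussianDynamicsModel(state_dim=4, action_dim=2, seed=s) for s in range(5)],
        rng,
    )
    s, a = np.zeros(4), np.ones(2)
    print("sampled next state:", ensemble.sample_next_state(s, a))
    print("new rollout budget:", adjust_num_rollouts(20, real_return=90.0,
                                                     virtual_return=120.0))

In a full training loop, the sampled virtual transitions would be mixed into the data used for the policy gradient update, and adjust_num_rollouts would be called once per iteration with the measured real and virtual returns.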
