no code implementations • 25 Sep 2019 • Kai Lagemann, Gregor Roering, Christoph Henke, Rene Vossen, Frank Hees
The influence of the ensemble of dynamics models on the policy update is controlled by adjusting the number of virtually performed rollouts in the next iteration according to the ratio of the real and virtual total reward.
no code implementations • 15 Sep 2018 • Philipp Ennen, Pia Bresenitz, Rene Vossen, Frank Hees
However, due to the small number of real-world trajectory samples in Guided Policy Search, the resulting neural networks are only robust in the neighbourhood of the trajectory distribution explored by real-world interactions.