no code implementations • 9 Feb 2022 • Damian Boborzi, Christoph-Nikolas Straehle, Jens S. Buchner, Lars Mikelsons
We propose a non-adversarial learning-from-observations approach, together with an interpretable convergence and performance metric.
no code implementations • 29 Sep 2021 • Damian Boborzi, Christoph-Nikolas Straehle, Jens Stefan Buchner, Lars Mikelsons
Our training objective minimizes the Kullback-Leibler divergence between the policy and expert state transition trajectories, which can be optimized in a non-adversarial fashion.