no code implementations • 2 Dec 2020 • Markus Holzleitner, Lukas Gruber, José Arjona-Medina, Johannes Brandstetter, Sepp Hochreiter
We prove under commonly used assumptions the convergence of actor-critic reinforcement learning algorithms, which simultaneously learn a policy function, the actor, and a value function, the critic.