no code implementations • ICLR 2018 • Pierre H. Richemond, Brendan Maginnis
We derive policy gradients where the change in policy is limited to a small Wasserstein distance (or trust region).
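The Wasserstein trust-region idea can be illustrated with a minimal sketch. This is not the paper's derivation: it assumes a 1-D discrete action space (where the Wasserstein-1 distance between two distributions reduces to the sum of absolute CDF differences) and enforces the trust region by simple backtracking on the step size. The function names `w1_discrete` and `trust_region_step` are illustrative, not from the paper.

```python
import numpy as np

def w1_discrete(p, q):
    # Wasserstein-1 distance between two distributions over an ordered,
    # equally spaced 1-D action set: sum of absolute CDF differences.
    return np.abs(np.cumsum(p) - np.cumsum(q)).sum()

def trust_region_step(old_probs, grad, delta=0.05, lr=1.0):
    # Take a gradient step in logit space, then halve the step size until
    # the new policy lies within a Wasserstein ball of radius delta
    # around the old policy (a crude stand-in for the trust region).
    logits = np.log(old_probs)
    for _ in range(20):
        new_logits = logits + lr * grad
        new_probs = np.exp(new_logits - new_logits.max())
        new_probs /= new_probs.sum()
        if w1_discrete(old_probs, new_probs) <= delta:
            return new_probs
        lr *= 0.5
    return old_probs  # fall back to the old policy if no step fits

old = np.array([0.25, 0.25, 0.25, 0.25])
grad = np.array([1.0, 0.0, 0.0, -1.0])
new = trust_region_step(old, grad)
```

Backtracking is only one way to respect the constraint; the point is that the accepted update never moves the policy more than `delta` in Wasserstein distance.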
no code implementations • ICLR 2018 • Pierre H. Richemond, Brendan Maginnis
Two main families of reinforcement learning algorithms, Q-learning and policy gradients, have recently been proven to be equivalent when using a softmax relaxation on one part, and an entropic regularization on the other.
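The equivalence rests on a standard identity of entropy-regularized RL, which a short sketch can verify numerically (assumed setup, not the paper's code): the entropy-regularized optimal policy is the softmax of the Q-values, and the soft (log-sum-exp) value equals the expected Q-value plus the scaled policy entropy.

```python
import numpy as np

def softmax_policy(q, tau=1.0):
    # Entropy-regularized optimal policy: pi(a) proportional to exp(Q(a)/tau).
    z = q / tau
    z -= z.max()  # shift for numerical stability; leaves the softmax unchanged
    p = np.exp(z)
    return p / p.sum()

def soft_value(q, tau=1.0):
    # Soft value V = tau * log sum_a exp(Q(a)/tau), the softmax
    # relaxation of max_a Q(a).
    return tau * np.log(np.sum(np.exp(q / tau)))

tau = 1.0
q = np.array([1.0, 2.0, 0.5])
pi = softmax_policy(q, tau)

# Identity linking the two views: V = E_pi[Q] + tau * H(pi)
lhs = soft_value(q, tau)
rhs = (pi * q).sum() + tau * (-(pi * np.log(pi)).sum())
```

With `tau -> 0` the softmax policy collapses to the greedy argmax and the soft value recovers the ordinary max, recovering standard Q-learning.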
no code implementations • ICLR 2018 • Brendan Maginnis, Pierre H. Richemond
On tasks with a single output, the RWA, RDA and GRU units learn much more quickly than the LSTM and achieve better performance.