no code implementations • 31 May 2021 • Paavo Parmas, Masashi Sugiyama
Reparameterization (RP) and likelihood ratio (LR) gradient estimators are used to estimate gradients of expectations throughout machine learning and reinforcement learning; however, they are usually explained as simple mathematical tricks, with no insight into their nature.
no code implementations • 14 Oct 2019 • Paavo Parmas, Masashi Sugiyama
Reparameterization (RP) and likelihood ratio (LR) gradient estimators are used throughout machine and reinforcement learning; however, they are usually explained as simple mathematical tricks without providing any insight into their nature.
1 code implementation • 1 Jun 2019 • Daniel Hennes, Dustin Morrill, Shayegan Omidshafiei, Remi Munos, Julien Perolat, Marc Lanctot, Audrunas Gruslys, Jean-Baptiste Lespiau, Paavo Parmas, Edgar Duenez-Guzman, Karl Tuyls
Policy gradient and actor-critic algorithms form the basis of many commonly used training techniques in deep reinforcement learning.
no code implementations • NeurIPS 2018 • Paavo Parmas
Backpropagation and the chain rule of derivatives have been prominent; however, the total derivative rule has not enjoyed the same amount of attention.
3 code implementations • ICML 2018 • Paavo Parmas, Carl Edward Rasmussen, Jan Peters, Kenji Doya
Previously, the exploding gradient problem has been explained to be central in deep learning and model-based reinforcement learning, because it causes numerical issues and instability in optimization.
Model-based Reinforcement Learning reinforcement-learning +1