Search Results for author: Paavo Parmas

Found 5 papers, 2 papers with code

A unified view of likelihood ratio and reparameterization gradients

no code implementations • 31 May 2021 • Paavo Parmas, Masashi Sugiyama

Reparameterization (RP) and likelihood ratio (LR) gradient estimators are used to estimate gradients of expectations throughout machine learning and reinforcement learning; however, they are usually explained as simple mathematical tricks, with no insight into their nature.

A unified view of likelihood ratio and reparameterization gradients and an optimal importance sampling scheme

no code implementations • 14 Oct 2019 • Paavo Parmas, Masashi Sugiyama

Reparameterization (RP) and likelihood ratio (LR) gradient estimators are used throughout machine and reinforcement learning; however, they are usually explained as simple mathematical tricks without providing any insight into their nature.

reinforcement-learning Reinforcement Learning (RL)

Neural Replicator Dynamics

1 code implementation • 1 Jun 2019 • Daniel Hennes, Dustin Morrill, Shayegan Omidshafiei, Remi Munos, Julien Perolat, Marc Lanctot, Audrunas Gruslys, Jean-Baptiste Lespiau, Paavo Parmas, Edgar Duenez-Guzman, Karl Tuyls

Policy gradient and actor-critic algorithms form the basis of many commonly used training techniques in deep reinforcement learning.

counterfactual Policy Gradient Methods

Total stochastic gradient algorithms and applications in reinforcement learning

no code implementations • NeurIPS 2018 • Paavo Parmas

Backpropagation and the chain rule of derivatives have been prominent; however, the total derivative rule has not enjoyed the same amount of attention.

Density Estimation reinforcement-learning +1

PIPPS: Flexible Model-Based Policy Search Robust to the Curse of Chaos

3 code implementations • ICML 2018 • Paavo Parmas, Carl Edward Rasmussen, Jan Peters, Kenji Doya

Previously, the exploding gradient problem has been explained to be central in deep learning and model-based reinforcement learning, because it causes numerical issues and instability in optimization.

Model-based Reinforcement Learning reinforcement-learning +1
