Search Results for author: Paavo Parmas

Found 5 papers, 2 papers with code

A unified view of likelihood ratio and reparameterization gradients

no code implementations • 31 May 2021 • Paavo Parmas, Masashi Sugiyama

Reparameterization (RP) and likelihood ratio (LR) gradient estimators are used to estimate gradients of expectations throughout machine learning and reinforcement learning; however, they are usually explained as simple mathematical tricks, with no insight into their nature.
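The two estimator families the abstract names can be illustrated on a toy problem. The sketch below (my own minimal example, not code from the paper) estimates the gradient of E[f(x)] with x drawn from a Gaussian whose mean is the parameter, using both the likelihood ratio (score-function) form and the reparameterization form; both are unbiased for the same quantity.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma, n = 1.5, 0.5, 200_000

# Target: d/dtheta E_{x ~ N(theta, sigma^2)}[f(x)] with f(x) = x^2.
# Closed form: E[x^2] = theta^2 + sigma^2, so the true gradient is 2*theta = 3.0.
f = lambda x: x ** 2
df = lambda x: 2 * x

# Likelihood ratio (LR) estimator:
# grad = E[f(x) * d/dtheta log N(x; theta, sigma^2)] = E[f(x) * (x - theta) / sigma^2]
x = rng.normal(theta, sigma, n)
lr_grad = np.mean(f(x) * (x - theta) / sigma ** 2)

# Reparameterization (RP) estimator: write x = theta + sigma * eps, eps ~ N(0, 1),
# and differentiate through the sample: grad = E[f'(theta + sigma * eps)].
eps = rng.standard_normal(n)
rp_grad = np.mean(df(theta + sigma * eps))

print(lr_grad, rp_grad)  # both estimates converge to 2*theta = 3.0
```

The LR estimate is typically noisier than the RP estimate here, which is the usual trade-off between the two families.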

A unified view of likelihood ratio and reparameterization gradients and an optimal importance sampling scheme

no code implementations • 14 Oct 2019 • Paavo Parmas, Masashi Sugiyama

Reparameterization (RP) and likelihood ratio (LR) gradient estimators are used throughout machine and reinforcement learning; however, they are usually explained as simple mathematical tricks without providing any insight into their nature.
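This paper's title also mentions an optimal importance sampling scheme for combining the two estimators. As a generic illustration only (not the paper's actual scheme), the sketch below combines per-sample LR and RP gradient terms by inverse-variance weighting, a standard way to mix two unbiased estimators of the same quantity.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, sigma, n = 100_000 and 1.5, 0.5, 100_000  # theta=1.5, sigma=0.5, n samples
theta, sigma, n = 1.5, 0.5, 100_000

f = lambda x: x ** 2
df = lambda x: 2 * x

# Per-sample LR and RP gradient terms for d/dtheta E[f(x)], x ~ N(theta, sigma^2),
# computed on a shared batch of samples.
eps = rng.standard_normal(n)
x = theta + sigma * eps
lr_terms = f(x) * (x - theta) / sigma ** 2
rp_terms = df(x)

# Inverse-variance weighting: for two unbiased estimators with variances
# v1 and v2, the convex combination with w = v2 / (v1 + v2) minimizes
# variance (ignoring their correlation, so this is only a rough heuristic).
v_lr, v_rp = lr_terms.var(), rp_terms.var()
w = v_rp / (v_lr + v_rp)
combined = w * lr_terms.mean() + (1 - w) * rp_terms.mean()
print(combined)  # remains an unbiased estimate of 2*theta = 3.0
```

Because the LR terms have much higher variance here, the weight w is small and the combination leans toward the RP estimate.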

Tasks: Reinforcement Learning (RL)

Total stochastic gradient algorithms and applications in reinforcement learning

no code implementations • NeurIPS 2018 • Paavo Parmas

Backpropagation and the chain rule of derivatives have been prominent; however, the total derivative rule has not enjoyed the same amount of attention.
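The distinction the abstract draws can be made concrete with a small worked example (mine, not the paper's): for L(x, y) where y itself depends on x, the total derivative picks up both the direct path and the path through y.

```python
# Total derivative vs. partial derivative, for L(x, y) with y = g(x):
#   dL/dx = ∂L/∂x + ∂L/∂y * dg/dx
# Example: L(x, y) = x * y with y = g(x) = x**2, so dL/dx = y + x * 2x = 3x**2.

def g(x):
    return x ** 2

def L(x, y):
    return x * y

def total_derivative(x):
    y = g(x)
    dL_dx_partial = y          # ∂L/∂x, holding y fixed (direct path)
    dL_dy = x                  # ∂L/∂y
    dg_dx = 2 * x              # dy/dx (indirect path through y)
    return dL_dx_partial + dL_dy * dg_dx

x0 = 2.0
# Finite-difference check of the full x -> L(x, g(x)) map.
h = 1e-6
fd = (L(x0 + h, g(x0 + h)) - L(x0 - h, g(x0 - h))) / (2 * h)
print(total_derivative(x0), fd)  # both ≈ 3 * x0**2 = 12.0
```

The partial derivative alone (4.0 at x0 = 2) would miss the contribution through y, which is exactly the kind of bookkeeping the total derivative rule formalizes.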

Tasks: Density Estimation, Reinforcement Learning, +1

PIPPS: Flexible Model-Based Policy Search Robust to the Curse of Chaos

3 code implementations • ICML 2018 • Paavo Parmas, Carl Edward Rasmussen, Jan Peters, Kenji Doya

Previously, the exploding gradient problem has been argued to be central in deep learning and model-based reinforcement learning, because it causes numerical issues and instability during optimization.
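The "curse of chaos" in the title refers to gradients through long chaotic rollouts. A minimal demonstration (my own, using the logistic map rather than anything from the paper): iterating a chaotic dynamical system and accumulating the chain-rule factors makes the derivative of the final state with respect to the initial state blow up.

```python
# Iterating the chaotic logistic map x_{t+1} = r * x_t * (1 - x_t) and
# accumulating the chain-rule factor d x_{t+1} / d x_t = r * (1 - 2*x_t)
# shows the derivative of the final state w.r.t. the initial state
# growing roughly exponentially with horizon T.
r, T = 3.9, 50
x = 0.3
grad = 1.0
for _ in range(T):
    grad *= r * (1 - 2 * x)   # chain-rule factor for this step
    x = r * x * (1 - x)

print(abs(grad))  # typically astronomically large after only 50 steps
```

This is why pathwise (RP) gradients through learned dynamics models can become useless over long horizons even when each individual step is smooth.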

Tasks: Model-based Reinforcement Learning, Reinforcement Learning, +1
