Policy Gradient Methods

Policy gradient methods optimize the policy function directly in reinforcement learning. This contrasts with value-based methods such as Q-learning, where the policy arises implicitly from maximizing a learned value function. Below is a catalogue of policy gradient methods, with a year and paper count listed for each.
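To make "optimizing the policy directly" concrete, here is a minimal sketch of a REINFORCE-style update on a hypothetical two-armed Gaussian bandit. The names `softmax` and `reinforce_bandit`, the arm means, the learning rate, and the running-average baseline are all illustrative assumptions and are not taken from any method in the catalogue. No value function is maximized to pick actions; the policy parameters themselves are nudged along the gradient of expected reward.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_means, steps=2000, lr=0.1, seed=0):
    """REINFORCE-style learning on a hypothetical Gaussian bandit.

    The policy is a softmax over one preference per arm (theta).
    Each step ascends advantage * grad log pi(action), where the
    advantage is the reward minus a running-average baseline.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(arm_means)
    baseline = 0.0
    for t in range(1, steps + 1):
        probs = softmax(theta)
        # Sample an action from the current policy.
        action = rng.choices(range(len(theta)), weights=probs)[0]
        # Assumed reward model: unit-variance Gaussian around the arm mean.
        reward = rng.gauss(arm_means[action], 1.0)
        advantage = reward - baseline
        baseline += (reward - baseline) / t  # running mean of rewards
        # d/d theta[a] of log softmax(theta)[action] = 1[a == action] - probs[a]
        for a in range(len(theta)):
            indicator = 1.0 if a == action else 0.0
            theta[a] += lr * advantage * (indicator - probs[a])
    return softmax(theta)


final_probs = reinforce_bandit([0.0, 1.0])
```

Under these assumptions, `final_probs` should place most of its mass on the second arm, the one with the higher mean reward. A Q-learning treatment of the same problem would instead estimate a value per arm and act greedily with respect to those estimates.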

METHOD                                      YEAR  PAPERS
PPO                                         2017  122
DDPG                                        2015  94
REINFORCE                                   1999  94
A2C                                         2016  41
TRPO                                        2015  38
A3C                                         2016  37
TD3                                         2018  30
Soft Actor Critic                           2018  23
MADDPG                                      2017  12
DPG                                         2014  9
IMPALA                                      2018  6
D4PG                                        2018  5
ACER                                        2016  4
Soft Actor-Critic (Autotuned Temperature)   2018  4
NoisyNet-A3C                                2017  1
ACKTR                                       2017  1
SVPG                                        2017  1
Ape-X DPG                                   2018  1
MDPO                                        2020  1
TayPO                                       2020  1