Policy Gradient Methods


REINFORCE is a Monte Carlo variant of the policy gradient algorithm in reinforcement learning. The agent collects the samples of an episode using its current policy and uses them to update the policy parameter $\theta$. Because the return $G_{t}$ is known only once a full trajectory has been completed, the update is applied at the end of each episode, using samples drawn from the current policy. This makes REINFORCE an on-policy algorithm.

$$ \nabla_{\theta}J\left(\theta\right) = \mathbb{E}_{\pi}\left[G_{t}\nabla_{\theta}\ln\pi_{\theta}\left(A_{t}\mid{S_{t}}\right)\right]$$
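The expectation above can be approximated from a single sampled episode. The following is a minimal sketch, not a reference implementation. It assumes a tabular softmax policy whose parameters `theta` have shape `(n_states, n_actions)`, and a discount factor `gamma` used when computing $G_{t}$; both are illustrative choices that the formula itself does not specify. The helper names (`discounted_returns`, `softmax`, `reinforce_gradient`) are hypothetical.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * r_{t+1} + ... for every step of one episode."""
    returns = np.zeros(len(rewards))
    running = 0.0
    # Walk backwards so each G_t reuses G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - logits.max()
    exps = np.exp(shifted)
    return exps / exps.sum()


def reinforce_gradient(theta, states, actions, rewards, gamma=0.99):
    """Single-episode Monte Carlo estimate of grad J(theta).

    Assumes pi_theta(a | s) = softmax(theta[s])[a] (an illustrative choice).
    """
    grad = np.zeros_like(theta)
    returns = discounted_returns(rewards, gamma)
    for s, a, g in zip(states, actions, returns):
        probs = softmax(theta[s])
        # For a softmax policy: grad of ln pi(a|s) w.r.t. theta[s]
        # equals one_hot(a) - pi(.|s).
        grad_log_pi = -probs
        grad_log_pi[a] += 1.0
        grad[s] += g * grad_log_pi
    return grad


# Toy episode: 2 states, 2 actions, visited once each.
theta = np.zeros((2, 2))
grad = reinforce_gradient(theta, states=[0, 1], actions=[1, 0],
                          rewards=[0.0, 1.0], gamma=0.5)
```

A gradient ascent step would then be `theta += alpha * grad` for some assumed step size `alpha`. Averaging the estimate over several episodes reduces its variance, which is high for a single trajectory.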

Image Credit: Tingwu Wang




| Task | Papers | Share |
|---|---|---|
| Reinforcement Learning (RL) | 52 | 23.64% |
| Sentence | 8 | 3.64% |
| Text Generation | 8 | 3.64% |
| Image Classification | 7 | 3.18% |
| Question Answering | 6 | 2.73% |
| Decision Making | 5 | 2.27% |
| Recommendation Systems | 5 | 2.27% |
| Image Captioning | 5 | 2.27% |
| Retrieval | 4 | 1.82% |

