REINFORCE is a Monte Carlo variant of the policy gradient algorithm in reinforcement learning. The agent collects sample trajectories with its current policy and uses the observed returns to update the policy parameters $\theta$. Because each update requires a full trajectory generated by the current policy itself, REINFORCE is an on-policy algorithm.
$$ \nabla_{\theta}J\left(\theta\right) = \mathbb{E}_{\pi}\left[G_{t}\nabla_{\theta}\ln\pi_{\theta}\left(A_{t}\mid{S_{t}}\right)\right]$$
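The update above can be sketched in plain NumPy. The environment below is a hypothetical two-armed bandit (action 0 pays reward 1, action 1 pays 0) chosen only to keep the example self-contained; the policy is a softmax over two parameters, and each one-step episode supplies the Monte Carlo return $G_t$:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    z = x - x.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical toy environment: action 0 yields reward 1, action 1 yields 0.
def reward(action):
    return 1.0 if action == 0 else 0.0

theta = np.zeros(2)          # policy parameters
alpha = 0.1                  # learning rate

for episode in range(500):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)
    G = reward(a)                      # return of this one-step episode
    grad_log_pi = -probs               # grad of log softmax: one-hot(a) - probs
    grad_log_pi[a] += 1.0
    theta += alpha * G * grad_log_pi   # REINFORCE update: alpha * G * grad log pi

print(softmax(theta))        # probability mass should concentrate on action 0
```

Because only the rewarded action contributes a nonzero gradient, the softmax probability of action 0 climbs toward 1 over training; in practice a baseline is usually subtracted from $G_t$ to reduce the variance of this estimator.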
Image Credit: Tingwu Wang
| Task | Papers | Share |
|---|---|---|
| Reinforcement Learning (RL) | 43 | 24.71% |
| Text Generation | 8 | 4.60% |
| Image Classification | 7 | 4.02% |
| Question Answering | 6 | 3.45% |
| Decision Making | 5 | 2.87% |
| Recommendation Systems | 5 | 2.87% |
| Image Captioning | 5 | 2.87% |
| Combinatorial Optimization | 4 | 2.30% |
| Object Detection | 3 | 1.72% |