Q-Learning is an off-policy temporal difference control algorithm:
$$Q\left(S_{t}, A_{t}\right) \leftarrow Q\left(S_{t}, A_{t}\right) + \alpha\left[R_{t+1} + \gamma\max_{a}Q\left(S_{t+1}, a\right) - Q\left(S_{t}, A_{t}\right)\right] $$
The learned action-value function $Q$ directly approximates $q_{*}$, the optimal action-value function, independent of the policy being followed.
Source: Sutton and Barto, *Reinforcement Learning: An Introduction*, 2nd Edition
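For concreteness, here is a minimal tabular sketch of this update rule. It assumes a Gymnasium-style discrete environment (`reset()` returning `(state, info)`, `step(action)` returning `(next_state, reward, terminated, truncated, info)`); the hyperparameters `num_episodes`, `alpha`, `gamma`, and `epsilon` are illustrative, not prescribed by the source.

```python
import numpy as np

def q_learning(env, num_episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    # Q-table over discrete states and actions, initialized to zero.
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    for _ in range(num_episodes):
        state, _ = env.reset()
        done = False
        while not done:
            # Epsilon-greedy behavior policy (exploration).
            if np.random.rand() < epsilon:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # Off-policy TD target: bootstrap from max_a Q(S_{t+1}, a),
            # independent of the action the behavior policy takes next.
            target = reward + (0.0 if terminated else gamma * np.max(Q[next_state]))
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
    return Q
```

As a usage example, `q_learning(gymnasium.make("FrozenLake-v1"))` would train a Q-table on a small discrete task; any environment with discrete observation and action spaces fits this sketch.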
Usage by task, as a share of papers using Q-Learning:

Task | Papers | Share |
---|---|---|
Reinforcement Learning (RL) | 249 | 39.27% |
Decision Making | 41 | 6.47% |
Multi-agent Reinforcement Learning | 26 | 4.10% |
Management | 25 | 3.94% |
Offline RL | 24 | 3.79% |
Atari Games | 17 | 2.68% |
Autonomous Driving | 13 | 2.05% |
OpenAI Gym | 11 | 1.74% |
D4RL | 10 | 1.58% |