TD3 (Twin Delayed Deep Deterministic Policy Gradient) builds on the DDPG algorithm for reinforcement learning, with modifications aimed at reducing overestimation bias in the value function. In particular, it uses clipped double Q-learning, delayed updates of the policy and target networks, and target policy smoothing (which resembles a SARSA-style update and is safer, as it assigns higher value to actions that are resistant to perturbations).
Source: Addressing Function Approximation Error in Actor-Critic Methods
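Below is a minimal PyTorch sketch (not the authors' reference implementation) of how the three modifications fit into a single update step. The network architectures, default hyperparameters (`policy_noise`, `noise_clip`, `policy_delay`, `tau`, `gamma`), and the batch layout are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Actor(nn.Module):
    def __init__(self, obs_dim, act_dim, max_action=1.0, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, act_dim), nn.Tanh())
        self.max_action = max_action

    def forward(self, obs):
        return self.max_action * self.net(obs)

class Critic(nn.Module):
    """Twin Q-networks used for clipped double Q-learning."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        def q_net():
            return nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))
        self.q1, self.q2 = q_net(), q_net()

    def forward(self, obs, act):
        x = torch.cat([obs, act], dim=-1)
        return self.q1(x), self.q2(x)

def td3_update(actor, critic, actor_t, critic_t, actor_opt, critic_opt, batch,
               step, gamma=0.99, tau=0.005, policy_noise=0.2, noise_clip=0.5,
               policy_delay=2, max_action=1.0):
    # batch tensors are assumed to have shapes (B, obs_dim), (B, act_dim),
    # (B, 1), (B, obs_dim), (B, 1) respectively.
    obs, act, reward, next_obs, done = batch

    with torch.no_grad():
        # Target policy smoothing: perturb the target action with clipped noise
        # so the value estimate favours actions that are robust to perturbations.
        noise = (torch.randn_like(act) * policy_noise).clamp(-noise_clip, noise_clip)
        next_act = (actor_t(next_obs) + noise).clamp(-max_action, max_action)
        # Clipped double Q-learning: bootstrap from the minimum of the twin
        # target Qs to counteract overestimation bias.
        q1_t, q2_t = critic_t(next_obs, next_act)
        target_q = reward + gamma * (1.0 - done) * torch.min(q1_t, q2_t)

    # Critic update (performed every step).
    q1, q2 = critic(obs, act)
    critic_loss = F.mse_loss(q1, target_q) + F.mse_loss(q2, target_q)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Delayed policy and target updates: only every `policy_delay` critic steps.
    if step % policy_delay == 0:
        actor_loss = -critic(obs, actor(obs))[0].mean()
        actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
        # Polyak-average the target networks toward the online networks.
        for net, net_t in ((actor, actor_t), (critic, critic_t)):
            for p, p_t in zip(net.parameters(), net_t.parameters()):
                p_t.data.mul_(1 - tau).add_(tau * p.data)
```

In this sketch the target networks `actor_t` and `critic_t` would typically be initialised as deep copies of the online networks (e.g. `copy.deepcopy(actor)`), and `step` counts critic updates so the actor and targets move only every `policy_delay` steps.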
Tasks in papers that use TD3, by share of papers:

| Task | Papers | Share |
|---|---|---|
| Reinforcement Learning (RL) | 64 | 23.36% |
| Deep Reinforcement Learning | 53 | 19.34% |
| Reinforcement Learning | 47 | 17.15% |
| Continuous Control | 29 | 10.58% |
| OpenAI Gym | 9 | 3.28% |
| Decision Making | 8 | 2.92% |
| Autonomous Driving | 6 | 2.19% |
| Offline RL | 5 | 1.82% |
| Meta-Learning | 4 | 1.46% |