Policy Gradient Methods

Twin Delayed Deep Deterministic Policy Gradient (TD3)

Introduced by Fujimoto et al. in Addressing Function Approximation Error in Actor-Critic Methods

TD3 builds on the Deep Deterministic Policy Gradient (DDPG) algorithm for reinforcement learning, with three modifications aimed at reducing overestimation bias in the value function. In particular, it utilises clipped double Q-learning, delayed updates of the target and policy networks, and target policy smoothing. Target policy smoothing is similar to a SARSA-style update. It is a safer update because it assigns higher value to actions that are robust to perturbations.

Source: Addressing Function Approximation Error in Actor-Critic Methods
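The three modifications can be sketched in a few small functions. This is a minimal, framework-free sketch, assuming the target actor and the two target critics are plain Python callables and that actions are lists of floats bounded in [-1, 1]. The helper names (`smoothed_target_action`, `td3_target`, `soft_update`, `should_update_actor`) are hypothetical, and the default hyperparameters (discount 0.99, noise standard deviation 0.2, noise clip 0.5, Polyak coefficient 0.005, policy delay 2) are assumed values commonly used with TD3, not a reference implementation.

```python
import random
from typing import Callable, List, Optional

# Hypothetical helper names and assumed default hyperparameters; a sketch, not the reference code.
Vector = List[float]
Actor = Callable[[Vector], Vector]          # state -> action
Critic = Callable[[Vector, Vector], float]  # (state, action) -> Q-value


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp a scalar to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def smoothed_target_action(
    actor_target: Actor,
    next_state: Vector,
    noise_std: float = 0.2,
    noise_clip: float = 0.5,
    action_low: float = -1.0,
    action_high: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Vector:
    """Target policy smoothing: perturb the target action with clipped Gaussian noise."""
    if rng is None:
        rng = random.Random()
    action = actor_target(next_state)
    return [
        # Noise is clipped to [-noise_clip, noise_clip]; the result stays inside the action bounds.
        clip(a + clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip),
             action_low, action_high)
        for a in action
    ]


def td3_target(
    reward: float,
    done: bool,
    next_state: Vector,
    actor_target: Actor,
    critic1_target: Critic,
    critic2_target: Critic,
    gamma: float = 0.99,
    noise_std: float = 0.2,
    noise_clip: float = 0.5,
    rng: Optional[random.Random] = None,
) -> float:
    """Bellman target that both critics are regressed toward."""
    next_action = smoothed_target_action(
        actor_target, next_state, noise_std, noise_clip, rng=rng)
    # Clipped double Q-learning: use the smaller of the two target critics' estimates.
    q_next = min(critic1_target(next_state, next_action),
                 critic2_target(next_state, next_action))
    # Do not bootstrap from terminal transitions.
    return reward + (0.0 if done else gamma * q_next)


def soft_update(target: Vector, online: Vector, tau: float = 0.005) -> Vector:
    """Polyak averaging of flattened parameters: target <- tau * online + (1 - tau) * target."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]


def should_update_actor(step: int, policy_delay: int = 2) -> bool:
    """Delayed updates: the actor and target networks change once per `policy_delay` critic steps."""
    return step % policy_delay == 0
```

In a training loop built on these helpers, both critics would be regressed toward `td3_target(...)` on every step, while the actor update and the `soft_update` of the target networks would run only when `should_update_actor(step)` is true. Taking the minimum of two critics biases the target downward rather than upward, which is the point of clipped double Q-learning.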


Tasks


Task                          Papers    Share
Reinforcement Learning (RL)       58   40.56%
Continuous Control                26   18.18%
OpenAI Gym                         8    5.59%
Decision Making                    7    4.90%
Autonomous Driving                 5    3.50%
Offline RL                         3    2.10%
Meta-Learning                      3    2.10%
Benchmarking                       3    2.10%
D4RL                               2    1.40%

Categories