Policy Gradient Methods

Twin Delayed Deep Deterministic Policy Gradient (TD3)

Introduced by Fujimoto et al. in Addressing Function Approximation Error in Actor-Critic Methods

TD3 builds on the DDPG algorithm for reinforcement learning, with three modifications aimed at reducing overestimation bias in the value function: clipped double Q-learning, delayed updates of the target and policy networks, and target policy smoothing. Target policy smoothing resembles a SARSA-style update; it is a safer update because it assigns higher value to actions that are robust to perturbations.

Source: Addressing Function Approximation Error in Actor-Critic Methods
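
The three modifications can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: actions are scalars for brevity, the function names (`td3_target`, `soft_update`, `should_update_policy`) are hypothetical, and the default hyperparameter values are assumptions chosen for illustration.

```python
import random


def clip(x, lo, hi):
    """Clamp a scalar to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(reward, done, next_state,
               actor_target, critic1_target, critic2_target,
               gamma=0.99, policy_noise=0.2, noise_clip=0.5,
               max_action=1.0, rng=random):
    """Compute a TD3-style bootstrap target for one transition.

    actor_target(state) -> action and critic*_target(state, action) -> value
    are hypothetical callables standing in for the target networks.
    All default values are illustrative assumptions.
    """
    # Target policy smoothing: perturb the target action with clipped
    # Gaussian noise so the value estimate is averaged over nearby actions.
    noise = clip(rng.gauss(0.0, policy_noise), -noise_clip, noise_clip)
    next_action = clip(actor_target(next_state) + noise,
                       -max_action, max_action)

    # Clipped double Q-learning: take the smaller of the two target-critic
    # estimates to counteract overestimation.
    q_next = min(critic1_target(next_state, next_action),
                 critic2_target(next_state, next_action))

    # Standard one-step target; no bootstrapping from terminal states.
    return reward + gamma * (1.0 - done) * q_next


def soft_update(target_params, online_params, tau=0.005):
    """Polyak-average online parameters into the target parameters."""
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_params, online_params)]


def should_update_policy(step, policy_delay=2):
    """Delayed updates: refresh the policy and targets every policy_delay critic steps."""
    return step % policy_delay == 0
```

In a training loop under these assumptions, both critics would be regressed toward `td3_target(...)` at every step, while the actor update and `soft_update` would run only when `should_update_policy(step)` is true.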

Tasks


Task                          Papers   Share
Reinforcement Learning (RL)   64       23.36%
Deep Reinforcement Learning   53       19.34%
Reinforcement Learning        47       17.15%
Continuous Control            29       10.58%
OpenAI Gym                    9        3.28%
Decision Making               8        2.92%
Autonomous Driving            6        2.19%
Offline RL                    5        1.82%
Meta-Learning                 4        1.46%
