TD3 (Twin Delayed Deep Deterministic Policy Gradient) builds on the DDPG algorithm for reinforcement learning, with modifications aimed at reducing overestimation bias in the value function. In particular, it uses clipped double Q-learning, delayed updates of the policy and target networks, and target policy smoothing (which resembles a SARSA-style update and is safer, as it assigns higher value to actions that are resistant to perturbations).
Source: Addressing Function Approximation Error in Actor-Critic Methods
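Below is a minimal PyTorch sketch (not the authors' reference implementation) of how the three modifications fit into a single update step. The network architectures, default hyperparameters (`policy_noise`, `noise_clip`, `policy_delay`, `tau`, `gamma`), and the batch layout are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Actor(nn.Module):
    def __init__(self, obs_dim, act_dim, max_action=1.0, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, act_dim), nn.Tanh())
        self.max_action = max_action

    def forward(self, obs):
        return self.max_action * self.net(obs)

class Critic(nn.Module):
    """Twin Q-networks used for clipped double Q-learning."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        def q_net():
            return nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))
        self.q1, self.q2 = q_net(), q_net()

    def forward(self, obs, act):
        x = torch.cat([obs, act], dim=-1)
        return self.q1(x), self.q2(x)

def td3_update(actor, critic, actor_t, critic_t, actor_opt, critic_opt, batch,
               step, gamma=0.99, tau=0.005, policy_noise=0.2, noise_clip=0.5,
               policy_delay=2, max_action=1.0):
    # batch tensors are assumed to have shapes (B, obs_dim), (B, act_dim),
    # (B, 1), (B, obs_dim), (B, 1) respectively.
    obs, act, reward, next_obs, done = batch

    with torch.no_grad():
        # Target policy smoothing: perturb the target action with clipped noise
        # so the value estimate favours actions that are robust to perturbations.
        noise = (torch.randn_like(act) * policy_noise).clamp(-noise_clip, noise_clip)
        next_act = (actor_t(next_obs) + noise).clamp(-max_action, max_action)
        # Clipped double Q-learning: bootstrap from the minimum of the twin
        # target Qs to counteract overestimation bias.
        q1_t, q2_t = critic_t(next_obs, next_act)
        target_q = reward + gamma * (1.0 - done) * torch.min(q1_t, q2_t)

    # Critic update (performed every step).
    q1, q2 = critic(obs, act)
    critic_loss = F.mse_loss(q1, target_q) + F.mse_loss(q2, target_q)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Delayed policy and target updates: only every `policy_delay` critic steps.
    if step % policy_delay == 0:
        actor_loss = -critic(obs, actor(obs))[0].mean()
        actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
        # Polyak-average the target networks toward the online networks.
        for net, net_t in ((actor, actor_t), (critic, critic_t)):
            for p, p_t in zip(net.parameters(), net_t.parameters()):
                p_t.data.mul_(1 - tau).add_(tau * p.data)
```

In this sketch the target networks `actor_t` and `critic_t` would typically be initialised as deep copies of the online networks (e.g. `copy.deepcopy(actor)`), and `step` counts critic updates so the actor and targets move only every `policy_delay` steps.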
Tasks in papers that use TD3, by share of papers:

| Task | Papers | Share |
|---|---|---|
| Reinforcement Learning (RL) | 64 | 23.36% |
| Deep Reinforcement Learning | 53 | 19.34% |
| Reinforcement Learning | 47 | 17.15% |
| Continuous Control | 29 | 10.58% |
| OpenAI Gym | 9 | 3.28% |
| Decision Making | 8 | 2.92% |
| Autonomous Driving | 6 | 2.19% |
| Offline RL | 5 | 1.82% |
| Meta-Learning | 4 | 1.46% |