Off-Policy TD Control

Clipped Double Q-learning

Introduced by Fujimoto et al. in Addressing Function Approximation Error in Actor-Critic Methods

Clipped Double Q-learning is a variant of Double Q-learning that upper-bounds the less biased Q estimate $Q_{\theta_{2}}$ by the biased estimate $Q_{\theta_{1}}$. This is equivalent to taking the minimum of the two estimates, resulting in the following target update:

$$ y_{1} = r + \gamma\min_{i=1,2}Q_{\theta'_{i}}\left(s', \pi_{\phi_{1}}\left(s'\right)\right) $$

The motivation for this extension is that vanilla double Q-learning is sometimes ineffective if the target and current networks are too similar, e.g. with a slow-changing policy in an actor-critic framework.

Source: Addressing Function Approximation Error in Actor-Critic Methods
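For concreteness, here is a minimal PyTorch-style sketch of computing the clipped target above for a batch of transitions. The network names (`actor_target`, `q_target_1`, `q_target_2`) and their call signatures are illustrative assumptions, not the authors' reference implementation.

```python
import torch

def clipped_double_q_target(reward, next_state, done,
                            actor_target, q_target_1, q_target_2,
                            gamma=0.99):
    """Compute y = r + gamma * min_i Q'_i(s', pi(s')) for a batch.

    `actor_target`, `q_target_1`, `q_target_2` are assumed to be
    torch.nn.Module target networks (names are hypothetical).
    """
    with torch.no_grad():
        # Action selected by the (slow-moving) target policy.
        next_action = actor_target(next_state)
        # Element-wise minimum of the two target critics upper-bounds
        # the less biased estimate by the biased one.
        q1 = q_target_1(next_state, next_action)
        q2 = q_target_2(next_state, next_action)
        target_q = torch.min(q1, q2)
        # Zero out the bootstrap term for terminal transitions.
        y = reward + gamma * (1.0 - done) * target_q
    return y
```

Both critics $Q_{\theta_{1}}$ and $Q_{\theta_{2}}$ are then regressed toward this single shared target.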

Tasks

| Task | Papers | Share |
| --- | --- | --- |
| Reinforcement Learning (RL) | 65 | 31.10% |
| Reinforcement Learning | 43 | 20.57% |
| Continuous Control | 31 | 14.83% |
| OpenAI Gym | 10 | 4.78% |
| Decision Making | 8 | 3.83% |
| Autonomous Driving | 5 | 2.39% |
| Offline RL | 4 | 1.91% |
| Meta-Learning | 3 | 1.44% |
| Benchmarking | 3 | 1.44% |
