Clipped Double Q-learning is a variant of Double Q-learning that upper-bounds the less biased Q-estimate $Q_{\theta_{2}}$ by the biased estimate $Q_{\theta_{1}}$. This is equivalent to taking the minimum of the two estimates, resulting in the following target update:
$$ y_{1} = r + \gamma\min_{i=1,2}Q_{\theta'_{i}}\left(s', \pi_{\phi_{1}}\left(s'\right)\right) $$
The motivation for this extension is that vanilla Double Q-learning can be ineffective when the target and current networks are too similar, e.g. with a slowly changing policy in an actor-critic framework. A minimal sketch of the target computation follows below.
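Below is a minimal PyTorch sketch of this target computation. The `Critic` module, `actor` network, and toy dimensions are illustrative assumptions for the example, not the paper's reference implementation:

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Toy Q-network taking (state, action) pairs; an assumed architecture."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def clipped_target(reward, next_state, actor, q1_target, q2_target, gamma=0.99):
    """Compute y1 = r + gamma * min_{i=1,2} Q_{theta'_i}(s', pi_{phi_1}(s'))."""
    with torch.no_grad():
        next_action = actor(next_state)          # pi_{phi_1}(s')
        q1 = q1_target(next_state, next_action)  # Q_{theta'_1}(s', a')
        q2 = q2_target(next_state, next_action)  # Q_{theta'_2}(s', a')
        # Element-wise minimum: upper-bounds the less biased estimate
        # by the biased one, suppressing overestimation in the target.
        return reward + gamma * torch.min(q1, q2)

# Usage with toy shapes (batch of 32, 8-dim states, 2-dim actions):
state_dim, action_dim = 8, 2
actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, action_dim), nn.Tanh())
q1_t, q2_t = Critic(state_dim, action_dim), Critic(state_dim, action_dim)
reward = torch.randn(32, 1)
next_state = torch.randn(32, state_dim)
y1 = clipped_target(reward, next_state, actor, q1_t, q2_t)
```

Note that a full training loop would also mask the bootstrap term at terminal states; the sketch mirrors only the target formula above.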
Source: Addressing Function Approximation Error in Actor-Critic Methods
| Task | Papers | Share |
|---|---|---|
| Reinforcement Learning (RL) | 65 | 31.10% |
| Reinforcement Learning | 43 | 20.57% |
| Continuous Control | 31 | 14.83% |
| OpenAI Gym | 10 | 4.78% |
| Decision Making | 8 | 3.83% |
| Autonomous Driving | 5 | 2.39% |
| Offline RL | 4 | 1.91% |
| Meta-Learning | 3 | 1.44% |
| Benchmarking | 3 | 1.44% |