Double Q-learning is an off-policy reinforcement learning algorithm that uses two separate value estimators to counteract the overestimation problem of traditional Q-learning.
The max operator in standard Q-learning and DQN uses the same values both to select and to evaluate an action. This makes it more likely to select overestimated values, resulting in overoptimistic value estimates. To prevent this, we can decouple the selection from the evaluation, which is the idea behind Double Q-learning. First, the standard Q-learning target can be rewritten with the action selection made explicit:
$$ Y^{Q}_{t} = R_{t+1} + \gamma Q\left(S_{t+1}, \arg\max_{a}Q\left(S_{t+1}, a; \theta_{t}\right); \theta_{t}\right) $$
The Double Q-learning target can then be written as:
$$ Y^{DoubleQ}_{t} = R_{t+1} + \gamma Q\left(S_{t+1}, \arg\max_{a}Q\left(S_{t+1}, a; \theta_{t}\right); \theta'_{t}\right) $$
Here the selection of the action in the $\arg\max$ is still due to the online weights $\theta_{t}$, but a second set of weights $\theta'_{t}$ is used to fairly evaluate the value of that action.
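The decoupled target above can be sketched per transition: the online value estimates pick the action, and the second set of estimates scores it. A minimal NumPy sketch (the function and variable names here are illustrative, not from the paper):

```python
import numpy as np

def double_q_target(reward, q_online_next, q_target_next, gamma=0.99, done=False):
    """Double Q-learning target for a single transition.

    q_online_next:  Q(S_{t+1}, .; theta_t)   -- online weights, used for selection
    q_target_next:  Q(S_{t+1}, .; theta'_t)  -- second weights, used for evaluation
    """
    if done:
        return reward
    a_star = int(np.argmax(q_online_next))          # select with online weights
    return reward + gamma * q_target_next[a_star]   # evaluate with second weights

# Toy example: the online estimate overrates action 1; the second
# estimator evaluates that action at 3.0 instead of 5.0.
r = 1.0
q_online = np.array([2.0, 5.0, 1.0])   # argmax -> action 1
q_target = np.array([2.1, 3.0, 0.9])
y = double_q_target(r, q_online, q_target, gamma=0.9)  # 1.0 + 0.9 * 3.0 = 3.7
```

A standard Q-learning target would instead use `max(q_online_next)` for both selection and evaluation, yielding 1.0 + 0.9 * 5.0 = 5.5 here, which illustrates the overestimation the second estimator avoids.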
Source: Deep Reinforcement Learning with Double Q-learning
Source: Double Q-learning

| Task | Papers | Share |
| --- | --- | --- |
| Reinforcement Learning (RL) | 66 | 43.42% |
| Atari Games | 18 | 11.84% |
| OpenAI Gym | 10 | 6.58% |
| Decision Making | 9 | 5.92% |
| Continuous Control | 7 | 4.61% |
| Management | 4 | 2.63% |
| Multi-agent Reinforcement Learning | 4 | 2.63% |
| Efficient Exploration | 3 | 1.97% |
| Ensemble Learning | 2 | 1.32% |