A Double Deep Q-Network, or Double DQN, utilises Double Q-learning to reduce overestimation by decomposing the max operation in the target into action selection and action evaluation. The greedy action is selected according to the online network, but the target network is used to estimate its value. The update is the same as for DQN, but with the target $Y^{DQN}_{t}$ replaced by:
$$ Y^{DoubleDQN}_{t} = R_{t+1}+\gamma{Q}\left(S_{t+1}, \arg\max_{a}Q\left(S_{t+1}, a; \theta_{t}\right);\theta_{t}^{-}\right) $$
Compared to the original formulation of Double Q-learning, in Double DQN the weights of the second network $\theta'_{t}$ are replaced with the weights of the target network $\theta_{t}^{-}$ for the evaluation of the current greedy policy.
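As an illustration, here is a minimal sketch of the Double DQN target computation, assuming PyTorch-style `online_net` and `target_net` modules that map a batch of states to per-action Q-values; the function name and tensor arguments are hypothetical, not from the source.

```python
import torch

def double_dqn_target(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Compute Y^{DoubleDQN}_t for a batch of transitions (hypothetical helper)."""
    with torch.no_grad():
        # Action selection with the online network: argmax_a Q(S_{t+1}, a; theta_t)
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # Action evaluation with the target network: Q(S_{t+1}, a*; theta_t^-)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        # Terminal transitions contribute only the immediate reward
        return rewards + gamma * next_q * (1.0 - dones)
```

The only difference from the standard DQN target is the action selection step: DQN takes the argmax under the target network itself, whereas here the argmax comes from the online network.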
Source: Deep Reinforcement Learning with Double Q-learning
| Task | Papers | Share |
|---|---|---|
| Atari Games | 5 | 25.00% |
| OpenAI Gym | 3 | 15.00% |
| Model-based Reinforcement Learning | 1 | 5.00% |
| Ensemble Learning | 1 | 5.00% |
| Combinatorial Optimization | 1 | 5.00% |
| Imitation Learning | 1 | 5.00% |
| Mathematical Reasoning | 1 | 5.00% |
| Program Synthesis | 1 | 5.00% |
| Graph Embedding | 1 | 5.00% |
| Component | Type |
|---|---|
| Convolution | Convolutions |
| Dense Connections | Feedforward Networks |
| Q-Learning | Off-Policy TD Control |
| Experience Replay | Replay Memory |
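To show how these components fit together, the following is a hedged sketch of a single training step, reusing the `double_dqn_target` helper above; the `replay_memory.sample` interface and all other names are assumptions for illustration, not an API from the source.

```python
import torch
import torch.nn.functional as F

def train_step(online_net, target_net, optimizer, replay_memory, batch_size=32, gamma=0.99):
    # Sample a minibatch of transitions from the replay memory (hypothetical API)
    states, actions, rewards, next_states, dones = replay_memory.sample(batch_size)

    # Q(S_t, A_t; theta_t) for the actions actually taken
    q_taken = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Double DQN target: select with the online net, evaluate with the target net
    targets = double_dqn_target(online_net, target_net, rewards, next_states, dones, gamma)

    loss = F.smooth_l1_loss(q_taken, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

As in DQN, the target network weights $\theta_{t}^{-}$ are a periodic copy of the online weights, e.g. `target_net.load_state_dict(online_net.state_dict())` every fixed number of steps.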