Double Q-learning

Introduced by Hado van Hasselt in Double Q-learning (2010)

Double Q-learning is an off-policy reinforcement learning algorithm that uses two value estimators (double estimation) to counteract the overestimation bias of traditional Q-learning.

The max operator in standard Q-learning and DQN uses the same values both to select and to evaluate an action. This makes it more likely to select overestimated values, resulting in overoptimistic value estimates. To prevent this, we can decouple the selection from the evaluation, which is the idea behind Double Q-learning:

$$ Y^{Q}_{t} = R_{t+1} + \gamma Q\left(S_{t+1}, \arg\max_{a}Q\left(S_{t+1}, a; \boldsymbol{\theta}_{t}\right); \boldsymbol{\theta}_{t}\right) $$
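
A minimal NumPy sketch of this coupling (the array `q_next` and its numbers are purely hypothetical, not taken from either paper) shows that the max in the target is just the value of the action picked by an argmax over the same estimates:

```python
import numpy as np

# Hypothetical per-action Q-value estimates for the next state S_{t+1},
# i.e. Q(S_{t+1}, a; theta_t) for each action a (illustrative numbers only).
q_next = np.array([1.2, 0.7, 1.5, 0.9])

reward = 0.5   # R_{t+1}
gamma = 0.99   # discount factor

# Standard Q-learning / DQN target: the same estimates are used both to
# select the action (argmax) and to evaluate it.
selected_action = np.argmax(q_next)
y_q = reward + gamma * q_next[selected_action]

# Equivalent to the usual max form, which is why an overestimated value is
# both more likely to be selected and then propagated into the target.
assert np.isclose(y_q, reward + gamma * np.max(q_next))
```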

The Double Q-learning error can then be written as:

$$ Y^{DoubleQ}_{t} = R_{t+1} + \gamma Q\left(S_{t+1}, \arg\max_{a}Q\left(S_{t+1}, a; \boldsymbol{\theta}_{t}\right); \boldsymbol{\theta}'_{t}\right) $$

Here the selection of the action in the $\arg\max$ is still due to the online weights $\boldsymbol{\theta}_{t}$, but we use a second set of weights $\boldsymbol{\theta}'_{t}$ to fairly evaluate the value of this policy.
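
A minimal sketch of this target, assuming two hypothetical per-action estimate arrays `q_online_next` (from $\boldsymbol{\theta}_{t}$) and `q_eval_next` (from $\boldsymbol{\theta}'_{t}$); names and numbers are illustrative, not from the source papers:

```python
import numpy as np

# Hypothetical per-action estimates for S_{t+1} under two sets of weights
# (illustrative numbers only):
#   q_online_next ~ Q(S_{t+1}, ., theta_t)   online weights, used to SELECT
#   q_eval_next   ~ Q(S_{t+1}, ., theta'_t)  second weight set, used to EVALUATE
q_online_next = np.array([1.2, 0.7, 1.5, 0.9])
q_eval_next = np.array([1.0, 0.8, 1.1, 1.0])

reward = 0.5   # R_{t+1}
gamma = 0.99   # discount factor

# Double Q-learning target: select the action with the online estimates,
# then evaluate that action with the second set of estimates.
selected_action = np.argmax(q_online_next)
y_double_q = reward + gamma * q_eval_next[selected_action]
```

In the tabular algorithm of the 2010 paper, the two estimators swap roles at random on each update; in Deep Reinforcement Learning with Double Q-learning, the second set of weights is the target network of DQN.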

Source: Deep Reinforcement Learning with Double Q-learning

Source: Double Q-learning

Latest Papers

Reinforcement Learning with Quantum Variational Circuits
Owen Lockwood, Mei Si
2020-08-15
Chrome Dino Run using Reinforcement Learning
Divyanshu Marwah, Sneha Srivastava, Anusha Gupta, Shruti Verma
2020-08-15
QPLEX: Duplex Dueling Multi-Agent Q-Learning
Jianhao Wang, Zhizhou Ren, Terry Liu, Yang Yu, Chongjie Zhang
2020-08-03
SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning
Kimin Lee, Michael Laskin, Aravind Srinivas, Pieter Abbeel
2020-07-09
Provably-Efficient Double Q-Learning
Wentao Weng, Harsh Gupta, Niao He, Lei Ying, R. Srikant
2020-07-09
Regularly Updated Deterministic Policy Gradient Algorithm
Shuai Han, Wenbo Zhou, Shuai Lü, Jiayu Yu
2020-07-01
Noise, overestimation and exploration in Deep Reinforcement Learning
Rafael Stekolshchik
2020-06-25
Deep Reinforcement Learning Control for Radar Detection and Tracking in Congested Spectral Environments
Charles E. Thornton, Mark A. Kozy, R. Michael Buehrer, Anthony F. Martone, Kelly D. Sherbondy
2020-06-23
Parameterized MDPs and Reinforcement Learning Problems -- A Maximum Entropy Principle Based Framework
Amber Srivastava, Srinivasa M Salapaka
2020-06-17
Decorrelated Double Q-learning
Gang Chen
2020-06-12
Balancing a CartPole System with Reinforcement Learning -- A Tutorial
Swagat Kumar
2020-06-08
Acme: A Research Framework for Distributed Reinforcement Learning
Matt Hoffman, Bobak Shahriari, John Aslanides, Gabriel Barth-Maron, Feryal Behbahani, Tamara Norman, Abbas Abdolmaleki, Albin Cassirer, Fan Yang, Kate Baumli, Sarah Henderson, Alex Novikov, Sergio Gómez Colmenarejo, Serkan Cabi, Caglar Gulcehre, Tom Le Paine, Andrew Cowie, Ziyu Wang, Bilal Piot, Nando de Freitas
2020-06-01
Basal Glucose Control in Type 1 Diabetes using Deep Reinforcement Learning: An In Silico Validation
Taiyu Zhu, Kezhi Li, Pau Herrero, Pantelis Georgiou
2020-05-18
A Double Q-Learning Approach for Navigation of Aerial Vehicles with Connectivity Constraint
Behzad Khamidehi, Elvino S. Sousa
2020-02-24
Disentangling Controllable Object through Video Prediction Improves Visual Reinforcement Learning
Yuanyi Zhong, Alexander Schwing, Jian Peng
2020-02-21
Fast Reinforcement Learning for Anti-jamming Communications
Pei-Gen Ye, Yuan-Gen Wang, Jin Li, Liang Xiao
2020-02-13
$\gamma$-Regret for Non-Episodic Reinforcement Learning
Shuang Liu, Hao Su
2020-02-12
Dynamically Balanced Value Estimates for Actor-Critic Methods
Anonymous
2020-01-01
Do recent advancements in model-based deep reinforcement learning really improve data efficiency?
Anonymous
2020-01-01
Exploiting the potential of deep reinforcement learning for classification tasks in high-dimensional and unstructured data
Johan S. Obando-Ceron, Victor Romero Cano, Walter Mayor Toro
2019-12-20
Task-Oriented Language Grounding for Language Input with Multiple Sub-Goals of Non-Linear Order
Vladislav Kurenkov, Bulat Maksudov, Adil Khan
2019-10-27
Reverse Experience Replay
Egor Rotinov
2019-10-19
To Combine or Not To Combine? A Rainbow Deep Reinforcement Learning Agent for Dialog Policies
Dirk Väth, Ngoc Thang Vu
2019-09-01
Performing Deep Recurrent Double Q-Learning for Atari Games
Felipe Moreno-Vera
2019-08-16
Large-scale Traffic Signal Control Using a Novel Multi-Agent Reinforcement Learning
Xiaoqiang Wang, Liangjun Ke, Zhimin Qiao, Xinghua Chai
2019-08-10
A Story of Two Streams: Reinforcement Learning Models from Human Behavior and Neuropsychiatry
Baihan Lin, Guillermo Cecchi, Djallel Bouneffouf, Jenna Reinen, Irina Rish
2019-06-21
Generative Adversarial Imagination for Sample Efficient Deep Reinforcement Learning
Kacper Kielak
2019-04-30
Double Deep Q-Learning for Optimal Execution
Brian Ning, Franco Ho Ting Lin, Sebastian Jaimungal
2018-12-17
Revisiting the Softmax Bellman Operator: New Benefits and New Perspective
Zhao Song, Ronald E. Parr, Lawrence Carin
2018-12-02
Macro action selection with deep reinforcement learning in StarCraft
Sijia Xu, Hongyu Kuang, Zhi Zhuang, Renjie Hu, Yang Liu, Huyang Sun
2018-12-02
Distributed Prioritized Experience Replay
Dan Horgan, John Quan, David Budden, Gabriel Barth-Maron, Matteo Hessel, Hado van Hasselt, David Silver
2018-03-02
Addressing Function Approximation Error in Actor-Critic Methods
Scott Fujimoto, Herke van Hoof, David Meger
2018-02-26
Weighted Double Deep Multiagent Reinforcement Learning in Stochastic Cooperative Environments
Yan Zheng, Jianye Hao, Zongzhang Zhang
2018-02-23
Efficient Exploration through Bayesian Deep Q-Networks
Kamyar Azizzadenesheli, Animashree Anandkumar
2018-02-13
Faster Deep Q-learning using Neural Episodic Control
Daichi Nishio, Satoshi Yamane
2018-01-06
Rainbow: Combining Improvements in Deep Reinforcement Learning
Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver
2017-10-06
Noisy Networks for Exploration
Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Ian Osband, Alex Graves, Vlad Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, Shane Legg
2017-06-30
Sample Efficient Actor-Critic with Experience Replay
Ziyu Wang, Victor Bapst, Nicolas Heess, Volodymyr Mnih, Remi Munos, Koray Kavukcuoglu, Nando de Freitas
2016-11-03
Dynamic Frame skip Deep Q Network
Aravind S. Lakshminarayanan, Sahil Sharma, Balaraman Ravindran
2016-05-17
Dueling Network Architectures for Deep Reinforcement Learning
Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas
2015-11-20
Deep Reinforcement Learning with Double Q-learning
Hado van Hasselt, Arthur Guez, David Silver
2015-09-22
Double Q-learning
Hado V. Hasselt
2010-12-01

Tasks

TASK | PAPERS | SHARE
Atari Games | 9 | 32.14%
Efficient Exploration | 3 | 10.71%
Multi-agent Reinforcement Learning | 2 | 7.14%
Starcraft | 2 | 7.14%
Decision Making | 2 | 7.14%
Continuous Control | 2 | 7.14%
Starcraft II | 1 | 3.57%
DQN Replay Dataset | 1 | 3.57%
Offline RL | 1 | 3.57%

Components

No components found.

Categories