Clipped Double Q-learning

Introduced by Fujimoto et al. in Addressing Function Approximation Error in Actor-Critic Methods

Clipped Double Q-learning is a variant of Double Q-learning that upper-bounds the less biased Q estimate $Q_{\theta_{2}}$ by the biased estimate $Q_{\theta_{1}}$. This is equivalent to taking the minimum of the two estimates, resulting in the following target update:

$$ y_{1} = r + \gamma\min_{i=1,2}Q_{\theta'_{i}}\left(s', \pi_{\phi_{1}}\left(s'\right)\right) $$

The motivation for this extension is that vanilla double Q-learning is sometimes ineffective if the target and current networks are too similar, e.g. with a slow-changing policy in an actor-critic framework.
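
As an illustration, here is a minimal PyTorch-style sketch of the clipped target computation, in the spirit of TD3. The names (`critic1_target`, `critic2_target`, `actor_target`, `not_done`) are assumptions for this example, not the authors' reference implementation.

```python
import torch

def clipped_double_q_target(reward, next_state, not_done,
                            critic1_target, critic2_target, actor_target,
                            gamma=0.99):
    """Clipped Double Q-learning target:
    y_1 = r + gamma * min_{i=1,2} Q_{theta'_i}(s', pi_{phi_1}(s')).

    critic1_target / critic2_target play the roles of Q_{theta'_1} and Q_{theta'_2};
    actor_target plays the role of pi_{phi_1}. All names here are illustrative.
    """
    with torch.no_grad():
        next_action = actor_target(next_state)         # a' = pi_{phi_1}(s')
        q1 = critic1_target(next_state, next_action)   # Q_{theta'_1}(s', a')
        q2 = critic2_target(next_state, next_action)   # Q_{theta'_2}(s', a')
        min_q = torch.min(q1, q2)                      # clip: element-wise minimum of the two estimates
        return reward + not_done * gamma * min_q       # mask bootstrapping at terminal states
```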

Source: Addressing Function Approximation Error in Actor-Critic Methods

Latest Papers

PAPER | AUTHORS | DATE
FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance | Xiao-Yang Liu, Hongyang Yang, Qian Chen, Runjia Zhang, Liuqing Yang, Bowen Xiao, Christina Dan Wang | 2020-11-19
Tonic: A Deep Reinforcement Learning Library for Fast Prototyping and Benchmarking | Fabio Pardo | 2020-11-15
RealAnt: An Open-Source Low-Cost Quadruped for Research in Real-World Reinforcement Learning | Rinu Boney, Jussi Sainio, Mikko Kaivola, Arno Solin, Juho Kannala | 2020-11-05
Hindsight Experience Replay with Kronecker Product Approximate Curvature | Dhuruva Priyan G M, Abhik Singla, Shalabh Bhatnagar | 2020-10-09
Sample-Efficient Automated Deep Reinforcement Learning | Jörg K. H. Franke, Gregor Köhler, André Biedenkapp, Frank Hutter | 2020-09-03
Collision Avoidance Robotics Via Meta-Learning (CARML) | Abhiram Iyer, Aravind Mahadevan | 2020-07-16
Regularly Updated Deterministic Policy Gradient Algorithm | Shuai Han, Wenbo Zhou, Shuai Lü, Jiayu Yu | 2020-07-01
Noise, overestimation and exploration in Deep Reinforcement Learning | Rafael Stekolshchik | 2020-06-25
Reducing Estimation Bias via Weighted Delayed Deep Deterministic Policy Gradient | Qiang He, Xinwen Hou | 2020-06-18
Generalized State-Dependent Exploration for Deep Reinforcement Learning in Robotics | Antonin Raffin, Freek Stulp | 2020-05-12
PBCS: Efficient Exploration and Exploitation Using a Synergy between Reinforcement Learning and Motion Planning | Guillaume Matheron, Nicolas Perrin, Olivier Sigaud | 2020-04-24
Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning | Shangtong Zhang, Bo Liu, Shimon Whiteson | 2020-04-22
Online Meta-Critic Learning for Off-Policy Actor-Critic Methods | Wei Zhou, Yiying Li, Yongxin Yang, Huaimin Wang, Timothy M. Hospedales | 2020-03-11
Interpretable End-to-end Urban Autonomous Driving with Latent Deep Reinforcement Learning | Jianyu Chen, Shengbo Eben Li, Masayoshi Tomizuka | 2020-01-23
Ctrl-Z: Recovering from Instability in Reinforcement Learning | Vibhavari Dasagi, Jake Bruce, Thierry Peynot, Jürgen Leitner | 2019-10-09
Composite Q-learning: Multi-scale Q-function Decomposition and Separable Optimization | Gabriel Kalweit, Maria Huegle, Joschka Boedecker | 2019-09-30
Proximal Distilled Evolutionary Reinforcement Learning | Cristian Bodnar, Ben Day, Pietro Lió | 2019-06-24
Exploring Model-based Planning with Policy Networks | Tingwu Wang, Jimmy Ba | 2019-06-20
Collaborative Evolutionary Reinforcement Learning | Shauharda Khadka, Somdeb Majumdar, Tarek Nassar, Zach Dwiel, Evren Tumer, Santiago Miret, Yinyin Liu, Kagan Tumer | 2019-05-02
CrossNorm: Normalization for Off-Policy TD Reinforcement Learning | Aditya Bhatt, Max Argus, Artemij Amiranashvili, Thomas Brox | 2019-02-14
Addressing Function Approximation Error in Actor-Critic Methods | Scott Fujimoto, Herke van Hoof, David Meger | 2018-02-26

Tasks

TASK | PAPERS | SHARE
Continuous Control | 7 | 53.85%
Meta-Learning | 2 | 15.38%
Efficient Exploration | 1 | 7.69%
Motion Planning | 1 | 7.69%
Autonomous Driving | 1 | 7.69%
Hypothesis Testing | 1 | 7.69%
