Twin Delayed Deep Deterministic Policy Gradient (TD3)

Introduced by Fujimoto et al. in Addressing Function Approximation Error in Actor-Critic Methods

TD3 builds on the DDPG algorithm for reinforcement learning, with several modifications aimed at reducing overestimation bias in the value function. In particular, it uses clipped double Q-learning, delayed updates of the target and policy networks, and target policy smoothing. The smoothing step resembles a SARSA-style update: it is a safer update because it assigns higher value to actions that are robust to perturbations.
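The three modifications can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`td3_target`, `soft_update`, `is_policy_update_step`), the toy actor and critics, and the hyperparameter values (commonly used defaults) are all assumptions chosen for illustration.

```python
import numpy as np

def td3_target(reward, done, next_state, actor_target, q1_target, q2_target,
               rng, gamma=0.99, policy_noise=0.2, noise_clip=0.5, max_action=1.0):
    """Compute a TD3-style critic target (hypothetical helper, assumed defaults)."""
    # Target policy smoothing: perturb the target action with clipped Gaussian noise.
    next_action = actor_target(next_state)
    noise = rng.normal(0.0, policy_noise, size=next_action.shape)
    noise = np.clip(noise, -noise_clip, noise_clip)
    next_action = np.clip(next_action + noise, -max_action, max_action)
    # Clipped double Q-learning: take the smaller of the two target critic estimates.
    q_next = np.minimum(q1_target(next_state, next_action),
                        q2_target(next_state, next_action))
    # Terminal transitions (done == 1) do not bootstrap from the next state.
    return reward + gamma * (1.0 - done) * q_next

def soft_update(target_params, online_params, tau=0.005):
    """Polyak-average online parameters into the target parameters."""
    return [(1.0 - tau) * t + tau * p for t, p in zip(target_params, online_params)]

def is_policy_update_step(step, policy_delay=2):
    """Delayed updates: refresh actor and targets only every `policy_delay` critic steps."""
    return step % policy_delay == 0

# Toy stand-ins for the target networks (assumptions, not learned models).
def toy_actor(s):
    return np.tanh(s.sum(axis=1, keepdims=True))

def toy_q1(s, a):
    return s.sum(axis=1) + a[:, 0]

def toy_q2(s, a):
    return s.sum(axis=1) - a[:, 0]

rng = np.random.default_rng(0)
states = np.array([[0.1, 0.2], [0.3, -0.4]])
rewards = np.array([1.0, 0.5])
dones = np.array([0.0, 1.0])
targets = td3_target(rewards, dones, states, toy_actor, toy_q1, toy_q2, rng)
print(targets)
```

In a training loop under these assumptions, both critics would regress toward `targets` on every step, while the actor update and `soft_update` would run only when `is_policy_update_step(step)` is true.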

Source: Addressing Function Approximation Error in Actor-Critic Methods

Latest Papers

PAPER DATE
FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance
Xiao-Yang Liu, Hongyang Yang, Qian Chen, Runjia Zhang, Liuqing Yang, Bowen Xiao, Christina Dan Wang
2020-11-19
Tonic: A Deep Reinforcement Learning Library for Fast Prototyping and Benchmarking
Fabio Pardo
2020-11-15
RealAnt: An Open-Source Low-Cost Quadruped for Research in Real-World Reinforcement Learning
Rinu Boney, Jussi Sainio, Mikko Kaivola, Arno Solin, Juho Kannala
2020-11-05
Hindsight Experience Replay with Kronecker Product Approximate Curvature
Dhuruva Priyan G M, Abhik Singla, Shalabh Bhatnagar
2020-10-09
Sample-Efficient Automated Deep Reinforcement Learning
Jörg K. H. Franke, Gregor Köhler, André Biedenkapp, Frank Hutter
2020-09-03
Collision Avoidance Robotics Via Meta-Learning (CARML)
Abhiram Iyer, Aravind Mahadevan
2020-07-16
Noise, overestimation and exploration in Deep Reinforcement Learning
Rafael Stekolshchik
2020-06-25
Reducing Estimation Bias via Weighted Delayed Deep Deterministic Policy Gradient
Qiang He, Xinwen Hou
2020-06-18
Generalized State-Dependent Exploration for Deep Reinforcement Learning in Robotics
Antonin Raffin, Freek Stulp
2020-05-12
PBCS : Efficient Exploration and Exploitation Using a Synergy between Reinforcement Learning and Motion Planning
Guillaume Matheron, Nicolas Perrin, Olivier Sigaud
2020-04-24
Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning
Shangtong Zhang, Bo Liu, Shimon Whiteson
2020-04-22
Online Meta-Critic Learning for Off-Policy Actor-Critic Methods
Wei Zhou, Yiying Li, Yongxin Yang, Huaimin Wang, Timothy M. Hospedales
2020-03-11
Interpretable End-to-end Urban Autonomous Driving with Latent Deep Reinforcement Learning
Jianyu Chen, Shengbo Eben Li, Masayoshi Tomizuka
2020-01-23
Dynamically Balanced Value Estimates for Actor-Critic Methods
Anonymous
2020-01-01
CrossNorm: On Normalization for Off-Policy Reinforcement Learning
Anonymous
2020-01-01
Ctrl-Z: Recovering from Instability in Reinforcement Learning
Vibhavari Dasagi, Jake Bruce, Thierry Peynot, Jürgen Leitner
2019-10-09
Off-policy Multi-step Q-learning
Gabriel Kalweit, Maria Huegle, Joschka Boedecker
2019-09-30
Proximal Distilled Evolutionary Reinforcement Learning
Cristian Bodnar, Ben Day, Pietro Lió
2019-06-24
Exploring Model-based Planning with Policy Networks
Tingwu Wang, Jimmy Ba
2019-06-20
Collaborative Evolutionary Reinforcement Learning
Shauharda Khadka, Somdeb Majumdar, Tarek Nassar, Zach Dwiel, Evren Tumer, Santiago Miret, Yinyin Liu, Kagan Tumer
2019-05-02
CrossNorm: Normalization for Off-Policy TD Reinforcement Learning
Aditya Bhatt, Max Argus, Artemij Amiranashvili, Thomas Brox
2019-02-14
Addressing Function Approximation Error in Actor-Critic Methods
Scott Fujimoto, Herke van Hoof, David Meger
2018-02-26

Tasks

TASK PAPERS SHARE
Continuous Control 7 58.33%
Meta-Learning 2 16.67%
Efficient Exploration 1 8.33%
Motion Planning 1 8.33%
Autonomous Driving 1 8.33%

Categories