TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Continuous Control	Lunar Lander (OpenAI Gym)	TD3	Score	277.26±4.17	#2

Addressing Function Approximation Error in Actor-Critic Methods

ICML 2018 · Scott Fujimoto, Herke van Hoof, David Meger

In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic. Our algorithm builds on Double Q-learning by taking the minimum value between a pair of critics to limit overestimation. We draw the connection between target networks and overestimation bias, and suggest delaying policy updates to reduce per-update error and further improve performance. We evaluate our method on the suite of OpenAI Gym tasks, outperforming the state of the art in every environment tested.
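
The two mechanisms named in the abstract, taking the minimum over a pair of critics and delaying policy updates, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the `gamma` discount, the `done` mask, and the `policy_delay=2` setting are assumptions chosen for illustration, and the critic values are hand-picked numbers rather than network outputs.

```python
import numpy as np


def clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Bootstrapped target that uses the smaller of two critic estimates.

    All names and the default discount are illustrative assumptions.
    """
    reward = np.asarray(reward, dtype=float)
    done = np.asarray(done, dtype=float)
    # Element-wise minimum over the pair of critics limits overestimation.
    min_q = np.minimum(np.asarray(q1_next, dtype=float),
                       np.asarray(q2_next, dtype=float))
    # Terminal transitions (done == 1) do not bootstrap from the next state.
    return reward + gamma * (1.0 - done) * min_q


def should_update_actor(step, policy_delay=2):
    """Delayed policy updates: refresh the actor every `policy_delay` critic steps.

    The delay of 2 is an assumed example value.
    """
    return step % policy_delay == 0


# Hand-picked example values (not real critic outputs).
targets = clipped_double_q_target(
    reward=[1.0, 0.0],
    done=[0.0, 1.0],
    q1_next=[10.0, 5.0],
    q2_next=[8.0, 7.0],
    gamma=0.5,
)

# Critic steps 1..6; the actor is updated only on some of them.
actor_steps = [t for t in range(1, 7) if should_update_actor(t)]
```

In this example the first transition bootstraps from the smaller critic value (8.0 rather than 10.0), and the second is terminal, so its target is the reward alone. With the assumed delay of 2, the actor would be updated on half of the critic steps.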