Continuous control with deep reinforcement learning

We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
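Below is a minimal sketch of the update the abstract describes: an actor-critic with a deterministic policy gradient over continuous actions (DDPG). The use of PyTorch, the network sizes, and the hyper-parameters (`gamma`, `tau`, hidden width) are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal DDPG-style update: deterministic actor mu(s), critic Q(s,a),
# target networks with soft (Polyak) updates. Illustrative sketch only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Actor(nn.Module):
    """Deterministic policy mu(s) -> a in [-1, 1]^act_dim."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )
    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    """Action-value function Q(s, a)."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def ddpg_update(actor, critic, actor_targ, critic_targ,
                actor_opt, critic_opt, batch, gamma=0.99, tau=0.005):
    # batch tensors: obs (B, obs_dim), act (B, act_dim), rew/done (B,), next_obs (B, obs_dim)
    obs, act, rew, next_obs, done = batch

    # Critic: regress Q(s, a) toward the bootstrapped target r + gamma * Q'(s', mu'(s')).
    with torch.no_grad():
        target_q = rew + gamma * (1.0 - done) * critic_targ(next_obs, actor_targ(next_obs))
    critic_loss = F.mse_loss(critic(obs, act), target_q)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: deterministic policy gradient, ascend Q(s, mu(s)).
    actor_loss = -critic(obs, actor(obs)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft update of the target networks toward the online networks.
    with torch.no_grad():
        for p, p_targ in zip(actor.parameters(), actor_targ.parameters()):
            p_targ.mul_(1 - tau).add_(tau * p)
        for p, p_targ in zip(critic.parameters(), critic_targ.parameters()):
            p_targ.mul_(1 - tau).add_(tau * p)
```

In a full training loop the target networks would be initialized as copies of the online networks (e.g. `copy.deepcopy`), transitions would be sampled from a replay buffer, and exploration would come from adding noise to the actor's output (the paper uses an Ornstein-Uhlenbeck process).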

Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| OpenAI Gym | Ant-v4 | DDPG | Average Return | 1712.12 | #4 |
| OpenAI Gym | HalfCheetah-v4 | DDPG | Average Return | 14934.86 | #2 |
| OpenAI Gym | Hopper-v4 | DDPG | Average Return | 1290.24 | #4 |
| OpenAI Gym | Humanoid-v4 | DDPG | Average Return | 139.14 | #5 |
| Continuous Control | Lunar Lander (OpenAI Gym) | DDPG | Score | 256.98±14.38 | #3 |
| OpenAI Gym | Walker2d-v4 | DDPG | Average Return | 2994.54 | #3 |