TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Atari Games	Atari 2600 Gravitar	RND	Score	3906	# 11
Atari Games	Atari 2600 Montezuma's Revenge	RND	Score	8152	# 5
Atari Games	Atari 2600 Pitfall!	RND	Score	-3	# 20
Atari Games	Atari 2600 Private Eye	RND	Score	8666	# 11
Atari Games	Atari 2600 Solaris	RND	Score	3282	# 17
Atari Games	Atari 2600 Venture	RND	Score	1859	# 9
Unsupervised Reinforcement Learning	URLB (pixels, 10^5 frames)	APT	Walker (mean normalized return)	7.71±7.39	# 7
Unsupervised Reinforcement Learning	URLB (pixels, 10^5 frames)	APT	Quadruped (mean normalized return)	21.22±5.14	# 7
Unsupervised Reinforcement Learning	URLB (pixels, 10^5 frames)	APT	Jaco (mean normalized return)	0.37±0.64	# 9
Unsupervised Reinforcement Learning	URLB (pixels, 10^5 frames)	RND	Walker (mean normalized return)	23.87±10.21	# 3
Unsupervised Reinforcement Learning	URLB (pixels, 10^5 frames)	RND	Quadruped (mean normalized return)	24.37±8.70	# 4
Unsupervised Reinforcement Learning	URLB (pixels, 10^5 frames)	RND	Jaco (mean normalized return)	26.22±4.83	# 1
Unsupervised Reinforcement Learning	URLB (pixels, 10^6 frames)	RND	Walker (mean normalized return)	30.46±14.18	# 4
Unsupervised Reinforcement Learning	URLB (pixels, 10^6 frames)	RND	Quadruped (mean normalized return)	41.89±11.72	# 1
Unsupervised Reinforcement Learning	URLB (pixels, 10^6 frames)	RND	Jaco (mean normalized return)	24.38±3.92	# 3
Unsupervised Reinforcement Learning	URLB (pixels, 2*10^6 frames)	RND	Walker (mean normalized return)	32.80±13.19	# 3
Unsupervised Reinforcement Learning	URLB (pixels, 2*10^6 frames)	RND	Quadruped (mean normalized return)	42.57±11.65	# 2
Unsupervised Reinforcement Learning	URLB (pixels, 2*10^6 frames)	RND	Jaco (mean normalized return)	27.51±7.12	# 4
Unsupervised Reinforcement Learning	URLB (pixels, 5*10^5 frames)	RND	Walker (mean normalized return)	25.44±9.92	# 4
Unsupervised Reinforcement Learning	URLB (pixels, 5*10^5 frames)	RND	Quadruped (mean normalized return)	36.02±10.27	# 1
Unsupervised Reinforcement Learning	URLB (pixels, 5*10^5 frames)	RND	Jaco (mean normalized return)	26.62±2.75	# 2
Unsupervised Reinforcement Learning	URLB (states, 10^5 frames)	RND	Walker (mean normalized return)	82.57±31.22	# 2
Unsupervised Reinforcement Learning	URLB (states, 10^5 frames)	RND	Quadruped (mean normalized return)	35.34±11.16	# 3
Unsupervised Reinforcement Learning	URLB (states, 10^5 frames)	RND	Jaco (mean normalized return)	72.84±6.87	# 2
Unsupervised Reinforcement Learning	URLB (states, 10^6 frames)	RND	Walker (mean normalized return)	84.93±29.64	# 1
Unsupervised Reinforcement Learning	URLB (states, 10^6 frames)	RND	Quadruped (mean normalized return)	69.12±11.95	# 2
Unsupervised Reinforcement Learning	URLB (states, 10^6 frames)	RND	Jaco (mean normalized return)	60.68±8.49	# 4
Unsupervised Reinforcement Learning	URLB (states, 2*10^6 frames)	RND	Walker (mean normalized return)	79.28±30.91	# 1
Unsupervised Reinforcement Learning	URLB (states, 2*10^6 frames)	RND	Quadruped (mean normalized return)	75.14±16.23	# 2
Unsupervised Reinforcement Learning	URLB (states, 2*10^6 frames)	RND	Jaco (mean normalized return)	56.05±8.73	# 4
Unsupervised Reinforcement Learning	URLB (states, 5*10^5 frames)	RND	Walker (mean normalized return)	87.15±27.65	# 1
Unsupervised Reinforcement Learning	URLB (states, 5*10^5 frames)	RND	Quadruped (mean normalized return)	59.90±12.95	# 1
Unsupervised Reinforcement Learning	URLB (states, 5*10^5 frames)	RND	Jaco (mean normalized return)	65.08±5.45	# 3

Exploration by Random Network Distillation

ICLR 2019 · Yuri Burda, Harrison Edwards, Amos Storkey, Oleg Klimov

We introduce an exploration bonus for deep reinforcement learning methods that is easy to implement and adds minimal overhead to the computation performed. The bonus is the error of a neural network predicting features of the observations given by a fixed randomly initialized neural network. We also introduce a method to flexibly combine intrinsic and extrinsic rewards. We find that the random network distillation (RND) bonus combined with this increased flexibility enables significant progress on several hard exploration Atari games. In particular, we establish state-of-the-art performance on Montezuma's Revenge, a game famously difficult for deep reinforcement learning methods. To the best of our knowledge, this is the first method that achieves better-than-average human performance on this game without using demonstrations or having access to the underlying state of the game, and it occasionally completes the first level.
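The bonus and the reward combination described in the abstract can be illustrated with a toy, dependency-free sketch. It assumes a one-layer tanh target network, a linear predictor trained by plain SGD, and tiny dimensions; none of these architectural details come from this page, and the names RNDBonus, discounted_returns and combined_returns are hypothetical. The second half shows one possible way to combine the two reward streams: each is discounted with its own factor, only the extrinsic stream is cut at episode boundaries, and the results are summed with weights. Those discounts, weights and the episodic/non-episodic split are assumptions for illustration, not values stated on this page.

```python
import math
import random


class RNDBonus:
    """Toy random network distillation bonus (illustrative sketch only).

    Assumed, not from the source: a one-layer tanh target network,
    a linear predictor trained with plain SGD, and tiny dimensions.
    """

    def __init__(self, obs_dim, feat_dim, lr=0.05, seed=0):
        rng = random.Random(seed)
        # Fixed, randomly initialized target network; it is never trained.
        self.target_w = [[rng.gauss(0.0, 1.0) for _ in range(obs_dim)]
                         for _ in range(feat_dim)]
        # Trainable predictor, initialized to zero.
        self.pred_w = [[0.0] * obs_dim for _ in range(feat_dim)]
        self.lr = lr

    def _target(self, obs):
        return [math.tanh(sum(w * x for w, x in zip(row, obs)))
                for row in self.target_w]

    def _predict(self, obs):
        return [sum(w * x for w, x in zip(row, obs)) for row in self.pred_w]

    def bonus(self, obs):
        """Intrinsic reward: squared error of the predictor against the target features."""
        return sum((p - t) ** 2
                   for p, t in zip(self._predict(obs), self._target(obs)))

    def update(self, obs):
        """One SGD step that reduces the prediction error on `obs`."""
        target, pred = self._target(obs), self._predict(obs)
        for i, row in enumerate(self.pred_w):
            grad_scale = 2.0 * (pred[i] - target[i])  # d(error)/d(pred_i)
            for j, x in enumerate(obs):
                row[j] -= self.lr * grad_scale * x


def discounted_returns(rewards, gamma, dones=None):
    """Discounted return per step; if `dones` is given, returns stop at episode ends."""
    returns, running = [0.0] * len(rewards), 0.0
    for t in reversed(range(len(rewards))):
        if dones is not None and dones[t]:
            running = 0.0  # episode ended at step t: nothing flows back from t + 1
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def combined_returns(ext_rewards, int_rewards, dones,
                     gamma_ext=0.999, gamma_int=0.99, coef_ext=2.0, coef_int=1.0):
    """Weighted sum of two separately discounted reward streams.

    Assumption for illustration: the extrinsic stream is episodic, the
    intrinsic stream ignores episode boundaries; discounts and weights are
    placeholder values.
    """
    ext = discounted_returns(ext_rewards, gamma_ext, dones)
    intr = discounted_returns(int_rewards, gamma_int)
    return [coef_ext * e + coef_int * i for e, i in zip(ext, intr)]


if __name__ == "__main__":
    rnd = RNDBonus(obs_dim=4, feat_dim=8)
    familiar, novel = [1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]
    print("familiar before training:", rnd.bonus(familiar))
    for _ in range(200):
        rnd.update(familiar)  # visit the same observation repeatedly
    print("familiar after training: ", rnd.bonus(familiar))
    print("novel (never trained on):", rnd.bonus(novel))
```

Because the predictor is trained only on observations the agent has actually seen, its error, and therefore the bonus, shrinks for frequently visited states while staying large for novel ones. Practical implementations usually also normalize observations and intrinsic rewards, which this sketch omits.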

Code

openai/random-network-distillation (official): 855 stars
opendilab/DI-engine: 2,539 stars
rle-foundation/rlexplore: 318 stars
jcwleo/random-network-distillation-…: 234 stars
uoe-agents/derl

21 implementations are listed in total.

Tasks

Atari Games

Montezuma's Revenge

reinforcement-learning

Reinforcement Learning (RL)

Unsupervised Reinforcement Learning

Datasets

Arcade Learning Environment

URLB

Results from the Paper

Ranked #1 on Unsupervised Reinforcement Learning on URLB (states, 2*10^6 frames)
