First return, then explore

27 Apr 2020  ·  Adrien Ecoffet, Joost Huizinga, Joel Lehman, Kenneth O. Stanley, Jeff Clune

The promise of reinforcement learning is to solve complex sequential decision problems autonomously by specifying a high-level reward function only. However, reinforcement learning algorithms struggle when, as is often the case, simple and intuitive rewards provide sparse and deceptive feedback. Avoiding these pitfalls requires thoroughly exploring the environment, but creating algorithms that can do so remains one of the central challenges of the field. We hypothesise that the main impediment to effective exploration originates from algorithms forgetting how to reach previously visited states ("detachment") and from failing to first return to a state before exploring from it ("derailment"). We introduce Go-Explore, a family of algorithms that addresses these two challenges directly through the simple principles of explicitly remembering promising states and first returning to such states before intentionally exploring. Go-Explore solves all heretofore unsolved Atari games and surpasses the state of the art on all hard-exploration games, with orders of magnitude improvements on the grand challenges Montezuma's Revenge and Pitfall. We also demonstrate the practical potential of Go-Explore on a sparse-reward pick-and-place robotics task. Additionally, we show that adding a goal-conditioned policy can further improve Go-Explore's exploration efficiency and enable it to handle stochasticity throughout training. The substantial performance gains from Go-Explore suggest that the simple principles of remembering states, returning to them, and exploring from them are a powerful and general approach to exploration, an insight that may prove critical to the creation of truly intelligent learning agents.
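The core loop described in the abstract — remember promising states, first return to one, then explore from it — can be sketched as follows. This is a minimal illustration on a hypothetical toy environment (a 1-D chain with a single sparse reward), not the paper's implementation: the cell representation, selection heuristics, and the robustification phase of the real algorithm are all omitted, and every name and parameter below is an assumption for illustration.

```python
import random

GOAL = 20  # assumed toy goal position; reward is sparse (only at GOAL)

def step(pos, action):
    """Toy deterministic environment: move left (-1) or right (+1),
    clipped at 0; reward is given only at position GOAL."""
    pos = max(0, pos + action)
    return pos, (1.0 if pos == GOAL else 0.0)

def go_explore(iterations=2000, explore_steps=10, seed=0):
    rng = random.Random(seed)
    # Archive maps a "cell" (here simply the position) to the action
    # sequence that first reached it -- the explicit memory of visited
    # states that counters detachment.
    archive = {0: []}
    best_reward = 0.0
    for _ in range(iterations):
        # Select a cell to expand (uniformly at random, for simplicity).
        cell = rng.choice(list(archive))
        trajectory = list(archive[cell])
        # "First return": replay the stored trajectory to get back to
        # the cell before exploring, which counters derailment. Replay
        # suffices here because the toy environment is deterministic.
        pos = 0
        for a in trajectory:
            pos, _ = step(pos, a)
        # "Then explore": take random actions from the restored state.
        for _ in range(explore_steps):
            a = rng.choice([-1, 1])
            pos, reward = step(pos, a)
            trajectory.append(a)
            best_reward = max(best_reward, reward)
            # Remember every newly visited cell and how to reach it.
            if pos not in archive:
                archive[pos] = list(trajectory)
    return archive, best_reward
```

Because returning is done by replay rather than by a learned policy, exploration effort is never spent on re-reaching known states — the same design choice that lets the exploration phase make rapid progress in sparse-reward settings.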

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Atari Games | Atari 2600 Berzerk | Go-Explore | Score | 197376 | #1 |
| Atari Games | Atari 2600 Bowling | Go-Explore | Score | 260 | #2 |
| Atari Games | Atari 2600 Centipede | Go-Explore | Score | 1422628 | #1 |
| Atari Games | Atari 2600 Freeway | Go-Explore | Score | 34 | #1 |
| Atari Games | Atari 2600 Gravitar | Go-Explore | Score | 7588 | #4 |
| Atari Games | Atari 2600 Montezuma's Revenge | Go-Explore | Score | 43791 | #1 |
| Atari Games | Atari 2600 Pitfall! | Go-Explore | Score | 6954 | #3 |
| Atari Games | Atari 2600 Private Eye | Go-Explore | Score | 95756 | #1 |
| Atari Games | Atari 2600 Skiing | Go-Explore | Score | -3660 | #3 |
| Atari Games | Atari 2600 Solaris | Go-Explore | Score | 19671 | #2 |
| Atari Games | Atari 2600 Venture | Go-Explore | Score | 2281 | #2 |
| Atari Games | Atari games | Go-Explore | Mean Human Normalized Score | 4989.94% | #4 |
