#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning

NeurIPS 2017 · Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel

Count-based exploration algorithms are known to perform near-optimally when used in conjunction with tabular reinforcement learning (RL) methods for solving small discrete Markov decision processes (MDPs). It is generally thought that count-based methods cannot be applied in high-dimensional state spaces, since most states will only occur once. Recent deep RL exploration strategies are able to deal with high-dimensional continuous state spaces through complex heuristics, often relying on optimism in the face of uncertainty or intrinsic motivation. In this work, we describe a surprising finding: a simple generalization of the classic count-based approach can reach near state-of-the-art performance on various high-dimensional and/or continuous deep RL benchmarks. States are mapped to hash codes, which allows their occurrences to be counted with a hash table. These counts are then used to compute a reward bonus according to the classic count-based exploration theory. We find that simple hash functions can achieve surprisingly good results on many challenging tasks. Furthermore, we show that a domain-dependent learned hash code may further improve these results. Detailed analysis reveals important aspects of a good hash function: 1) having appropriate granularity and 2) encoding information relevant to solving the MDP. This exploration strategy achieves near state-of-the-art performance on both continuous control tasks and Atari 2600 games, hence providing a simple yet powerful baseline for solving MDPs that require considerable exploration.
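The hash-then-count idea in the abstract can be illustrated in a few lines. This is a minimal sketch, not the authors' code. It assumes states are fixed-length real vectors, uses SimHash (the sign pattern of a random Gaussian projection) as the "simple hash function", and assumes a bonus of the form beta / sqrt(n), where n is the visit count of the state's hash code. The class name `SimHashCounter` and the parameters `k` (number of bits), `beta` and `seed` are hypothetical names chosen for illustration.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class SimHashCounter:
    """Hash-based state counter with a count-based exploration bonus.

    Sketch assumptions: states are fixed-length real vectors, the hash is
    SimHash (signs of a random Gaussian projection), and the bonus is
    beta / sqrt(n), where n is the visit count of the state's hash code.
    """

    def __init__(self, state_dim: int, k: int, beta: float = 0.01, seed: int = 0) -> None:
        rng = random.Random(seed)
        # k x state_dim projection matrix with i.i.d. N(0, 1) entries.
        self.A: List[List[float]] = [
            [rng.gauss(0.0, 1.0) for _ in range(state_dim)] for _ in range(k)
        ]
        self.beta = beta
        # Hash table: k-bit code -> number of times it has been seen.
        self.counts: Dict[Tuple[int, ...], int] = defaultdict(int)

    def code(self, state: Sequence[float]) -> Tuple[int, ...]:
        # One bit per projection row; a dot product of exactly 0 maps to 1
        # (an arbitrary tie-breaking choice for this sketch).
        return tuple(
            1 if sum(a * s for a, s in zip(row, state)) >= 0.0 else 0
            for row in self.A
        )

    def bonus(self, state: Sequence[float]) -> float:
        # Record one more visit to this state's code, then return beta / sqrt(n).
        c = self.code(state)
        self.counts[c] += 1
        return self.beta / math.sqrt(self.counts[c])


# Usage: the bonus would be added to the environment reward at each step.
counter = SimHashCounter(state_dim=4, k=16, beta=0.01)
s = [0.1, -0.2, 0.3, 0.0]
first = counter.bonus(s)   # n = 1, so 0.01
second = counter.bonus(s)  # n = 2, so 0.01 / sqrt(2), about 0.00707
```

In this sketch the number of bits `k` is what sets the granularity the abstract mentions. With fewer bits, more distinct states share a code (with `k = 1` there are at most two codes), so counts grow quickly and the bonus decays early. With many bits, distinct states tend to receive distinct codes, and the counts approach the "every state occurs once" regime that the abstract describes as the obstacle in high-dimensional spaces.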

Code

nhynes/abc

uoe-agents/derl

clementbernardd/Count-Based-Explora…

Tasks

Atari Games

Continuous Control

reinforcement-learning

Reinforcement Learning (RL)

Datasets

Arcade Learning Environment

Results from the Paper

Ranked #1 on Atari Games on Atari 2600 Freeway

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Atari Games	Atari 2600 Freeway	TRPO-hash	Score	34.0	# 1
Atari Games	Atari 2600 Frostbite	TRPO-hash	Score	5214.0	# 15
Atari Games	Atari 2600 Montezuma's Revenge	TRPO-hash	Score	75	# 29
Atari Games	Atari 2600 Venture	TRPO-hash	Score	445.0	# 23

Methods

No methods listed for this paper.
