GDI: Rethinking What Makes Reinforcement Learning Different from Supervised Learning

AAAI Workshop ML4OR-22 2022  ·  Anonymous

Deep Q Network (DQN) opened the door to deep reinforcement learning (DRL) by combining deep learning (DL) with reinforcement learning (RL), and it observed that the distribution of the acquired data changes during training. DQN treated this property as a source of training instability and proposed effective methods to mitigate its downside. Instead of focusing on the unfavorable aspects, we argue that it is critical for RL to close the gap between the estimated data distribution and the ground-truth data distribution, something supervised learning (SL) fails to do. From this new perspective, we extend the basic paradigm of RL, Generalized Policy Iteration (GPI), into a more general version called Generalized Data Distribution Iteration (GDI). Many RL algorithms and techniques can be unified under the GDI paradigm, with GPI itself regarded as a special case of GDI. We provide theoretical proof of why GDI outperforms GPI and how it works, and we propose several practical algorithms based on GDI to verify its effectiveness and generality. Empirical experiments demonstrate state-of-the-art (SOTA) performance on the Arcade Learning Environment (ALE), where our algorithm achieves a 9620.98% mean human normalized score (HNS), a 1146.39% median HNS, and 22 human world record breakthroughs (HWRB) using only 200M training frames. Our work aims to steer RL research toward conquering human world records and pursuing agents that are truly superhuman in both performance and efficiency.
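For readers unfamiliar with the paradigm being generalized, the sketch below is a minimal, hypothetical illustration of classic Generalized Policy Iteration (GPI) on a small random tabular MDP: it alternates partial policy evaluation with greedy policy improvement. It is not the paper's GDI algorithm; GDI, per the abstract, additionally iterates the training data distribution itself, which this snippet does not model.

```python
# Minimal GPI sketch on a random tabular MDP (illustrative only, not the paper's GDI).
import numpy as np

n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)

# Random MDP: P[s, a] is a distribution over next states, R[s, a] an immediate reward.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

policy = np.zeros(n_states, dtype=int)   # deterministic policy: state -> action
V = np.zeros(n_states)                   # state-value estimates

for _ in range(100):
    # Policy evaluation: a few sweeps of the Bellman expectation backup suffice for GPI.
    for _ in range(10):
        V = np.array([R[s, policy[s]] + gamma * P[s, policy[s]] @ V
                      for s in range(n_states)])
    # Policy improvement: act greedily with respect to the current value estimate.
    Q = R + gamma * np.einsum('san,n->sa', P, V)
    new_policy = Q.argmax(axis=1)
    if np.array_equal(new_policy, policy):   # greedy fixed point reached
        break
    policy = new_policy

print("greedy policy:", policy)
print("value estimate:", np.round(V, 3))
```

The evaluation and improvement steps need not run to convergence individually; GPI only requires that the two processes keep interacting, which is the loop structure GDI builds on.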


Results from the Paper


Task         Dataset                    Model                 Metric   Value      Global Rank
Atari Games  Atari 2600 Alien           GDI-H3 (1B frames)    Score    279,700    #3
Atari Games  Atari 2600 Centipede       GDI-H3 (1B frames)    Score    1,359,533  #2
Atari Games  Atari 2600 Kung-Fu Master  GDI-H3 (200M frames)  Score    1,666,000  #2
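The table above lists raw per-game scores, while the abstract's headline numbers are aggregates of human normalized scores. The sketch below assumes the standard HNS definition, (agent − random) / (human − random); the per-game baselines used here are placeholder values for illustration only and are not taken from the paper.

```python
# Hedged sketch of how mean and median HNS are typically aggregated across Atari games.
# Baseline numbers are placeholders, not the paper's values.
import numpy as np

def hns(agent_score, random_score, human_score):
    """Per-game human normalized score, in percent."""
    return 100.0 * (agent_score - random_score) / (human_score - random_score)

# Hypothetical per-game entries: (agent score, random baseline, human baseline).
games = {
    "GameA": (279_700.0, 200.0, 7_000.0),
    "GameB": (1_000.0, 100.0, 500.0),
}

scores = np.array([hns(a, r, h) for a, r, h in games.values()])
print("mean HNS:   %.2f%%" % scores.mean())
print("median HNS: %.2f%%" % np.median(scores))
```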
