GDI: Rethinking What Makes Reinforcement Learning Different From Supervised Learning

11 Jun 2021 · Jiajun Fan, Changnan Xiao, Yue Huang

Deep Q Network (DQN) first opened the door to deep reinforcement learning (DRL) by combining deep learning (DL) with reinforcement learning (RL), and it observed that the distribution of the acquired data changes during training. DQN found that this property can make training unstable, so it proposed effective methods to handle this downside. Instead of focusing on the unfavourable aspects, we find it critical that RL can narrow the gap between the estimated data distribution and the ground-truth data distribution, whereas supervised learning (SL) cannot. From this new perspective, we extend the basic paradigm of RL, Generalized Policy Iteration (GPI), into a more general version called Generalized Data Distribution Iteration (GDI). We show that a large number of RL algorithms and techniques can be unified under the GDI paradigm, each of which can be regarded as a special case of GDI. We provide a theoretical proof of why GDI is better than GPI and how it works. We propose several practical algorithms based on GDI to verify its effectiveness and generality. Empirical experiments demonstrate state-of-the-art (SOTA) performance on the Arcade Learning Environment (ALE), where our algorithm achieves a 9620.98% mean human normalized score (HNS), a 1146.39% median HNS, and 22 human world record breakthroughs (HWRB) using only 200M training frames. Our work aims to lead RL research toward surpassing human world records and to pursue agents that are truly superhuman in both performance and efficiency.

Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Atari Games | Atari 2600 Beam Rider | GDI-I3 | Score | 162100 | #6 |
| Atari Games | Atari 2600 Berzerk | GDI-I3 | Score | 7607 | #9 |
| Atari Games | Atari 2600 Bowling | GDI-I3 | Score | 201.9 | #6 |
| Atari Games | Atari 2600 Boxing | GDI-H3 | Score | 100 | #1 |
| Atari Games | Atari 2600 Centipede | GDI-I3 | Score | 155830 | #8 |
| Atari Games | Atari 2600 Chopper Command | GDI-H3 | Score | 999999 | #1 |
| Atari Games | Atari 2600 Crazy Climber | GDI-I3 | Score | 201000 | #8 |
| Atari Games | Atari 2600 Defender | GDI-I3 | Score | 893110 | #3 |
| Atari Games | Atari 2600 Demon Attack | GDI-I3 | Score | 675530 | #2 |
| Atari Games | Atari 2600 Double Dunk | GDI-H3 | Score | 24 | #1 |
| Atari Games | Atari 2600 Enduro | GDI-I3 | Score | 14330 | #1 |
| Atari Games | Atari 2600 Fishing Derby | GDI-I3 | Score | 59 | #7 |
| Atari Games | Atari 2600 Freeway | GDI-I3 | Score | 34 | #1 |
| Atari Games | Atari 2600 Frostbite | GDI-I3 | Score | 10485 | #9 |
| Atari Games | Atari 2600 Gravitar | GDI-I3 | Score | 5905 | #8 |
| Atari Games | Atari 2600 HERO | GDI-I3 | Score | 38330 | #5 |
| Atari Games | Atari 2600 Ice Hockey | GDI-I3 | Score | 44.94 | #5 |
| Atari Games | Atari 2600 James Bond | GDI-I3 | Score | 594500 | #2 |
| Atari Games | Atari 2600 Kangaroo | GDI-I3 | Score | 14500 | #10 |
| Atari Games | Atari 2600 Krull | GDI-I3 | Score | 97575 | #5 |
| Atari Games | Atari 2600 Montezuma's Revenge | GDI-I3 | Score | 3000 | #11 |
| Atari Games | Atari 2600 Ms. Pacman | GDI-I3 | Score | 11536 | #7 |
| Atari Games | Atari 2600 Name This Game | GDI-I3 | Score | 34434 | #6 |
| Atari Games | Atari 2600 Phoenix | GDI-I3 | Score | 894460 | #4 |
| Atari Games | Atari 2600 Pitfall! | GDI-I3 | Score | 0 | #4 |
| Atari Games | Atari 2600 Private Eye | GDI-I3 | Score | 15100 | #5 |
| Atari Games | Atari 2600 Q*Bert | GDI-I3 | Score | 27800 | #13 |
| Atari Games | Atari 2600 Road Runner | GDI-I3 | Score | 878600 | #2 |
| Atari Games | Atari 2600 Robotank | GDI-I3 | Score | 108.2 | #4 |
| Atari Games | Atari 2600 Seaquest | GDI-I3 | Score | 943910 | #7 |
| Atari Games | Atari 2600 Skiing | GDI-I3 | Score | -6774 | #3 |
| Atari Games | Atari 2600 Solaris | GDI-I3 | Score | 11074 | #6 |
| Atari Games | Atari 2600 Space Invaders | GDI-I3 | Score | 140460 | #3 |
| Atari Games | Atari 2600 Star Gunner | GDI-I3 | Score | 465750 | #5 |
| Atari Games | Atari 2600 Surround | GDI-I3 | Score | -7.8 | #14 |
| Atari Games | Atari 2600 Tennis | GDI-I3 | Score | 24 | #1 |
| Atari Games | Atari 2600 Time Pilot | GDI-I3 | Score | 216770 | #6 |
| Atari Games | Atari 2600 Tutankham | GDI-I3 | Score | 423.9 | #3 |
| Atari Games | Atari 2600 Up and Down | GDI-I3 | Score | 986440 | #1 |
| Atari Games | Atari-57 | GDI-H3 (200M frames) | Human World Record Breakthrough | 22 | #2 |
| Atari Games | Atari-57 | GDI-H3 (200M frames) | Mean Human Normalized Score | 9620.98% | #2 |
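
The mean and median human normalized scores (HNS) quoted in the abstract aggregate per-game results. As a rough illustration of how such aggregates are usually computed, the sketch below uses the common definition HNS = 100 × (agent − random) / (human − random). The game names and all raw scores here are hypothetical placeholders chosen for easy arithmetic; they are not taken from the paper, and the paper's exact baselines and evaluation protocol are not reproduced.

```python
from statistics import mean, median


def human_normalized_score(agent: float, random: float, human: float) -> float:
    """Return the human normalized score in percent.

    0% corresponds to the random-policy baseline and 100% to the human baseline.
    """
    if human == random:
        raise ValueError("human and random baselines must differ")
    return 100.0 * (agent - random) / (human - random)


# Hypothetical (agent, random, human) raw scores for three made-up games.
# These numbers are placeholders for illustration, not results from the paper.
games = {
    "game_a": (150.0, 50.0, 100.0),
    "game_b": (30.0, 10.0, 50.0),
    "game_c": (1010.0, 10.0, 110.0),
}

# Per-game HNS, then the two aggregates reported on benchmarks such as Atari-57.
hns = {name: human_normalized_score(*scores) for name, scores in games.items()}
mean_hns = mean(list(hns.values()))
median_hns = median(list(hns.values()))

for name, value in hns.items():
    print(f"{name}: {value:.2f}%")
print(f"mean HNS: {mean_hns:.2f}%  median HNS: {median_hns:.2f}%")
```

In this toy example a single game with a very high normalized score pulls the mean far above the median, which is one reason the two aggregates are commonly reported side by side.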

Methods