Increasing the Action Gap: New Operators for Reinforcement Learning

15 Dec 2015 · Marc G. Bellemare, Georg Ostrovski, Arthur Guez, Philip S. Thomas, Rémi Munos

This paper introduces new optimality-preserving operators on Q-functions. We first describe an operator for tabular representations, the consistent Bellman operator, which incorporates a notion of local policy consistency. We show that this local consistency leads to an increase in the action gap at each state; increasing this gap, we argue, mitigates the undesirable effects of approximation and estimation errors on the induced greedy policies. This operator can also be applied to discretized continuous space and time problems, and we provide empirical results evidencing superior performance in this context. Extending the idea of a locally consistent operator, we then derive sufficient conditions for an operator to preserve optimality, leading to a family of operators which includes our consistent Bellman operator. As corollaries we provide a proof of optimality for Baird's advantage learning algorithm and derive other gap-increasing operators with interesting properties. We conclude with an empirical study on 60 Atari 2600 games illustrating the strong potential of these new operators.
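The abstract does not state the operators explicitly, so the following is only a minimal tabular sketch under stated assumptions. It takes the consistent Bellman operator to bootstrap from Q(x, a) itself on a self-transition (instead of max_b Q(x, b)), and the advantage learning operator to subtract alpha * (max_b Q(x, b) - Q(x, a)) from the usual Bellman backup. The two-state MDP, `GAMMA`, `ALPHA`, and all function names are hypothetical and chosen purely for illustration; none of this is the authors' code.

```python
# Tabular sketch comparing three operators on a tiny, made-up MDP.
# Everything here (MDP, constants, names) is an illustrative assumption.
GAMMA = 0.9  # discount factor (assumed)
ALPHA = 0.5  # advantage learning parameter in [0, 1) (assumed)

# P[x][a][y]: probability of moving from state x to state y under action a.
P = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [0.0, 1.0]],
]
# R[x][a]: expected immediate reward.
R = [[0.0, 1.0], [0.5, 2.0]]
STATES, ACTIONS = range(2), range(2)


def bellman(Q):
    """Standard Bellman optimality operator."""
    return [[R[x][a] + GAMMA * sum(P[x][a][y] * max(Q[y]) for y in STATES)
             for a in ACTIONS] for x in STATES]


def consistent_bellman(Q):
    """Assumed form of the consistent operator: on a self-transition,
    back up Q(x, a) itself rather than max_b Q(x, b)."""
    return [[R[x][a] + GAMMA * sum(
                 P[x][a][y] * (Q[x][a] if y == x else max(Q[y]))
                 for y in STATES)
             for a in ACTIONS] for x in STATES]


def advantage_learning(Q):
    """Assumed form of advantage learning: Bellman backup minus
    ALPHA times the current gap to the greedy value."""
    TQ = bellman(Q)
    return [[TQ[x][a] - ALPHA * (max(Q[x]) - Q[x][a]) for a in ACTIONS]
            for x in STATES]


def iterate(operator, steps=2000):
    """Apply an operator repeatedly, starting from Q = 0."""
    Q = [[0.0 for _ in ACTIONS] for _ in STATES]
    for _ in range(steps):
        Q = operator(Q)
    return Q


def greedy(Q):
    """Greedy action in each state."""
    return [max(ACTIONS, key=lambda a: Q[x][a]) for x in STATES]


def action_gaps(Q):
    """Best minus second-best value per state (two actions here)."""
    return [max(q) - min(q) for q in Q]


q_star = iterate(bellman)
q_cons = iterate(consistent_bellman)
q_al = iterate(advantage_learning)

for name, Q in [("Bellman", q_star), ("consistent", q_cons), ("AL", q_al)]:
    print(name, greedy(Q), [round(g, 3) for g in action_gaps(Q)])
```

On this toy problem all three operators should induce the same greedy policy, while the two modified operators yield larger action gaps than the standard Bellman operator. That matches the abstract's claim that the operators preserve optimality while increasing the gap. For the advantage learning form assumed here, the limiting gap is the Bellman gap divided by 1 - ALPHA.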

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Atari Games | Atari 2600 Alien | Persistent AL | Score | 5699.81 | # 15 |
| Atari Games | Atari 2600 Alien | Advantage Learning | Score | 4990.91 | # 16 |
| Atari Games | Atari 2600 Amidar | Advantage Learning | Score | 1557.43 | # 15 |
| Atari Games | Atari 2600 Amidar | Persistent AL | Score | 1451.65 | # 17 |
| Atari Games | Atari 2600 Assault | Persistent AL | Score | 3304.33 | # 34 |
| Atari Games | Atari 2600 Assault | Advantage Learning | Score | 3661.51 | # 31 |
| Atari Games | Atari 2600 Asterix | Persistent AL | Score | 19564.9 | # 26 |
| Atari Games | Atari 2600 Asterix | Advantage Learning | Score | 12852.08 | # 34 |
| Atari Games | Atari 2600 Asteroids | Advantage Learning | Score | 1924.42 | # 26 |
| Atari Games | Atari 2600 Asteroids | Persistent AL | Score | 1673.52 | # 28 |
| Atari Games | Atari 2600 Atlantis | Persistent AL | Score | 1465250 | # 8 |
| Atari Games | Atari 2600 Atlantis | Advantage Learning | Score | 553591.67 | # 23 |
| Atari Games | Atari 2600 Bank Heist | Persistent AL | Score | 874.99 | # 30 |
| Atari Games | Atari 2600 Bank Heist | Advantage Learning | Score | 633.63 | # 31 |
| Atari Games | Atari 2600 Battle Zone | Persistent AL | Score | 34583.07 | # 18 |
| Atari Games | Atari 2600 Battle Zone | Advantage Learning | Score | 28789.29 | # 25 |
| Atari Games | Atari 2600 Beam Rider | Persistent AL | Score | 13145.34 | # 27 |
| Atari Games | Atari 2600 Beam Rider | Advantage Learning | Score | 10054.58 | # 30 |
| Atari Games | Atari 2600 Berzerk | Persistent AL | Score | 1328.25 | # 20 |
| Atari Games | Atari 2600 Berzerk | Advantage Learning | Score | 747.26 | # 32 |
| Atari Games | Atari 2600 Bowling | Persistent AL | Score | 71.59 | # 17 |
| Atari Games | Atari 2600 Bowling | Advantage Learning | Score | 57.41 | # 24 |
| Atari Games | Atari 2600 Boxing | Persistent AL | Score | 94.3 | # 23 |
| Atari Games | Atari 2600 Boxing | Advantage Learning | Score | 93.94 | # 24 |
| Atari Games | Atari 2600 Breakout | Advantage Learning | Score | 425.32 | # 23 |
| Atari Games | Atari 2600 Breakout | Persistent AL | Score | 431.89 | # 22 |
| Atari Games | Atari 2600 Centipede | Persistent AL | Score | 4539.55 | # 33 |
| Atari Games | Atari 2600 Centipede | Advantage Learning | Score | 4225.18 | # 35 |
| Atari Games | Atari 2600 Chopper Command | Persistent AL | Score | 5734.93 | # 28 |
| Atari Games | Atari 2600 Chopper Command | Advantage Learning | Score | 5431.36 | # 29 |
| Atari Games | Atari 2600 Crazy Climber | Advantage Learning | Score | 123410.71 | # 27 |
| Atari Games | Atari 2600 Crazy Climber | Persistent AL | Score | 130002.71 | # 23 |
| Atari Games | Atari 2600 Defender | Advantage Learning | Score | 30643.59 | # 19 |
| Atari Games | Atari 2600 Defender | Persistent AL | Score | 32038.93 | # 18 |
| Atari Games | Atari 2600 Demon Attack | Advantage Learning | Score | 27153.48 | # 33 |
| Atari Games | Atari 2600 Demon Attack | Persistent AL | Score | 70908.17 | # 23 |
| Atari Games | Atari 2600 Double Dunk | Advantage Learning | Score | -0.15 | # 25 |
| Atari Games | Atari 2600 Double Dunk | Persistent AL | Score | -2.51 | # 25 |
| Atari Games | Atari 2600 Elevator Action | Advantage Learning | Score | 27088.89 | # 2 |
| Atari Games | Atari 2600 Elevator Action | Persistent AL | Score | 29100 | # 1 |
| Atari Games | Atari 2600 Enduro | Persistent AL | Score | 1343.1 | # 23 |
| Atari Games | Atari 2600 Enduro | Advantage Learning | Score | 1252.7 | # 24 |
| Atari Games | Atari 2600 Fishing Derby | Advantage Learning | Score | 21.32 | # 25 |
| Atari Games | Atari 2600 Fishing Derby | Persistent AL | Score | 28.13 | # 22 |
| Atari Games | Atari 2600 Freeway | Advantage Learning | Score | 31.72 | # 26 |
| Atari Games | Atari 2600 Freeway | Persistent AL | Score | 32.3 | # 24 |
| Atari Games | Atari 2600 Frostbite | Advantage Learning | Score | 2305.82 | # 29 |
| Atari Games | Atari 2600 Frostbite | Persistent AL | Score | 3248.96 | # 24 |
| Atari Games | Atari 2600 Gopher | Persistent AL | Score | 10611.81 | # 28 |
| Atari Games | Atari 2600 Gopher | Advantage Learning | Score | 11912.68 | # 27 |
| Atari Games | Atari 2600 Gravitar | Advantage Learning | Score | 417.65 | # 32 |
| Atari Games | Atari 2600 Gravitar | Persistent AL | Score | 446.92 | # 29 |
| Atari Games | Atari 2600 HERO | Advantage Learning | Score | 24788.86 | # 18 |
| Atari Games | Atari 2600 HERO | Persistent AL | Score | 24175.79 | # 19 |
| Atari Games | Atari 2600 Ice Hockey | Persistent AL | Score | -0.25 | # 19 |
| Atari Games | Atari 2600 Ice Hockey | Advantage Learning | Score | -1.24 | # 19 |
| Atari Games | Atari 2600 James Bond | Advantage Learning | Score | 848.46 | # 22 |
| Atari Games | Atari 2600 James Bond | Persistent AL | Score | 772.09 | # 25 |
| Atari Games | Atari 2600 Kangaroo | Persistent AL | Score | 11478.46 | # 19 |
| Atari Games | Atari 2600 Kangaroo | Advantage Learning | Score | 10809.16 | # 22 |
| Atari Games | Atari 2600 Krull | Persistent AL | Score | 8689.81 | # 22 |
| Atari Games | Atari 2600 Krull | Advantage Learning | Score | 9548.92 | # 20 |
| Atari Games | Atari 2600 Kung-Fu Master | Persistent AL | Score | 34650.91 | # 23 |
| Atari Games | Atari 2600 Kung-Fu Master | Advantage Learning | Score | 32182.99 | # 28 |
| Atari Games | Atari 2600 Montezuma's Revenge | Persistent AL | Score | 1.72 | # 37 |
| Atari Games | Atari 2600 Montezuma's Revenge | Advantage Learning | Score | 0.42 | # 38 |
| Atari Games | Atari 2600 Ms. Pacman | Persistent AL | Score | 3917.55 | # 22 |
| Atari Games | Atari 2600 Ms. Pacman | Advantage Learning | Score | 4065.8 | # 20 |
| Atari Games | Atari 2600 Name This Game | Persistent AL | Score | 10431.33 | # 29 |
| Atari Games | Atari 2600 Name This Game | Advantage Learning | Score | 11025.26 | # 25 |
| Atari Games | Atari 2600 Phoenix | Persistent AL | Score | 14495.56 | # 16 |
| Atari Games | Atari 2600 Phoenix | Advantage Learning | Score | 22038.27 | # 14 |
| Atari Games | Atari 2600 Pitfall! | Advantage Learning | Score | 0 | # 4 |
| Atari Games | Atari 2600 Pong | Persistent AL | Score | 19.76 | # 31 |
| Atari Games | Atari 2600 Pong | Advantage Learning | Score | 19.66 | # 32 |
| Atari Games | Atari 2600 Pooyan | Advantage Learning | Score | 4801.27 | # 2 |
| Atari Games | Atari 2600 Private Eye | Advantage Learning | Score | 5276.16 | # 12 |
| Atari Games | Atari 2600 Q*Bert | Advantage Learning | Score | 14368.03 | # 29 |
| Atari Games | Atari 2600 River Raid | Advantage Learning | Score | 10585.12 | # 26 |
| Atari Games | Atari 2600 Road Runner | Advantage Learning | Score | 52351.23 | # 22 |
| Atari Games | Atari 2600 Robotank | Advantage Learning | Score | 69.31 | # 11 |
| Atari Games | Atari 2600 Seaquest | Persistent AL | Score | 13230.74 | # 19 |
| Atari Games | Atari 2600 Seaquest | Advantage Learning | Score | 8670.5 | # 23 |
| Atari Games | Atari 2600 Skiing | Advantage Learning | Score | -13264.51 | # 3 |
| Atari Games | Atari 2600 Solaris | Advantage Learning | Score | 4785.16 | # 11 |
| Atari Games | Atari 2600 Space Invaders | Advantage Learning | Score | 3460.79 | # 23 |
| Atari Games | Atari 2600 Space Invaders | Persistent AL | Score | 3277.59 | # 24 |
| Atari Games | Atari 2600 Star Gunner | Advantage Learning | Score | 61353.59 | # 24 |
| Atari Games | Atari 2600 Surround | Persistent AL | Score | 0.72 | # 11 |
| Atari Games | Atari 2600 Tennis | Advantage Learning | Score | 0 | # 19 |
| Atari Games | Atari 2600 Time Pilot | Advantage Learning | Score | 8969.12 | # 24 |
| Atari Games | Atari 2600 Tutankham | Advantage Learning | Score | 245.22 | # 17 |
| Atari Games | Atari 2600 Up and Down | Advantage Learning | Score | 13909.74 | # 33 |
| Atari Games | Atari 2600 Venture | Advantage Learning | Score | 198.69 | # 25 |
| Atari Games | Atari 2600 Video Pinball | Advantage Learning | Score | 543504 | # 14 |
| Atari Games | Atari 2600 Wizard of Wor | Advantage Learning | Score | 9541.14 | # 18 |
| Atari Games | Atari 2600 Yars Revenge | Advantage Learning | Score | 24240.03 | # 15 |
| Atari Games | Atari 2600 Zaxxon | Advantage Learning | Score | 9129.61 | # 29 |
