Exploration by Random Network Distillation

30 Oct 2018 • Yuri Burda • Harrison Edwards • Amos Storkey • Oleg Klimov

The exploration bonus is the error of a neural network predicting features of the observations given by a fixed, randomly initialized neural network. We also introduce a method to flexibly combine intrinsic and extrinsic rewards. In particular, we establish state-of-the-art performance on Montezuma's Revenge, a game famously difficult for deep reinforcement learning methods.
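
As a rough illustration of the bonus described above, the sketch below trains a predictor to match the output of a fixed random network and uses the prediction error as the bonus. It is a toy under stated assumptions, not the paper's implementation: the one-layer networks, the plain gradient-descent update, the learning rate, and every name (`make_weights`, `target_features`, `predict`, `bonus`, `train_step`) are hypothetical choices made for illustration.

```python
import math
import random


def make_weights(rng, n_in, n_out):
    """Return an n_out x n_in matrix of Gaussian random weights."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def target_features(target_w, obs):
    """Fixed random network (never trained): one tanh layer (assumed shape)."""
    return [math.tanh(sum(w * x for w, x in zip(row, obs))) for row in target_w]


def predict(pred_w, obs):
    """Trainable predictor network: one linear layer (assumed shape)."""
    return [sum(w * x for w, x in zip(row, obs)) for row in pred_w]


def bonus(target_w, pred_w, obs):
    """Exploration bonus: mean squared prediction error on this observation."""
    t = target_features(target_w, obs)
    p = predict(pred_w, obs)
    return sum((pi - ti) ** 2 for pi, ti in zip(p, t)) / len(t)


def train_step(target_w, pred_w, obs, lr=0.5):
    """One gradient step on the predictor only; the target stays fixed."""
    t = target_features(target_w, obs)
    p = predict(pred_w, obs)
    for j, row in enumerate(pred_w):
        grad_scale = 2.0 * (p[j] - t[j]) / len(t)
        for i, x in enumerate(obs):
            row[i] -= lr * grad_scale * x


rng = random.Random(0)
target_w = make_weights(rng, n_in=4, n_out=8)  # fixed, randomly initialized
pred_w = make_weights(rng, n_in=4, n_out=8)    # trained to imitate the target

familiar = [1.0, 0.0, 0.0, 0.0]  # observation the agent visits repeatedly
novel = [0.0, 1.0, 0.0, 0.0]     # observation the agent has never visited

bonus_before = bonus(target_w, pred_w, familiar)
for _ in range(200):
    train_step(target_w, pred_w, familiar)
bonus_after = bonus(target_w, pred_w, familiar)
bonus_novel = bonus(target_w, pred_w, novel)

print(bonus_before, bonus_after, bonus_novel)
```

In this toy setting the bonus for the repeatedly visited observation shrinks toward zero as the predictor fits the target, while the unvisited observation keeps a larger bonus, which is the behavior an exploration bonus of this kind relies on.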


Evaluation


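| Task        | Dataset                        | Model | Metric name | Metric value | Global rank |
|-------------|--------------------------------|-------|-------------|--------------|-------------|
| Atari Games | Atari 2600 Gravitar            | RND   | Score       | 3906         | # 1         |
| Atari Games | Atari 2600 Montezuma's Revenge | RND   | Score       | 8152         | # 1         |
| Atari Games | Atari 2600 Pitfall!            | RND   | Score       | -3           | # 2         |
| Atari Games | Atari 2600 Private Eye         | RND   | Score       | 8666         | # 2         |
| Atari Games | Atari 2600 Solaris             | RND   | Score       | 3282         | # 2         |
| Atari Games | Atari 2600 Venture             | RND   | Score       | 1859         | # 1         |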