Montezuma's Revenge is an Atari 2600 benchmark game that is notoriously difficult for reinforcement learning algorithms because of its sparse rewards. Solutions typically employ algorithms that incentivise environment exploration in different ways.
For the state-of-the-art tables, please consult the parent Atari Games task.
Go-Explore can also harness human-provided domain knowledge and, when augmented with it, scores a mean of over 650k points on Montezuma's Revenge. On Pitfall, Go-Explore with domain knowledge is the first algorithm to score above zero.
One approach defines an exploration bonus as the error of a neural network predicting features of the observations given by a fixed, randomly initialized neural network; this establishes state-of-the-art performance on Montezuma's Revenge, a game famously difficult for deep reinforcement learning methods.
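A minimal sketch of how such a prediction-error bonus could be computed is shown below, assuming PyTorch and flat observation vectors; the class name `RNDBonus`, the network sizes, the feature dimension and the learning rate are illustrative assumptions rather than details taken from the description above.

```python
# Sketch of an exploration bonus from the prediction error against a fixed,
# randomly initialized target network. Sizes and hyperparameters are assumed.
import torch
import torch.nn as nn

class RNDBonus:
    def __init__(self, obs_dim, feat_dim=128, lr=1e-4):
        def mlp():
            return nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                 nn.Linear(256, feat_dim))
        self.target = mlp()                      # fixed, randomly initialized
        for p in self.target.parameters():
            p.requires_grad_(False)
        self.predictor = mlp()                   # trained to imitate the target
        self.opt = torch.optim.Adam(self.predictor.parameters(), lr=lr)

    def bonus(self, obs):
        """Return the per-observation prediction error, used as intrinsic reward."""
        with torch.no_grad():
            target_feat = self.target(obs)
        pred_feat = self.predictor(obs)
        error = ((pred_feat - target_feat) ** 2).mean(dim=-1)

        # Training the predictor on visited observations makes familiar states
        # yield a small bonus and novel states a large one.
        loss = error.mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        return error.detach()
```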
However, many state-of-the-art deep reinforcement learning algorithms that rely on epsilon-greedy exploration fail on these environments. We demonstrate that an empowerment-driven agent is able to significantly improve on the score of a baseline DQN agent on the game of Montezuma's Revenge.
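As a rough illustration of how such an intrinsic signal could augment a DQN agent, the sketch below mixes an empowerment-style bonus into a standard DQN target. The `empowerment_bonus` function is a hypothetical placeholder for whatever estimator the agent uses, and `beta` and `gamma` are illustrative values, not details from the description above.

```python
# Sketch: augment the sparse extrinsic reward with an intrinsic bonus inside
# an otherwise standard DQN TD target. `empowerment_bonus` is assumed.
import torch

def dqn_loss(q_net, target_net, batch, empowerment_bonus, beta=0.1, gamma=0.99):
    obs, actions, extrinsic_r, next_obs, done = batch

    # Reward the agent for reaching states with a large intrinsic bonus,
    # even when the environment reward is zero.
    intrinsic_r = empowerment_bonus(next_obs)          # hypothetical helper
    reward = extrinsic_r + beta * intrinsic_r

    with torch.no_grad():
        next_q = target_net(next_obs).max(dim=1).values
        targets = reward + gamma * (1.0 - done) * next_q

    q = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    return torch.nn.functional.mse_loss(q, targets)
```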
We show how this Q-map network can be efficiently trained with a 3D variant of Q-learning that updates the estimates towards all goals at once. While the Q-map agent could be used for a wide range of applications, we propose a novel exploration mechanism, in place of epsilon-greedy, that selects a goal at a desired distance and then takes several steps towards it, allowing long and coherent exploratory sequences in the environment.
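A minimal sketch of such goal-directed exploratory steps is given below, under the assumption that `q_map(obs)` returns an array of shape `(H, W, num_actions)` holding discounted estimates of reaching each screen-coordinate goal with each action; the goal-selection rule (matching `gamma ** desired_steps`), the helper names and the gym-style `env.step` interface are assumptions for illustration only.

```python
# Sketch: pick a goal roughly `desired_steps` away according to the Q-map,
# then act greedily towards it for several steps instead of single
# epsilon-greedy actions.
import numpy as np

def exploratory_steps(env, obs, q_map, desired_steps=15, gamma=0.95):
    values = q_map(obs).max(axis=-1)                 # best value per goal, (H, W)

    # A goal whose value is close to gamma**desired_steps is estimated to be
    # about desired_steps actions away.
    target_value = gamma ** desired_steps
    goal = np.unravel_index(np.argmin(np.abs(values - target_value)), values.shape)

    # Commit to that goal for several steps, yielding long, coherent exploration.
    for _ in range(desired_steps):
        action = int(np.argmax(q_map(obs)[goal[0], goal[1]]))
        obs, reward, done, info = env.step(action)   # gym-style step assumed
        if done:
            break
    return obs
```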
One successful method of guiding exploration in these domains is to imitate trajectories provided by a human demonstrator. Such demonstration-guided agents have, for the first time, achieved strong performance on hard exploration games including Private Eye, even when the agent is not presented with any environment rewards.
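One simple way to exploit demonstration trajectories is behavioural cloning, sketched below; the description above does not specify the exact imitation mechanism, so this is only an illustrative stand-in, and the network shape, epoch count and learning rate are assumptions.

```python
# Sketch: supervised imitation of demonstrator (state, action) pairs.
import torch
import torch.nn as nn

def clone_from_demonstrations(demo_obs, demo_actions, obs_dim, num_actions,
                              epochs=10, lr=1e-3):
    policy = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                           nn.Linear(256, num_actions))
    opt = torch.optim.Adam(policy.parameters(), lr=lr)

    for _ in range(epochs):
        # Make the policy assign high probability to the actions the
        # demonstrator actually took in each visited state.
        logits = policy(demo_obs)
        loss = nn.functional.cross_entropy(logits, demo_actions)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy
```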