Montezuma's Revenge

5 papers with code · Playing Games
Subtask of Atari Games

Montezuma's Revenge is an Atari 2600 benchmark game that is known to be difficult for reinforcement learning algorithms. Solutions typically employ algorithms that incentivise environment exploration in different ways.

For the state-of-the-art tables, please consult the parent Atari Games task.

State-of-the-art leaderboards

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Latest papers with code

Go-Explore: a New Approach for Hard-Exploration Problems

30 Jan 2019 uber-research/go-explore

Go-Explore can also harness human-provided domain knowledge and, when augmented with it, scores a mean of over 650k points on Montezuma's Revenge. On Pitfall, Go-Explore with domain knowledge is the first algorithm to score above zero.



Exploration by Random Network Distillation

30 Oct 2018 lgerrets/rl18-curiosity

The bonus is the error of a neural network predicting features of the observations given by a fixed randomly initialized neural network. In particular we establish state of the art performance on Montezuma's Revenge, a game famously difficult for deep reinforcement learning methods.
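The bonus described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes tiny linear "networks" in pure Python, a plain mean squared error, and hypothetical names (RNDBonus, bonus, update), and it omits the reward and observation normalisation a practical agent would likely need.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Random weight matrix (out_dim x in_dim); a stand-in for a neural network."""
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def forward(weights, obs):
    """Linear map from an observation to a feature vector."""
    return [sum(w * x for w, x in zip(row, obs)) for row in weights]


class RNDBonus:
    """Hypothetical helper: exploration bonus = predictor error on target features."""

    def __init__(self, obs_dim, feat_dim=4, lr=0.1, seed=0):
        rng = random.Random(seed)
        self.target = make_linear(obs_dim, feat_dim, rng)     # fixed, never trained
        self.predictor = make_linear(obs_dim, feat_dim, rng)  # trained to match target
        self.lr = lr

    def bonus(self, obs):
        """Mean squared error between predictor and target features."""
        t = forward(self.target, obs)
        p = forward(self.predictor, obs)
        return sum((pi - ti) ** 2 for pi, ti in zip(p, t)) / len(t)

    def update(self, obs):
        """One gradient-descent step on the mean squared error for this observation."""
        t = forward(self.target, obs)
        p = forward(self.predictor, obs)
        n = len(t)
        for k, row in enumerate(self.predictor):
            grad_scale = 2.0 * (p[k] - t[k]) / n
            for j in range(len(row)):
                row[j] -= self.lr * grad_scale * obs[j]
```

In this sketch, calling update repeatedly on the same observation drives its bonus towards zero, while an observation the predictor has not been trained on keeps a larger bonus. That gap is what rewards the agent for reaching unfamiliar states.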



Empowerment-driven Exploration using Mutual Information Estimation

11 Oct 2018 navneet-nmk/pytorch-rl

However, many of the state of the art deep reinforcement learning algorithms, that rely on epsilon-greedy, fail on these environments. We demonstrate that an empowerment driven agent is able to improve significantly the score of a baseline DQN agent on the game of Montezuma's Revenge.


Q-map: a Convolutional Approach for Goal-Oriented Reinforcement Learning

ICLR 2019 yl3829/Q-map

We show how this network can be efficiently trained with a 3D variant of Q-learning to update the estimates towards all goals at once. While the Q-map agent could be used for a wide range of applications, we propose a novel exploration mechanism in place of epsilon-greedy that relies on goal selection at a desired distance followed by several steps taken towards it, allowing long and coherent exploratory steps in the environment.
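The exploration mechanism described above (pick a goal at a desired distance, then take several steps towards it) can be sketched on a toy grid. This is an illustration under strong assumptions, not the paper's method: the learned Q-map is replaced by a hand-coded Manhattan distance on an obstacle-free grid, and every name (estimated_steps, select_goal, explore) is hypothetical.

```python
import random

# Four grid actions as (dx, dy) offsets.
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def estimated_steps(pos, goal):
    """Stand-in for a learned Q-map: Manhattan distance on an open grid."""
    return abs(pos[0] - goal[0]) + abs(pos[1] - goal[1])


def move(pos, action, size):
    """Apply an action, keeping the agent inside the size x size grid."""
    dx, dy = ACTIONS[action]
    x = min(max(pos[0] + dx, 0), size - 1)
    y = min(max(pos[1] + dy, 0), size - 1)
    return (x, y)


def select_goal(pos, size, desired, rng):
    """Pick a random cell whose estimated distance is closest to the desired one."""
    cells = [(x, y) for x in range(size) for y in range(size)]
    best = min(abs(estimated_steps(pos, c) - desired) for c in cells)
    candidates = [c for c in cells if abs(estimated_steps(pos, c) - desired) == best]
    return rng.choice(candidates)


def explore(pos, size=8, desired=4, max_steps=6, seed=0):
    """Select one goal, then take up to max_steps greedy steps towards it."""
    rng = random.Random(seed)
    goal = select_goal(pos, size, desired, rng)
    path = [pos]
    for _ in range(max_steps):
        if pos == goal:
            break
        # Greedy action: the one whose next position is estimated closest to the goal.
        action = min(ACTIONS, key=lambda a: estimated_steps(move(pos, a, size), goal))
        pos = move(pos, action, size)
        path.append(pos)
    return goal, path
```

Compared with epsilon-greedy, where each random action is chosen independently, committing to one goal for several steps yields a coherent trajectory rather than a jittery walk around the starting point.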


06 Oct 2018

Playing hard exploration games by watching YouTube

NeurIPS 2018 MaxSobolMark/HardRLWithYoutube

One successful method of guiding exploration in these domains is to imitate trajectories provided by a human demonstrator. [...] and Private Eye for the first time, even if the agent is not presented with any environment rewards.


29 May 2018