Montezuma's Revenge is an Atari 2600 benchmark game that is known to be difficult for reinforcement learning algorithms. Solutions typically employ algorithms that incentivise environment exploration in different ways.
For the state-of-the-art tables, please consult the parent Atari Games task.
( Image credit: Q-map )
However, many state-of-the-art deep reinforcement learning algorithms that rely on epsilon-greedy exploration fail on these environments.
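For readers unfamiliar with the epsilon-greedy rule mentioned above, a minimal sketch follows. The function name `epsilon_greedy` and its parameters are illustrative assumptions, not taken from any of the listed papers.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index for one step of epsilon-greedy selection.

    q_values: list of estimated action values for the current state.
    epsilon:  probability of picking a uniformly random action.
    rng:      random source (hypothetical parameter, handy for seeding).
    """
    if rng.random() < epsilon:
        # Explore: ignore the value estimates entirely.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimated value
    # (ties resolve to the lowest index).
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Exploration here is undirected: the random branch does not depend on which states or actions have already been tried, which is one plausible reason such agents struggle when rewards are as sparse as in Montezuma's Revenge.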
Go-Explore can also harness human-provided domain knowledge and, when augmented with it, scores a mean of over 650k points on Montezuma's Revenge.
SOTA for Atari Games on Atari 2600 Pitfall!
Learning goal-directed behavior in environments with sparse feedback is a major challenge for reinforcement learning algorithms.
We consider an agent's uncertainty about its environment and the problem of generalizing this uncertainty across observations.
#7 best model for Atari Games on Atari 2600 Montezuma's Revenge
One successful method of guiding exploration in these domains is to imitate trajectories provided by a human demonstrator.
Our work is a simple extension of the paper "Exploration by Random Network Distillation".
We propose the action balance exploration method to overcome the defects of next-state bonus methods. It balances how often each action is chosen in each state and can be treated as an extension of the upper confidence bound (UCB) to deep reinforcement learning.
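The UCB idea referred to above can be sketched in tabular form. This is only an illustration under stated assumptions, not the method of the listed paper: states are assumed hashable, visit counts are stored exactly in a dictionary, the bonus uses a smoothed UCB-style formula, and the class name `UCBActionSelector` and the constant `c` are hypothetical.

```python
import math
from collections import defaultdict


class UCBActionSelector:
    """Tabular UCB-style action selection with per-state action counts."""

    def __init__(self, n_actions, c=1.0):
        self.n_actions = n_actions
        self.c = c  # exploration strength (assumed hyperparameter)
        # counts[state][action] = times `action` was chosen in `state`
        self.counts = defaultdict(lambda: [0] * n_actions)

    def bonus(self, state):
        """Bonus that grows for actions chosen rarely in this state."""
        counts = self.counts[state]
        total = sum(counts)
        # The +1 terms avoid log(0) and division by zero for untried actions.
        return [self.c * math.sqrt(math.log(total + 1) / (n + 1)) for n in counts]

    def select(self, state, q_values):
        """Pick the action maximising value estimate plus bonus, then count it."""
        scores = [q + b for q, b in zip(q_values, self.bonus(state))]
        action = max(range(self.n_actions), key=lambda a: scores[a])
        self.counts[state][action] += 1
        return action
```

With uninformative value estimates the bonus alone drives selection, so the choices in a given state spread evenly across actions:

```python
selector = UCBActionSelector(n_actions=3)
for _ in range(6):
    selector.select("room_0", [0.0, 0.0, 0.0])
print(selector.counts["room_0"])  # [2, 2, 2]
```

In a deep reinforcement learning setting exact counts are unavailable for image observations, so some learned approximation would have to stand in for the `counts` table.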