Mean in-game score over 1000 episodes, each run with a random seed not seen during training. See Section 2.4 (Evaluation Protocol) for details.
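As a rough illustration of how such a metric could be computed, the sketch below draws evaluation seeds that do not overlap with a set of training seeds and averages the per-episode score. The names `held_out_seeds`, `mean_score`, and `run_episode` are hypothetical and are not taken from the evaluation protocol itself; `run_episode` stands in for whatever function plays one episode with a given seed and returns its in-game score.

```python
import random
from statistics import mean


def held_out_seeds(train_seeds, n, rng, max_seed=2**31 - 1):
    """Draw n distinct seeds that do not appear in train_seeds (hypothetical helper)."""
    seen = set(train_seeds)
    chosen = set()
    while len(chosen) < n:
        candidate = rng.randrange(max_seed)
        if candidate not in seen:
            chosen.add(candidate)
    return sorted(chosen)


def mean_score(run_episode, seeds):
    """Average the score returned by run_episode(seed) over all seeds.

    run_episode is an assumed callable: it plays one episode with the
    given seed and returns that episode's in-game score.
    """
    return mean(run_episode(seed) for seed in seeds)


if __name__ == "__main__":
    rng = random.Random(0)
    train_seeds = {1, 2, 3}
    eval_seeds = held_out_seeds(train_seeds, 1000, rng)
    # Placeholder episode function; a real one would run the agent in the game.
    print(mean_score(lambda seed: seed % 7, eval_seeds))
```

Drawing the seeds through an explicit `random.Random` instance keeps the evaluation set reproducible, which makes scores comparable between runs.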


BeBold: Exploration Beyond the Boundary of Explored Regions

In this paper, we analyze the pros and cons of each method and propose the regulated difference of inverse visitation counts as a simple but effective criterion for intrinsic reward (IR).
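A minimal sketch of a count-difference bonus is given below. It assumes that "regulated" means clipping the difference at zero, so the bonus is max(1/N(s') - 1/N(s), 0), where N is a visitation count. The class name `CountDifferenceBonus` and its methods are hypothetical, states are assumed to be hashable, and this covers only the count-difference term rather than the paper's full method.

```python
from collections import defaultdict


class CountDifferenceBonus:
    """Tabular visitation counts with a clipped inverse-count difference bonus.

    Hypothetical illustration: "regulated" is interpreted here as
    clipping the difference at zero, which is an assumption.
    """

    def __init__(self):
        self.counts = defaultdict(int)

    def observe(self, state):
        """Record one visit to a (hashable) state."""
        self.counts[state] += 1

    def bonus(self, state, next_state):
        """Return max(1/N(next_state) - 1/N(state), 0)."""
        n_cur = self.counts.get(state, 0)
        n_next = self.counts.get(next_state, 0)
        if n_cur == 0 or n_next == 0:
            raise ValueError("both states must be observed before computing a bonus")
        return max(1.0 / n_next - 1.0 / n_cur, 0.0)


if __name__ == "__main__":
    tracker = CountDifferenceBonus()
    for _ in range(3):
        tracker.observe("corridor")   # well-explored state
    tracker.observe("new_room")       # first visit
    print(tracker.bonus("corridor", "new_room"))  # positive: moving toward a rarer state
    print(tracker.bonus("new_room", "corridor"))  # clipped to zero
```

Under this reading, the bonus is positive only when the agent moves from a frequently visited state to a less visited one, and moving back toward familiar states earns nothing.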

Curriculum Learning, Efficient Exploration

The NetHack Learning Environment

Here, we present the NetHack Learning Environment (NLE), a scalable, procedurally generated, stochastic, rich, and challenging environment for RL research based on the popular single-player terminal-based roguelike game, NetHack.

NetHack Score, Systematic Generalization