The task is to train an agent to play SNES games such as Super Mario.
However, annotating each environment with hand-designed, dense rewards is not scalable, motivating the need to develop reward functions that are intrinsic to the agent. In this paper: (a) We perform the first large-scale study of purely curiosity-driven learning, i.e., without any extrinsic rewards, across 54 standard benchmark environments, including the Atari game suite.
Mastering a video game requires skill, tactics and strategy. The environment is expandable: more video games and consoles can easily be added while maintaining the same interface as the Arcade Learning Environment (ALE).
Generative Adversarial Networks (GANs) are a machine learning approach capable of generating novel example outputs across a space of provided training examples. This paper trains a GAN to generate levels for Super Mario Bros using a level from the Video Game Level Corpus.
We show how this network can be efficiently trained with a 3D variant of Q-learning to update the estimates towards all goals at once. While the Q-map agent could be used for a wide range of applications, we propose a novel exploration mechanism in place of epsilon-greedy that relies on goal selection at a desired distance followed by several steps taken towards it, allowing long and coherent exploratory steps in the environment.
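The exploration mechanism described above (select a goal at a desired distance, then take several steps towards it) can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a small square grid world, and it replaces the learned Q-map with a hand-written Manhattan-distance rule. The names `pick_goal`, `step_towards`, and `explore`, and all parameters, are hypothetical and chosen only for illustration.

```python
import random


def pick_goal(pos, distance, size, rng):
    """Pick a random in-bounds cell at the given Manhattan distance from pos."""
    x, y = pos
    candidates = [
        (gx, gy)
        for gx in range(size)
        for gy in range(size)
        if abs(gx - x) + abs(gy - y) == distance
    ]
    # Fall back to the current cell if no cell lies at that distance.
    return rng.choice(candidates) if candidates else pos


def step_towards(pos, goal):
    """Assumption: stand-in for acting greedily on a learned Q-map.

    Moves one cell closer to the goal, along x first and then along y.
    """
    x, y = pos
    gx, gy = goal
    if x != gx:
        return (x + (1 if gx > x else -1), y)
    if y != gy:
        return (x, y + (1 if gy > y else -1))
    return pos


def explore(start, size, total_steps, goal_distance, steps_per_goal, seed=0):
    """Goal-directed exploration: commit to a goal for several steps.

    A new goal is drawn when the step budget for the current goal runs out
    or the goal has been reached. Returns the list of visited cells.
    """
    rng = random.Random(seed)
    pos = start
    trajectory = [pos]
    goal = None
    steps_left = 0
    for _ in range(total_steps):
        if goal is None or steps_left == 0 or pos == goal:
            goal = pick_goal(pos, goal_distance, size, rng)
            steps_left = steps_per_goal
        pos = step_towards(pos, goal)
        steps_left -= 1
        trajectory.append(pos)
    return trajectory


if __name__ == "__main__":
    path = explore((0, 0), size=8, total_steps=20, goal_distance=4, steps_per_goal=4)
    print(path)
```

Unlike epsilon-greedy, where each random action is drawn independently, the agent here commits to one goal for up to `steps_per_goal` consecutive moves, which is what produces the long, coherent exploratory segments. In a real agent, `step_towards` would instead choose the action with the highest Q-map value for the selected goal.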