Procgen Benchmark includes 16 simple-to-use procedurally-generated environments which provide a direct measure of how quickly a reinforcement learning agent learns generalizable skills.
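As an illustration of how Procgen's train/test level split underpins its measure of generalization, here is a minimal sketch using the `procgen` gym registration; the choice of game, level counts, and difficulty mode are arbitrary for the example.

```python
import gym

# Train on a fixed, finite set of procedurally generated levels...
train_env = gym.make(
    "procgen:procgen-coinrun-v0",  # any of the 16 games can be substituted
    num_levels=200,                # number of distinct levels available during training
    start_level=0,
    distribution_mode="easy",
)

# ...and evaluate on the unrestricted level distribution, so test levels
# are almost surely unseen during training.
test_env = gym.make(
    "procgen:procgen-coinrun-v0",
    num_levels=0,                  # 0 means the full distribution of levels
    start_level=0,
    distribution_mode="easy",
)

obs = train_env.reset()
obs, reward, done, info = train_env.step(train_env.action_space.sample())
```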
115 PAPERS • 1 BENCHMARK
Obstacle Tower is a high-fidelity, 3D, third-person, procedurally generated environment for reinforcement learning. An agent playing Obstacle Tower must learn to solve both low-level control and high-level planning problems in tandem while learning from pixels and a sparse reward signal. Unlike other benchmarks such as the Arcade Learning Environment, evaluation of agent performance in Obstacle Tower is based on an agent’s ability to perform well on unseen instances of the environment.
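A minimal sketch of that evaluation idea, assuming the `obstacle_tower_env` Python package and its gym-compatible `ObstacleTowerEnv` wrapper; the constructor arguments, seeds, and random policy below are illustrative choices, not the official evaluation protocol.

```python
from obstacle_tower_env import ObstacleTowerEnv

# Gym-compatible wrapper around the Unity build of Obstacle Tower.
env = ObstacleTowerEnv(retro=False, realtime_mode=False)

# Performance is reported on tower instances (seeds) the agent was not trained on.
held_out_seeds = [1001, 1002, 1003]  # illustrative seeds only

for seed in held_out_seeds:
    env.seed(seed)   # fixes the procedural generation for the next episode
    obs = env.reset()
    done = False
    while not done:
        # A trained agent would act from pixel observations; random actions stand in here.
        obs, reward, done, info = env.step(env.action_space.sample())

env.close()
```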
20 PAPERS • 6 BENCHMARKS
Avalon is a benchmark for generalization in Reinforcement Learning (RL). It consists of a set of tasks in which embodied agents in highly diverse procedural 3D worlds must survive by navigating terrain, hunting or gathering food, and avoiding hazards. Avalon is unique among existing RL benchmarks in that the reward function, world dynamics, and action space are the same for every task, with tasks differentiated solely by altering the environment. Its 20 tasks, ranging in complexity from eat and throw to hunt and navigate, each generate worlds in which the agent must exercise specific skills in order to survive. This setup enables investigations of generalization within tasks, between tasks, and to compositional tasks that require combining skills learned from previous tasks.
3 PAPERS • NO BENCHMARKS YET
WildLife Documentary is an animal object detection dataset. It contains 15 documentary films downloaded from YouTube. The videos range from 9 minutes to 50 minutes in length, with resolutions ranging from 360p to 1080p. A unique property of this dataset is that all videos are accompanied by subtitles automatically generated from speech by YouTube; the subtitles are revised manually to correct obvious spelling mistakes. All the animals in the videos are annotated, resulting in more than 4098 object tracklets covering 60 different visual concepts, e.g., ‘tiger’, ‘koala’, ‘langur’, and ‘ostrich’.
The ability to jointly understand the geometry of objects and plan actions for manipulating them is crucial for intelligent agents; this ability is referred to as geometric planning. Many interactive environments have recently been proposed to evaluate intelligent agents on various skills; however, none of them cater to the needs of geometric planning. PackIt is a virtual environment for evaluating, and potentially learning, the ability to do geometric planning, in which an agent must take a sequence of actions to pack a set of objects into a box with limited space.
2 PAPERS • 1 BENCHMARK
The bipedal skills benchmark is a suite of reinforcement learning environments implemented for the MuJoCo physics simulator. It aims to provide a set of tasks that demand a variety of motor skills beyond locomotion, and is intended for evaluating skill discovery and hierarchical learning methods. The majority of tasks exhibit a sparse reward structure.
2 PAPERS • NO BENCHMARKS YET
POPGym is designed to benchmark memory in deep reinforcement learning. It contains a set of environments and a collection of memory model baselines. The environments are all Partially Observable Markov Decision Process (POMDP) environments that follow the OpenAI Gym interface and are built around a few basic design tenets.
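Because the environments expose the standard Gym interface, interacting with one looks like the generic loop below; `MyPOMDPEnv-v0` is a placeholder id rather than an actual POPGym environment name.

```python
import gym

# Placeholder id: substitute the id of an installed POPGym environment.
env = gym.make("MyPOMDPEnv-v0")

# In a POMDP the observation is not the full state, so a competent policy
# needs memory (e.g., an RNN hidden state) carried across steps.
obs = env.reset()
memory = None  # stands in for a recurrent model's hidden state
done = False
while not done:
    action = env.action_space.sample()  # a real agent would condition on (obs, memory)
    obs, reward, done, info = env.step(action)
```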
1 PAPER • 1 BENCHMARK