OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. It includes environments such as Algorithmic, Atari, Box2D, Classic Control, MuJoCo, Robotics, and Toy Text.
572 PAPERS • 3 BENCHMARKS
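A minimal interaction loop with a Gym environment is sketched below, assuming the classic pre-0.26 API in which reset returns only the observation and step returns a four-tuple:

```python
import gym

# Minimal sketch of the classic Gym interaction loop (pre-0.26 API);
# newer releases return (obs, info) from reset and a five-tuple from step.
env = gym.make("CartPole-v1")
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()          # random placeholder policy
    obs, reward, done, info = env.step(action)  # advance the environment one step
    total_reward += reward
env.close()
print("episode return:", total_reward)
```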
MuJoCo (multi-joint dynamics with contact) is a physics engine used to implement environments to benchmark Reinforcement Learning methods.
566 PAPERS • 2 BENCHMARKS
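For orientation, a minimal physics loop with the official mujoco Python bindings might look like the sketch below; the one-body MJCF model is purely illustrative:

```python
import mujoco

# Hypothetical one-body MJCF model, used only to illustrate the stepping loop.
XML = """
<mujoco>
  <worldbody>
    <body>
      <freejoint/>
      <geom type="sphere" size="0.1"/>
    </body>
  </worldbody>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(XML)
data = mujoco.MjData(model)
for _ in range(100):
    mujoco.mj_step(model, data)  # advance the simulation by one timestep
print(data.qpos)                 # generalized positions of the free body
```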
CARLA (CAR Learning to Act) is an open simulator for urban driving, developed as an open-source layer over Unreal Engine 4. It provides sensors in the form of RGB cameras (with customizable positions), ground-truth depth maps, ground-truth semantic segmentation maps with 12 semantic classes designed for driving (road, lane marking, traffic sign, sidewalk and so on), bounding boxes for dynamic objects in the environment, and measurements of the agent itself (vehicle location and orientation).
246 PAPERS • NO BENCHMARKS YET
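As a rough sketch of how the CARLA Python client is typically used (assuming a CARLA 0.9.x server already running on localhost:2000):

```python
import carla

# Minimal sketch of the CARLA Python client; assumes a server on localhost:2000.
client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()

# Spawn a vehicle at one of the map's predefined spawn points.
blueprint_library = world.get_blueprint_library()
vehicle_bp = blueprint_library.filter("vehicle.*")[0]
spawn_point = world.get_map().get_spawn_points()[0]
vehicle = world.spawn_actor(vehicle_bp, spawn_point)

# Attach an RGB camera and print the frame number of each received image.
camera_bp = blueprint_library.find("sensor.camera.rgb")
camera_tf = carla.Transform(carla.Location(x=1.5, z=2.4))
camera = world.spawn_actor(camera_bp, camera_tf, attach_to=vehicle)
camera.listen(lambda image: print("frame", image.frame))

vehicle.set_autopilot(True)
```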
The Arcade Learning Environment (ALE) is an object-oriented framework that allows researchers to develop AI agents for Atari 2600 games. It is built on top of the Atari 2600 emulator Stella and separates the details of emulation from agent design.
230 PAPERS • 56 BENCHMARKS
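A minimal sketch of driving ALE directly through the ale-py Python bindings; the ROM path is a placeholder for a legally obtained ROM file:

```python
from ale_py import ALEInterface

# Minimal sketch of the ALE Python interface; the ROM path is a placeholder.
ale = ALEInterface()
ale.setInt("random_seed", 123)
ale.loadROM("breakout.bin")

legal_actions = ale.getLegalActionSet()
total_reward = 0
while not ale.game_over():
    action = legal_actions[0]        # placeholder policy: always the first action
    total_reward += ale.act(action)  # act() returns the per-frame reward
    frame = ale.getScreenRGB()       # raw RGB screen observation
print("episode reward:", total_reward)
ale.reset_game()
```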
The DeepMind Control Suite (DMCS) is a set of simulated continuous control environments with a standardized structure and interpretable rewards. The tasks are written in Python and powered by the MuJoCo physics engine, making them easy to use and modify. Control Suite tasks include Pendulum, Acrobot, Cart-pole, Cart-k-pole, Ball in cup, Point-mass, Reacher, Finger, Hopper, Fish, Cheetah, Walker, Manipulator, Manipulator extra, Stacker, Swimmer, Humanoid, Humanoid_CMU and LQR.
93 PAPERS • 3 BENCHMARKS
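Loading and stepping a Control Suite task typically follows this pattern (a minimal sketch with a uniformly random placeholder policy):

```python
import numpy as np
from dm_control import suite

# Minimal sketch: load a Control Suite task and run one episode with random actions.
env = suite.load(domain_name="cartpole", task_name="swingup")
action_spec = env.action_spec()

time_step = env.reset()
while not time_step.last():
    # Sample a random action within the spec bounds as a placeholder policy.
    action = np.random.uniform(action_spec.minimum, action_spec.maximum,
                               size=action_spec.shape)
    time_step = env.step(action)  # time_step carries observation, reward, discount
print("last reward:", time_step.reward)
```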
ViZDoom is an AI research platform based on the classical First Person Shooter game Doom. The most popular game mode is probably the so-called Death Match, where several players join in a maze and fight against each other. After a fixed time, the match ends and all the players are ranked by the FRAG scores defined as kills minus suicides. During the game, each player can access various observations, including the first-person view screen pixels, the corresponding depth-map and segmentation-map (pixel-wise object labels), the bird-view maze map, etc. The valid actions include almost all the keyboard-stroke and mouse-control a human player can take, accounting for moving, turning, jumping, shooting, changing weapon, etc. ViZDoom can run a game either synchronously or asynchronously, indicating whether the game core waits until all players’ actions are collected or runs in a constant frame rate without waiting.
87 PAPERS • 1 BENCHMARK
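A minimal synchronous game loop with the ViZDoom Python API might look as follows; the scenario config path is an assumption (the package ships several such scenarios):

```python
import random
from vizdoom import DoomGame

# Minimal sketch of the ViZDoom API; the config path is an assumed bundled scenario.
game = DoomGame()
game.load_config("scenarios/basic.cfg")
game.init()

# One boolean per available button, e.g. move left / move right / attack.
actions = [[True, False, False], [False, True, False], [False, False, True]]

for _ in range(3):  # play three episodes
    game.new_episode()
    while not game.is_episode_finished():
        state = game.get_state()  # screen buffer, depth map, labels, ...
        reward = game.make_action(random.choice(actions))
    print("episode reward:", game.get_total_reward())
game.close()
```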
AirSim is a simulator for drones, cars and more, built on Unreal Engine. It is open-source and cross-platform, and supports software-in-the-loop simulation with popular flight controllers such as PX4 and ArduPilot, as well as hardware-in-the-loop with PX4, for physically and visually realistic simulations. It is developed as an Unreal plugin that can simply be dropped into any Unreal environment. An experimental Unity plugin also exists.
73 PAPERS • NO BENCHMARKS YET
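A minimal sketch of commanding a simulated multirotor through the AirSim Python API, assuming the AirSim plugin is running inside an Unreal environment on the default port:

```python
import airsim

# Minimal sketch of the AirSim multirotor API; assumes a running AirSim instance.
client = airsim.MultirotorClient()
client.confirmConnection()
client.enableApiControl(True)
client.armDisarm(True)

client.takeoffAsync().join()                     # take off and block until done
client.moveToPositionAsync(10, 0, -5, 5).join()  # fly to (x, y, z) at 5 m/s

# Request one RGB image from the front camera.
responses = client.simGetImages([
    airsim.ImageRequest("0", airsim.ImageType.Scene, False, True)
])
print("image bytes:", len(responses[0].image_data_uint8))

client.landAsync().join()
client.armDisarm(False)
client.enableApiControl(False)
```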
TORCS (The Open Racing Car Simulator) is a driving simulator. It is capable of simulating the essential elements of vehicular dynamics such as mass, rotational inertia, collision, mechanics of suspensions, links and differentials, friction and aerodynamics. Physics simulation is simplified and is carried out through Euler integration of differential equations at a temporal discretization level of 0.002 seconds. The rendering pipeline is lightweight, based on OpenGL, and can be turned off for faster training. TORCS offers a large variety of tracks and cars as free assets. It also provides a number of programmed robot cars with different levels of performance that can be used to benchmark the performance of human players and software driving agents. TORCS was built with the goal of developing Artificial Intelligence for vehicular control and has been used extensively by the machine learning community ever since its inception.
60 PAPERS • NO BENCHMARKS YET
AI2-THOR is an interactive environment for embodied AI. It contains four types of scenes (kitchen, living room, bedroom and bathroom), and each scene type includes 30 rooms, where each room is unique in terms of furniture placement and item types. There are over 2000 unique objects for AI agents to interact with.
50 PAPERS • NO BENCHMARKS YET
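Interaction with AI2-THOR goes through a controller that steps named actions; a minimal sketch follows (the scene name refers to a bundled kitchen scene, and the object type is chosen purely for illustration):

```python
from ai2thor.controller import Controller

# Minimal sketch of the AI2-THOR controller; "FloorPlan1" is a bundled kitchen scene.
controller = Controller(scene="FloorPlan1", gridSize=0.25, width=300, height=300)

event = controller.step(action="MoveAhead")
print("move succeeded:", event.metadata["lastActionSuccess"])

# Pick up the first visible mug, if any (object type chosen only for illustration).
for obj in event.metadata["objects"]:
    if obj["objectType"] == "Mug" and obj["visible"]:
        controller.step(action="PickupObject", objectId=obj["objectId"])
        break

controller.stop()
```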
D4RL is a collection of environments for offline reinforcement learning. These environments include Maze2D, AntMaze, Adroit, Gym, Flow, FrankaKitchen and CARLA.
28 PAPERS • NO BENCHMARKS YET
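Loading a D4RL environment together with its offline dataset typically looks like this minimal sketch:

```python
import gym
import d4rl  # importing d4rl registers the offline environments with gym

# Minimal sketch: create an environment and pull its offline dataset.
env = gym.make("maze2d-umaze-v1")

# Dictionary of numpy arrays keyed by 'observations', 'actions', 'rewards', 'terminals', ...
dataset = env.get_dataset()
print(dataset["observations"].shape)

# Convenience format that also includes 'next_observations' for Q-learning methods.
qlearning_data = d4rl.qlearning_dataset(env)
print(qlearning_data["next_observations"].shape)
```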
The General Video Game AI (GVGAI) framework is widely used in research; it features a corpus of over 100 single-player games and 60 two-player games. These are fairly small games, each focusing on specific mechanics or skills the players should be able to demonstrate, including clones of classic arcade games such as Space Invaders, puzzle games like Sokoban, adventure games like Zelda, or game-theory problems such as the Iterated Prisoner's Dilemma. All games are real-time and require players to make decisions in only 40 ms at every game tick, although not all games explicitly reward or require fast reactions; in fact, some of the best game-playing approaches use the time at the beginning of the game to run Breadth-First Search in puzzle games in order to find an accurate solution. However, given the large variety of games (many of which are stochastic and difficult to predict accurately), scoring systems and termination conditions, all unknown to the players, highly adaptive general game-playing approaches are required.
27 PAPERS • NO BENCHMARKS YET
HoME (Household Multimodal Environment) is a multimodal environment for artificial agents to learn from vision, audio, semantics, physics, and interaction with objects and other agents, all within a realistic context. HoME integrates over 45,000 diverse 3D house layouts based on the SUNCG dataset, a scale which may facilitate learning, generalization, and transfer. HoME is an open-source, OpenAI Gym-compatible platform extensible to tasks in reinforcement learning, language grounding, sound-based navigation, robotics, multi-agent learning, and more.
15 PAPERS • NO BENCHMARKS YET
MINOS is a simulator designed to support the development of multisensory models for goal-directed navigation in complex indoor environments. MINOS leverages large datasets of complex 3D environments and supports flexible configuration of multimodal sensor suites.
The StarCraft II Learning Environment (SC2LE) is a reinforcement learning environment based on the game StarCraft II. The environment consists of three sub-components: a Linux StarCraft II binary, the StarCraft II API and PySC2. The StarCraft II API allows programmatic control of StarCraft II. It can be used to start a game, get observations, take actions, and review replays. PySC2 is a Python environment that wraps the StarCraft II API to ease the interaction between Python reinforcement learning agents and StarCraft II. It defines an action and observation specification, and includes a random agent and a handful of rule-based agents as examples. It also includes some mini-games as challenges and visualization tools to understand what the agent can see and do.
14 PAPERS • NO BENCHMARKS YET
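A minimal sketch of creating an environment through PySC2 and stepping it with a no-op agent is shown below; constructor arguments have changed across PySC2 releases, so treat the exact signature as an assumption:

```python
from pysc2.env import sc2_env
from pysc2.lib import actions, features

# Minimal sketch of a PySC2 environment on a mini-game map; arguments are indicative.
env = sc2_env.SC2Env(
    map_name="MoveToBeacon",
    players=[sc2_env.Agent(sc2_env.Race.terran)],
    agent_interface_format=features.AgentInterfaceFormat(
        feature_dimensions=features.Dimensions(screen=84, minimap=64)),
    step_mul=8,
)

timesteps = env.reset()
while not timesteps[0].last():
    # No-op action as a placeholder policy; env.step takes one action per agent.
    timesteps = env.step([actions.FUNCTIONS.no_op()])
env.close()
```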
Obstacle Tower is a high-fidelity, 3D, third-person, procedurally generated environment for reinforcement learning. An agent playing Obstacle Tower must learn to solve both low-level control and high-level planning problems in tandem while learning from pixels and a sparse reward signal. Unlike other benchmarks such as the Arcade Learning Environment, evaluation of agent performance in Obstacle Tower is based on an agent's ability to perform well on unseen instances of the environment.
13 PAPERS • 6 BENCHMARKS
Gibson is an open-source perceptual and physics simulator to explore active and real-world perception. The Gibson Environment is used for Real-World Perception Learning.
9 PAPERS • NO BENCHMARKS YET
A rich, extensible and efficient environment that contains 45,622 human-designed 3D scenes of visually realistic houses, ranging from single-room studios to multi-storied houses, equipped with a diverse set of fully labeled 3D objects, textures and scene layouts, based on the SUNCG dataset (Song et al.).
Jericho is a learning environment for man-made Interactive Fiction (IF) games.
Benchmark for physical reasoning that contains a set of simple classical mechanics puzzles in a 2D physical environment. The benchmark is designed to encourage the development of learning algorithms that are sample-efficient and generalize well across puzzles.
9 PAPERS • 2 BENCHMARKS
CHALET is a 3D house simulator with support for navigation and manipulation. Unlike existing systems, CHALET supports both a wide range of object manipulations and complex environment layouts consisting of multiple rooms. The range of object manipulations includes the ability to pick up and place objects, toggle the state of objects like taps or televisions, open or close containers, and insert or remove objects from these containers. In addition, the simulator comes with 58 rooms that can be combined to create houses, including 10 default house layouts. CHALET is therefore suitable for setting up challenging environments for various AI tasks that require complex language understanding and planning, such as navigation, manipulation, instruction following, and interactive question answering.
6 PAPERS • NO BENCHMARKS YET
Mario AI was a benchmark environment for reinforcement learning. The gameplay in Mario AI, as in Nintendo's original version, consists of moving the controlled character, namely Mario, through two-dimensional levels, which are viewed sideways. Mario can walk and run to the right and left, jump, and (depending on which state he is in) shoot fireballs. Gravity acts on Mario, making it necessary to jump over cliffs to get past them. Mario can be in one of three states: Small, Big (can kill enemies by jumping onto them), and Fire (can shoot fireballs).
5 PAPERS • NO BENCHMARKS YET
The AtariARI (Atari Annotated RAM Interface) is an environment for representation learning. The Atari Arcade Learning Environment (ALE) does not explicitly expose any ground-truth state information. However, ALE does expose the RAM state (128 bytes per timestep), which is used by the game programmer to store important state information such as the location of sprites, the state of the clock, or the current room the agent is in. To extract these variables, the dataset creators consulted commented disassemblies (or source code) of Atari 2600 games which were made available by Engelhardt and Jentzsch and CPUWIZ. The dataset creators were able to find and verify important state variables for a total of 22 games. Once this information was acquired, combining it with the ALE interface produced a wrapper that can automatically output a state label for every example frame generated from the game. The dataset creators make this available with an easy-to-use gym wrapper, which returns this information alongside each frame.
2 PAPERS • NO BENCHMARKS YET
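A minimal sketch of using the wrapper released by the authors (the package name and import path are assumptions based on their public repository); the decoded state variables arrive in the info dictionary:

```python
import gym
from atariari.benchmark.wrapper import AtariARIWrapper  # assumed import path

# Minimal sketch: wrap an Atari environment and read the RAM-derived labels.
env = AtariARIWrapper(gym.make("PongNoFrameskip-v4"))
obs = env.reset()
obs, reward, done, info = env.step(1)
print(info["labels"])  # e.g. player and ball positions decoded from the RAM state
```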
Griddly is an environment for grid-world based research. Griddly provides a highly optimized game state and rendering engine with a flexible high-level interface for configuring environments. Not only does Griddly offer simple interfaces for single, multi-player and RTS games, but also multiple methods of rendering, configurable partial observability and interfaces for procedural content generation.
LANI is a 3D navigation environment and corpus, where an agent navigates between landmarks. LANI contains 27,965 crowd-sourced instructions for navigation in an open environment. Each datapoint includes an instruction, a human-annotated ground-truth demonstration trajectory, and an environment with various landmarks and lakes. The dataset train/dev/test split is 19,758/4,135/4,072. Each environment specification defines placement of 6–13 landmarks within a square grass field of size 50m×50m.
The NetHack Learning Environment (NLE) is a Reinforcement Learning environment based on NetHack 3.6.6. It is designed to provide a standard reinforcement learning interface to the game, and comes with tasks that function as a first step to evaluate agents on this new environment. NetHack is one of the oldest and arguably most impactful videogames in history, as well as being one of the hardest roguelikes currently being played by humans. It is procedurally generated, rich in entities and dynamics, and overall an extremely challenging environment for current state-of-the-art RL agents, while being much cheaper to run compared to other challenging testbeds. Through NLE, the authors wish to establish NetHack as one of the next challenges for research in decision making and machine learning.
2 PAPERS • 1 BENCHMARK
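NLE exposes its tasks through the Gym registry; a minimal sketch with a random policy:

```python
import gym
import nle  # importing nle registers the NetHack tasks with gym

# Minimal sketch: run one episode of the score task with random actions.
env = gym.make("NetHackScore-v0")
obs = env.reset()  # observation is a dict of glyphs, player stats, messages, ...
done = False
while not done:
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
env.render()
env.close()
```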
RL Unplugged is a suite of benchmarks for offline reinforcement learning. RL Unplugged is designed around the following considerations: to facilitate ease of use, the datasets are provided with a unified API which makes it easy for practitioners to work with all data in the suite once a general pipeline has been established. This is a dataset accompanying the paper RL Unplugged: Benchmarks for Offline Reinforcement Learning.
To collect the 3D Vehicle Tracking Simulation Dataset, a driving simulation is used to obtain accurate 3D bounding box annotations without human annotation effort. The data collection and annotation pipeline extends previous works such as VIPER and FSV, especially in terms of linking identities across frames. The simulation is based on Grand Theft Auto V, a modern game that simulates a functioning city and its surroundings in a photo-realistic three-dimensional world. Note that the pipeline is real-time, providing the potential for large-scale data collection, while VIPER requires expensive offline processing.
1 PAPER • NO BENCHMARKS YET
TDW is a 3D virtual world simulation platform, utilizing state-of-the-art video game engine technology. A TDW simulation consists of two components: a) the Build, a compiled executable running on the Unity3D Engine, which is responsible for image rendering, audio synthesis and physics simulations; and b) the Controller, an external Python interface to communicate with the build.
ALFWorld contains interactive TextWorld environments (Côté et al.) that parallel embodied worlds in the ALFRED dataset (Shridhar et al.). The aligned environments allow agents to reason and learn high-level policies in an abstract space before solving embodied tasks through low-level actuation.
The DeepMind Alchemy environment is a meta-reinforcement learning benchmark that presents tasks sampled from a task distribution with deep underlying structure. It was created to test for the ability of agents to reason and plan via latent state inference, as well as useful exploration and experimentation.
A dataset for flying honeybee detection introduced in "A Method for Detection of Small Moving Objects in UAV Videos".
1 PAPER • 1 BENCHMARK
Given the difficulty of handling planetary data, this dataset provides downloadable files in PNG format from the Chang'E-3 and Chang'E-4 missions, together with a set of scripts to perform the conversion from a different PDS4 dataset.
LemgoRL is an open-source benchmark tool for traffic signal control designed to train reinforcement learning agents in a highly realistic simulation scenario with the aim of reducing the Sim2Real gap. In addition to the realistic simulation model, LemgoRL encompasses a traffic signal logic unit that ensures compliance with all regulatory and safety requirements. LemgoRL offers the same interface as the well-known OpenAI Gym toolkit to enable easy deployment in existing research work.
MSC is a dataset for macro-management in StarCraft II, based on the platform SC2LE. It consists of well-designed feature vectors, pre-defined high-level actions and the final result of each match. It contains 36,619 high-quality replays, which are unbroken and played by relatively professional players.
MineRL is an imitation learning dataset with over 60 million frames of recorded human player data. The dataset includes a set of tasks which highlights many of the hardest problems in modern-day Reinforcement Learning: sparse rewards and hierarchical policies.
PlasticineLab is a differentiable physics benchmark, which includes a diverse collection of soft-body manipulation tasks. In each task, the agent uses manipulators to deform the plasticine into the desired configuration. The underlying physics engine supports differentiable elastic and plastic deformation using the DiffTaichi system, posing many under-explored challenges to robotic agents.
The task of Room Rearrangement consists of an agent exploring a room and recording objects' initial configurations. The agent is removed, and the poses and states (e.g., open/closed) of some objects in the room are changed. The agent must then restore the initial configurations of all objects in the room.
TeachMyAgent (TA) is a benchmark for Automatic Curriculum Learning (ACL) algorithms leveraging procedural task generation. It includes 1) challenge-specific unit-tests using variants of a procedural Box2D bipedal walker environment, and 2) a new procedural Parkour environment combining most ACL challenges, making it ideal for global performance assessment.
ThreeDWorld Transport Challenge is a visually-guided and physics-driven task-and-motion planning benchmark. In this challenge, an embodied agent equipped with two 9-DOF articulated arms is spawned randomly in a simulated physical home environment. The agent is required to find a small set of objects scattered around the house, pick them up, and transport them to a desired final location. Several containers are positioned around the house that can be used as tools to assist with transporting objects efficiently. To complete the task, an embodied agent must plan a sequence of actions to change the state of a large number of objects in the face of realistic physical constraints.
SynD is a synthetic energy dataset with a focus on residential buildings. This dataset is the result of a custom simulation process that relies on power traces of household appliances. The output of the simulation is the power consumption of 21 household appliances as well as the household-wide consumption (i.e. mains). Therefore, SynD can be used for Non-Intrusive Load Monitoring, also referred to as Energy Disaggregation.
0 PAPERS • NO BENCHMARKS YET