Procgen Benchmark includes 16 simple-to-use procedurally-generated environments which provide a direct measure of how quickly a reinforcement learning agent learns generalizable skills.
152 PAPERS • 1 BENCHMARK
ManiSkill2 is the next generation of the SAPIEN ManiSkill benchmark, to address critical pain points often encountered by researchers when using benchmarks for generalizable manipulation skills. It includes 20 manipulation task families with 2000+ object models and 4M+ demonstration frames, which cover stationary/mobile-base, single/dual-arm, and rigid/soft-body manipulation tasks with 2D/3D input data simulated by fully dynamic engines.
33 PAPERS • NO BENCHMARKS YET
PRM800K is a process supervision dataset containing 800,000 step-level correctness labels for model-generated solutions to problems from the MATH dataset.
22 PAPERS • NO BENCHMARKS YET
SMACv2 (StarCraft Multi-Agent Challenge v2) is a new version of the benchmark where scenarios are procedurally generated and require agents to generalise to previously unseen settings (from the same distribution) during evaluation.
16 PAPERS • NO BENCHMARKS YET
V-D4RL provides pixel-based analogues of the popular D4RL benchmarking tasks, derived from the dm_control suite, along with natural extensions of two state-of-the-art online pixel-based continuous control algorithms, DrQ-v2 and DreamerV2, to the offline setting.
11 PAPERS • NO BENCHMARKS YET
QDax is a benchmark suite designed for for Deep Neuroevolution in Reinforcement Learning domains for robot control. The suite includes the definition of tasks, environments, behavioral descriptors, and fitness. It specify different benchmarks based on the complexity of both the task and the agent controlled by a deep neural network. The benchmark uses standard Quality-Diversity metrics, including coverage, QD-score, maximum fitness, and an archive profile metric to quantify the relation between coverage and fitness.
5 PAPERS • NO BENCHMARKS YET
Avalon is a benchmark for generalization in Reinforcement Learning (RL). The benchmark consists of a set of tasks in which embodied agents in highly diverse procedural 3D worlds must survive by navigating terrain, hunting or gathering food, and avoiding hazards. Avalon is unique among existing RL benchmarks in that the reward function, world dynamics, and action space are the same for every task, with tasks differentiated solely by altering the environment; its 20 tasks, ranging in complexity from eat and throw to hunt and navigate, each create worlds in which the agent must perform specific skills in order to survive. This benchmark setup enables investigations of generalization within tasks, between tasks, and to compositional tasks that require combining skills learned from previous tasks.
3 PAPERS • NO BENCHMARKS YET
Click to add a brief description of the dataset (Markdown and LaTeX enabled).
FinRL-Meta is universe of market environments for data-driven financial reinforcement learning. It follows the de facto standard of OpenAI Gym and the lean principle of software development. It has the following unique features of layered structure and extensibility, training-testing-trading pipeline and plug-and-play mode.
MIDGARD is an open-source simulator for autonomous robot navigation in outdoor unstructured environments. It is designed to enable the training of autonomous agents (e.g., unmanned ground vehicles) in photorealistic 3D environments, and support the generalization skills of learning-based agents thanks to the variability in training scenarios.
2 PAPERS • NO BENCHMARKS YET
POPGym is designed to benchmark memory in deep reinforcement learning. It contains a set of environments and a collection of memory model baselines. The environments are all Partially Observable Markov Decision Process (POMDP) environments following the Openai Gym interface. Our environments follow a few basic tenets:
2 PAPERS • 1 BENCHMARK
PushWorld is an environment with simplistic physics that requires manipulation planning with both movable obstacles and tools. It contains more than 200 PushWorld puzzles in PDDL and in an OpenAI Gym environment.
1 PAPER • NO BENCHMARKS YET
The Room environment - v1
1 PAPER • 1 BENCHMARK
The Room environment - v2
lilGym is a benchmark for language-conditioned reinforcement learning in visual environment based on 2,661 highly-compositional human-written natural language statements grounded in an interactive visual environment. Each statement is paired with multiple start states and reward functions to form thousands of distinct Markov Decision Processes of varying difficulty.