SCAN is a dataset for grounded navigation that consists of simple compositional navigation commands paired with their corresponding action sequences.
136 PAPERS • NO BENCHMARKS YET
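To illustrate what a SCAN-style command/action pair looks like, here is a minimal sketch of an interpreter for a small subset of such commands. It is not the dataset's own generation code: the function names are hypothetical, the `I_`-prefixed action tokens follow the convention commonly seen in the released files (treat the exact spelling as an assumption), and modifiers such as "opposite" and "around" are deliberately left out.

```python
# Hypothetical, simplified interpreter for a SUBSET of SCAN-style commands.
# Assumption: action tokens are spelled I_WALK, I_TURN_LEFT, etc.
PRIMITIVES = {"walk": "I_WALK", "run": "I_RUN", "jump": "I_JUMP", "look": "I_LOOK"}
TURNS = {"left": "I_TURN_LEFT", "right": "I_TURN_RIGHT"}
REPEATS = {"twice": 2, "thrice": 3}


def interpret_phrase(words):
    """Translate one phrase such as ['walk', 'left', 'twice'] into actions."""
    repeat = 1
    if words[-1] in REPEATS:
        repeat = REPEATS[words[-1]]
        words = words[:-1]
    verb = words[0]
    actions = []
    if len(words) == 2:
        # A direction comes first as a turn, e.g. "walk left" or "turn right".
        actions.append(TURNS[words[1]])
    if verb != "turn":
        # "turn" contributes only the turn itself; other verbs add a primitive.
        actions.append(PRIMITIVES[verb])
    return actions * repeat


def interpret(command):
    """Translate a command with at most one 'and' or 'after' conjunction."""
    words = command.split()
    if "and" in words:
        i = words.index("and")
        return interpret_phrase(words[:i]) + interpret_phrase(words[i + 1:])
    if "after" in words:
        # "x after y" means y is executed first, then x.
        i = words.index("after")
        return interpret_phrase(words[i + 1:]) + interpret_phrase(words[:i])
    return interpret_phrase(words)


print(interpret("walk left twice"))
print(interpret("run after jump twice"))
```

The compositional aspect is that the meaning of a long command is assembled from the meanings of its parts, so a model that has seen "jump" and "walk twice" should, in principle, be able to handle "jump twice".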
Grounded SCAN poses a simple task, where an agent must execute action sequences based on a synthetic language instruction.
20 PAPERS • NO BENCHMARKS YET
The code for this dataset generates mathematical question-and-answer pairs from a range of question types at roughly school-level difficulty. It is designed to test the mathematical learning and algebraic reasoning skills of learning models.
18 PAPERS • 1 BENCHMARK
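As a rough illustration of generating question-and-answer pairs programmatically, the sketch below produces single-variable linear equations with integer solutions. It is not the dataset's actual generator: the function name, the question wording, and the value ranges are all assumptions chosen for the example.

```python
import random


def linear_equation_pair(rng):
    """Return a (question, answer) pair for a*x + b = c with an integer solution.

    Hypothetical helper; the phrasing and ranges are illustrative assumptions.
    """
    a = rng.randint(2, 9)       # non-zero coefficient, so x is uniquely determined
    x = rng.randint(-10, 10)    # the solution is picked first...
    b = rng.randint(-20, 20)
    c = a * x + b               # ...and the right-hand side is derived from it
    question = f"Solve {a}*x + {b} = {c} for x."
    return question, str(x)


rng = random.Random(0)  # seeded so that runs are reproducible
for _ in range(3):
    question, answer = linear_equation_pair(rng)
    print(question, "->", answer)
```

Choosing the answer first and deriving the question from it guarantees that every generated pair is consistent without needing a solver.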
A new English language dataset structured for task-oriented evaluation on unseen tasks.
3 PAPERS • NO BENCHMARKS YET
The Probabilistic Context Free Grammar String Edit Task (PCFG SET) dataset contains sequence-to-sequence problems specifically designed to test different aspects of compositional generalisation. In particular, the dataset contains splits to test for systematicity, productivity, substitutivity, localism and overgeneralisation.
2 PAPERS • NO BENCHMARKS YET
Official dataset of the paper "Decrypting Cryptic Crosswords: Semantically Complex Wordplay Puzzles as a Target for NLP".
1 PAPER • NO BENCHMARKS YET
A suite of OpenAI Gym-compatible multi-agent reinforcement learning environments centered on meta-referential games, used to benchmark behavioural traits pertaining to symbolic behaviours, as described in Santoro et al., 2021, "Symbolic Behaviours in Artificial Intelligence", with a primary focus on the following behavioural traits: