KILT (KILT Benchmark)

Introduced by Petroni et al. in "KILT: a Benchmark for Knowledge Intensive Language Tasks"

KILT (Knowledge Intensive Language Tasks) is a benchmark consisting of 11 datasets representing 5 types of tasks:

  • Fact-checking (FEVER),
  • Entity linking (AIDA CoNLL-YAGO, WNED-WIKI, WNED-CWEB),
  • Slot filling (T-REx, Zero Shot RE),
  • Open domain QA (Natural Questions, HotpotQA, TriviaQA, ELI5),
  • Dialog generation (Wizard of Wikipedia).

All of these datasets are grounded in a single pre-processed Wikipedia snapshot, which allows fairer and more consistent evaluation and enables new task setups such as multitask and transfer learning.
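
The task grouping listed above can be sketched as a simple lookup table. This is only an illustrative sketch, not part of the KILT codebase: the names `KILT_TASKS`, `task_of`, and `multitask_pool` are hypothetical, and the dataset strings are the display names used on this page rather than any official identifiers.

```python
# Hypothetical mapping of KILT task types to the datasets listed on this page.
# Keys and values are display names, not official KILT or loader identifiers.
KILT_TASKS = {
    "Fact-checking": ["FEVER"],
    "Entity linking": ["AIDA CoNLL-YAGO", "WNED-WIKI", "WNED-CWEB"],
    "Slot filling": ["T-REx", "Zero Shot RE"],
    "Open domain QA": ["Natural Questions", "HotpotQA", "TriviaQA", "ELI5"],
    "Dialog generation": ["Wizard of Wikipedia"],
}


def task_of(dataset: str) -> str:
    """Return the task type a dataset belongs to (raises KeyError if unknown)."""
    for task, datasets in KILT_TASKS.items():
        if dataset in datasets:
            return task
    raise KeyError(dataset)


def multitask_pool(tasks: list[str]) -> list[str]:
    """Collect the datasets for several task types, e.g. for a multitask setup."""
    return [name for task in tasks for name in KILT_TASKS[task]]


if __name__ == "__main__":
    total = sum(len(names) for names in KILT_TASKS.values())
    print(f"{len(KILT_TASKS)} task types, {total} datasets")
    print(task_of("HotpotQA"))
    print(multitask_pool(["Fact-checking", "Slot filling"]))
```

Because every dataset shares the same Wikipedia snapshot, a pool built this way could in principle be trained or evaluated against one common knowledge source, which is the multitask setup the description refers to.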

Source: KILT Benchmarking

