This dataset is part of the KELM corpus.

This is the Wikipedia text--Wikidata KG aligned corpus used to train the data-to-text generation model. Please note that this corpus was generated with distant supervision and should not be used as a gold standard for evaluation.

It consists of 3 files:

Each file contains one example per line. Each example is a JSON object with three fields:

triples: A list of triples of the form (subject, relation, object), e.g. (Person X, award received, Award Y). If a triple has a subproperty, it is a quadruple instead, e.g. (Person X, Award Y, received on, Date Z).

serialized triples: The triples concatenated together, as used for input to T5. The format is "<subject> <relation> <object>". When a subject has multiple relations, they are grouped under it, e.g. "<subject> <relation1> <object1> <relation2> <object2> <relation3> <object3>". For more details on how these relations are grouped, please refer to the paper.

sentence: The Wikipedia sentence aligned to these triples.
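
As a rough illustration of the one-example-per-line layout described above, the following Python sketch parses a single line and separates plain triples from quadruples. The JSON key names ("triples", "serialized triples", "sentence") are assumed to match the field names listed above and may differ in the actual files. The sample record is made up for illustration and is not taken from the corpus, and the helper names are hypothetical.

```python
import json

# Made-up sample line, NOT a real record from the corpus.
# The key names are an assumption based on the field names above.
SAMPLE_LINE = json.dumps({
    "triples": [
        ["Person X", "award received", "Award Y"],
        ["Person X", "Award Y", "received on", "Date Z"],
    ],
    "serialized triples": "Person X award received Award Y",
    "sentence": "Person X received Award Y on Date Z.",
})


def parse_example(line):
    """Parse one line into (triples, serialized triples, sentence)."""
    record = json.loads(line)
    # Convert each inner list to a tuple so triples are hashable.
    triples = [tuple(t) for t in record["triples"]]
    return triples, record["serialized triples"], record["sentence"]


def split_by_arity(triples):
    """Separate 3-element triples from 4-element quadruples (subproperties)."""
    plain = [t for t in triples if len(t) == 3]
    quadruples = [t for t in triples if len(t) == 4]
    return plain, quadruples


def iter_examples(lines):
    """Yield parsed examples from an iterable of lines, skipping blank ones."""
    for line in lines:
        if line.strip():
            yield parse_example(line)


if __name__ == "__main__":
    triples, serialized, sentence = parse_example(SAMPLE_LINE)
    plain, quadruples = split_by_arity(triples)
    print(len(plain), "plain triple(s),", len(quadruples), "quadruple(s)")
    print(sentence)
```

Because an open file object is itself an iterable of lines, `iter_examples` can be given one directly, so a large file never has to be loaded into memory at once.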

The names, aliases and Wikidata IDs of the entities can be found in

