FewRel (Few-Shot Relation Classification Dataset) contains 100 relations and 70,000 instances from Wikipedia. The dataset is divided into three subsets: a training set (64 relations), a validation set (16 relations), and a test set (20 relations).
178 PAPERS • 4 BENCHMARKS
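As a rough illustration of the 64/16/20 relation-level split described above, the following sketch partitions 100 relation identifiers into three non-overlapping groups. It is a minimal example under stated assumptions: the identifiers (`rel_000`, ...), the `split_relations` helper, and the random shuffle are all hypothetical. The official FewRel split is fixed by its authors and is not reproduced here.

```python
import random


def split_relations(relation_ids, sizes=(64, 16, 20), seed=0):
    """Partition relation ids into disjoint train/val/test groups.

    Hypothetical helper: the real FewRel split is predefined, not random.
    """
    if sum(sizes) != len(relation_ids):
        raise ValueError("split sizes must add up to the number of relations")
    shuffled = list(relation_ids)
    # A seeded generator keeps the illustrative split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_train, n_val, _ = sizes
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Placeholder ids (assumption); the dataset uses its own relation identifiers.
relations = [f"rel_{i:03d}" for i in range(100)]
train, val, test = split_relations(relations)
print(len(train), len(val), len(test))
```

Because the split is made over relations rather than instances, no relation in the test group appears in the training group, which is what makes evaluation on unseen relations possible.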
DocRED (Document-Level Relation Extraction Dataset) is a relation extraction dataset constructed from Wikipedia and Wikidata. Each document in the dataset is human-annotated with named entity mentions, coreference information, intra- and inter-sentence relations, and supporting evidence. DocRED requires reading multiple sentences in a document to extract entities and infer their relations by synthesizing all the information in the document. Along with the human-annotated data, the dataset provides large-scale distantly supervised data.
147 PAPERS • 4 BENCHMARKS
The SciERC dataset is a collection of 500 scientific abstracts annotated with scientific entities, their relations, and coreference clusters. The abstracts are taken from 12 AI conference/workshop proceedings in four AI communities, from the Semantic Scholar Corpus. SciERC builds on previous datasets of scientific articles, SemEval 2017 Task 10 and SemEval 2018 Task 7, by extending entity types, relation types, and relation coverage, and by adding cross-sentence relations using coreference links.
126 PAPERS • 7 BENCHMARKS
A more challenging task designed to investigate two aspects of few-shot relation classification models: (1) Can they adapt to a new domain with only a handful of instances? (2) Can they detect none-of-the-above (NOTA) relations?
38 PAPERS • NO BENCHMARKS YET
FREDo is a Few-Shot Document-Level Relation Extraction Benchmark based on DocRED and SciERC. The dataset is divided into four subsets: training set (62 relations), validation set (16 relations), in-domain test set (16 relations), and cross-domain test set (7 relations).
2 PAPERS • 2 BENCHMARKS
Green family of datasets for emergent communications on relations.
1 PAPER • NO BENCHMARKS YET