FEVER is a publicly available dataset for fact extraction and verification against textual sources.
276 PAPERS • 3 BENCHMARKS
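A minimal sketch of what a FEVER-style claim record looks like, assuming the field names of the public JSONL release (claim, three-way label, sentence-level Wikipedia evidence); the evidence layout is simplified here to (page title, sentence index) pairs and the example values are illustrative.

```python
# Illustrative FEVER-style record with a trivial label sanity check.
from dataclasses import dataclass
from typing import List, Tuple

LABELS = {"SUPPORTS", "REFUTES", "NOT ENOUGH INFO"}

@dataclass
class FeverClaim:
    id: int
    claim: str
    label: str                       # one of LABELS
    evidence: List[Tuple[str, int]]  # (Wikipedia page title, sentence index)

example = FeverClaim(
    id=75397,
    claim="Nikolaj Coster-Waldau worked with the Fox Broadcasting Company.",
    label="SUPPORTS",
    evidence=[("Nikolaj_Coster-Waldau", 7)],
)
assert example.label in LABELS
```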
KILT (Knowledge Intensive Language Tasks) is a benchmark consisting of 11 datasets representing 5 types of knowledge-intensive tasks: fact checking, entity linking, slot filling, open-domain question answering, and dialogue.
62 PAPERS • 11 BENCHMARKS
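A sketch of KILT's unified record format, in which every task instance maps an input string to one or more outputs grounded in Wikipedia provenance; the field names follow the KILT paper and repository, while the concrete values (including the wikipedia_id) are illustrative.

```python
# Illustrative KILT-style record in the unified input/output/provenance format.
kilt_record = {
    "id": "example-0",
    "input": "Which company did Nikolaj Coster-Waldau work with?",
    "output": [
        {
            "answer": "Fox Broadcasting Company",
            "provenance": [
                {
                    "wikipedia_id": "2439",   # page id in the fixed KILT snapshot (illustrative)
                    "title": "Fox Broadcasting Company",
                    "start_paragraph_id": 1,
                    "end_paragraph_id": 1,
                }
            ],
        }
    ],
}
```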
A collection of fact-checking (FC) articles containing pairs of a multimodal tweet and the FC-article that addresses it, collected from snopes.com.
18 PAPERS • NO BENCHMARKS YET
The VitaminC dataset contains more than 450,000 claim-evidence pairs for fact verification and factually consistent generation. It is based on over 100,000 revisions to popular Wikipedia pages, together with additional "synthetic" revisions.
17 PAPERS • NO BENCHMARKS YET
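A hypothetical layout for a VitaminC-style claim-evidence pair: the release pairs each claim with a sentence before or after a Wikipedia revision, so a single claim can flip between labels across revisions. Field names below are assumptions; the three-way label set follows FEVER.

```python
# Illustrative VitaminC-style pair; field names are assumptions.
pair = {
    "claim": "The film grossed over $50 million worldwide.",
    "evidence": "As of June 2020 the film had grossed $52.3 million worldwide.",
    "label": "SUPPORTS",             # SUPPORTS / REFUTES / NOT ENOUGH INFO
    "source": "wikipedia_revision",  # real vs. "synthetic" revision (illustrative)
}
```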
FEVEROUS (Fact Extraction and VERification Over Unstructured and Structured information) is a fact verification dataset which consists of 87,026 verified claims. Each claim is annotated with evidence in the form of sentences and/or cells from tables in Wikipedia, as well as a label indicating whether this evidence supports, refutes, or does not provide enough information to reach a verdict.
16 PAPERS • NO BENCHMARKS YET
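A sketch of a FEVEROUS-style annotation, where the evidence for a claim mixes Wikipedia sentences and table cells referenced by element ids; the id naming scheme shown is approximate and the example values are illustrative.

```python
# Illustrative FEVEROUS-style annotation mixing sentence and table-cell evidence.
annotation = {
    "claim": "The building was completed in 1931 and has 102 floors.",
    "label": "SUPPORTS",  # SUPPORTS / REFUTES / NOT ENOUGH INFO
    "evidence": [
        "Empire State Building_sentence_1",  # a sentence on the page
        "Empire State Building_cell_0_3_1",  # a cell in one of the page's tables
    ],
}

def is_table_evidence(element_id: str) -> bool:
    """Crude check for whether an evidence element points at a table cell."""
    return "_cell_" in element_id

assert is_table_evidence(annotation["evidence"][1])
```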
A collection of fact-checking (FC) articles containing pairs of a multimodal tweet and the FC-article that addresses it, collected from politifact.com.
15 PAPERS • 1 BENCHMARK
HoVer is a dataset for many-hop evidence extraction and fact verification. It challenges models to extract facts from several Wikipedia articles that are relevant to a claim and to classify whether the claim is Supported or Not-Supported by those facts. The claims require evidence to be extracted from as many as four English Wikipedia articles and embody reasoning graphs of diverse shapes.
7 PAPERS • NO BENCHMARKS YET
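An illustrative HoVer-style record: a claim plus supporting facts drawn from several Wikipedia articles. The field names approximate the released JSON and the values are invented; the label set (SUPPORTED / NOT_SUPPORTED) follows the description above.

```python
# Illustrative HoVer-style many-hop record.
hover_example = {
    "claim": "The director of the 1997 film also wrote its screenplay.",
    "num_hops": 2,
    "supporting_facts": [
        ["Some Film (1997 film)", 0],  # (article title, sentence index) - illustrative
        ["Some Director", 2],
    ],
    "label": "NOT_SUPPORTED",
}

# A claim is "many-hop" when its evidence spans more than one article.
articles = {title for title, _ in hover_example["supporting_facts"]}
assert len(articles) == hover_example["num_hops"]
```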
FaVIQ (Fact Verification from Information-seeking Questions) is a challenging and realistic fact verification dataset that reflects confusions raised by real users. It exploits the ambiguity in information-seeking questions and their disambiguations, automatically converting them into true and false claims. These claims are natural and require a complete understanding of the evidence for verification. FaVIQ serves as a challenging benchmark for natural language understanding, and training on it improves performance on professional fact-checking.
6 PAPERS • NO BENCHMARKS YET
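A hypothetical illustration of the FaVIQ construction idea described above: an ambiguous information-seeking question with two valid disambiguations yields a true claim (question paired with its matching answer) and a false claim (question paired with a mismatched answer). The question, answers, and label strings are illustrative.

```python
# Illustrative FaVIQ-style conversion of an ambiguous question into claims.
question = "When did Harry Potter come out?"
disambiguations = {
    "When was the first Harry Potter book published?": "1997",
    "When was the first Harry Potter film released?": "2001",
}

true_claim = "The first Harry Potter book was published in 1997."   # answer matches
false_claim = "The first Harry Potter film was released in 1997."   # answer swapped

claims = [(true_claim, "SUPPORTS"), (false_claim, "REFUTES")]
```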
X-FACT is a large publicly available multilingual dataset for factual verification of naturally occurring real-world claims. The dataset contains short statements in 25 languages, labeled for veracity by expert fact-checkers. It includes a multilingual evaluation benchmark that measures both out-of-domain generalization and the zero-shot capabilities of multilingual models.
6 PAPERS • 1 BENCHMARK
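An illustrative X-FACT-style record: a short real-world statement in one of the 25 languages with a fact-checker veracity label. The field names and the label value shown are assumptions; the released label inventory is finer-grained than a binary true/false.

```python
# Illustrative X-FACT-style record; field names and label value are assumptions.
xfact_example = {
    "language": "pt",
    "site": "example fact-checking outlet",  # illustrative source field
    "claim": "O país registrou 1 milhão de novos empregos em 2019.",
    "label": "mostly_false",                 # illustrative; actual labels are finer-grained
}
```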
A testbed for commonsense reasoning about entity knowledge, bridging fact-checking about entities with commonsense inferences.
5 PAPERS • NO BENCHMARKS YET
FACTIFY is a dataset for multi-modal fact verification. Each sample comprises a textual claim and an associated image, paired with a reference textual document and a reference image. The task is to classify each claim as support, not-enough-evidence, or refute with the help of the supporting data. The dataset aims to combat fake news in the social-media era; it contains 50,000 claims accompanied by 100,000 images, split into training, validation, and test sets.
3 PAPERS • NO BENCHMARKS YET
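A simplified sketch of a FACTIFY-style multimodal sample as described above: a textual claim with its image, a reference document with its image, and a coarse three-way verdict. Field names and file names are illustrative.

```python
# Illustrative FACTIFY-style multimodal sample; field names are assumptions.
factify_sample = {
    "claim_text": "City officials opened the new bridge to traffic yesterday.",
    "claim_image": "claim_01234.jpg",
    "document_text": "The mayor's office announced the bridge will open next month.",
    "document_image": "doc_01234.jpg",
    "category": "refute",  # support / not-enough-evidence / refute
}
```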
DANFEVER is a dataset intended for claim verification in Danish. It builds upon the task framing of the FEVER fact extraction and verification challenge. DANFEVER can be used for creating models for detecting mis- and disinformation in Danish, as well as for verification in multilingual settings.
2 PAPERS • 1 BENCHMARK
Intermediate annotations from the FEVER dataset that describe original facts extracted from Wikipedia and the mutations that were applied, yielding the claims in FEVER.
1 PAPER • NO BENCHMARKS YET