FEVER is a publicly available dataset for fact extraction and verification against textual sources.
478 PAPERS • 4 BENCHMARKS
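To make the verification task concrete, here is a minimal sketch of a FEVER-style claim record with its three-way verdict; the field names are illustrative assumptions rather than the dataset's exact schema.

```python
# Illustrative FEVER-style record; field names are assumptions, not the official schema.
fever_example = {
    "claim": "The Eiffel Tower is located in Paris.",
    "label": "SUPPORTS",  # FEVER verdicts: SUPPORTS, REFUTES, NOT ENOUGH INFO
    "evidence": [
        # (Wikipedia page title, sentence index) pairs pointing at the cited text
        ("Eiffel_Tower", 0),
    ],
}

def is_verifiable(record: dict) -> bool:
    """A claim is verifiable when its verdict is not NOT ENOUGH INFO."""
    return record["label"] != "NOT ENOUGH INFO"

print(is_verifiable(fever_example))  # True
```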
KILT (Knowledge Intensive Language Tasks) is a benchmark consisting of 11 datasets representing 5 types of tasks: fact checking, entity linking, slot filling, open-domain question answering, and dialogue.
114 PAPERS • 12 BENCHMARKS
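As a rough sketch (and an assumption about the exact field names), KILT casts each dataset into a shared input/output format whose evidence is grounded in a single Wikipedia snapshot, so one retrieve-then-read system can be evaluated across all tasks.

```python
# Hypothetical KILT-style record: every task instance is an input string plus
# outputs carrying provenance into one shared Wikipedia snapshot.
kilt_style_record = {
    "id": "example-0",
    "input": "Which country is the Eiffel Tower in?",  # question, claim, dialogue turn, ...
    "output": [
        {
            "answer": "France",
            "provenance": [{"wikipedia_title": "Eiffel Tower"}],  # evidence pointer
        }
    ],
}

# The shared shape is what makes a single retriever + reader comparable
# across the benchmark's different task types.
print(kilt_style_record["output"][0]["provenance"][0]["wikipedia_title"])
```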
FEVEROUS (Fact Extraction and VERification Over Unstructured and Structured information) is a fact verification dataset which consists of 87,026 verified claims. Each claim is annotated with evidence in the form of sentences and/or cells from tables in Wikipedia, as well as a label indicating whether this evidence supports, refutes, or does not provide enough information to reach a verdict.
42 PAPERS • NO BENCHMARKS YET
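A minimal sketch of a FEVEROUS-style annotation in which the evidence mixes sentences and table cells; the element identifiers are illustrative assumptions, not the dataset's exact IDs.

```python
from collections import Counter

# Illustrative FEVEROUS-style claim: evidence can combine sentences and table cells.
feverous_example = {
    "claim": "The 2020 film was directed by a French director.",
    "label": "SUPPORTS",  # or REFUTES / NOT ENOUGH INFO
    "evidence": [
        {"type": "sentence", "page": "Some_Film", "element": "sentence_0"},
        {"type": "cell", "page": "Some_Film", "element": "cell_0_1_1"},  # table row/column reference
    ],
}

# How much of the verdict rests on structured (table) evidence?
print(Counter(item["type"] for item in feverous_example["evidence"]))
# Counter({'sentence': 1, 'cell': 1})
```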
The VitaminC dataset contains more than 450,000 claim-evidence pairs for fact verification and factually consistent generation. It is based on over 100,000 revisions to popular Wikipedia pages, plus additional "synthetic" revisions.
41 PAPERS • NO BENCHMARKS YET
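A hedged sketch of how a single Wikipedia revision can yield a contrastive claim-evidence pair in the VitaminC style, where a small factual edit flips the verdict; the field names are assumptions.

```python
# Hypothetical VitaminC-style contrastive pair: the same claim checked against
# the evidence sentence before and after a Wikipedia revision.
claim = "More than 500 cases have been confirmed."

contrastive_pair = [
    {"claim": claim,
     "evidence": "As of March, 300 cases have been confirmed.",  # before the revision
     "label": "REFUTES"},
    {"claim": claim,
     "evidence": "As of March, 600 cases have been confirmed.",  # after the revision
     "label": "SUPPORTS"},
]

# The two examples differ only in the revised number, so a model has to react
# to the factual change rather than to surface cues.
for example in contrastive_pair:
    print(example["label"], "<-", example["evidence"])
```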
A testbed for commonsense reasoning about entity knowledge, bridging fact-checking about entities with commonsense inferences.
29 PAPERS • NO BENCHMARKS YET
HoVer is a dataset for many-hop evidence extraction and fact verification. It challenges models to extract facts from several Wikipedia articles that are relevant to a claim and to classify whether the claim is Supported or Not-Supported by those facts. The claims in HoVer require evidence to be extracted from as many as four English Wikipedia articles and embody reasoning graphs of diverse shapes.
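A minimal sketch of a HoVer-style many-hop example, assuming a simple list-of-articles representation rather than the dataset's exact format.

```python
# Illustrative HoVer-style record: the claim needs facts chained across articles.
hover_example = {
    "claim": "The director of Film X was born in the city where Band Y formed.",
    "label": "SUPPORTED",  # two-way verdict: SUPPORTED / NOT_SUPPORTED
    "supporting_articles": ["Film X", "Director Z", "Band Y"],  # up to four Wikipedia articles
}

# The number of supporting articles is the number of hops the model must chain.
print(f"{len(hover_example['supporting_articles'])}-hop claim")
```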
A dataset of fact-checking (FC) articles, consisting of pairs of a multimodal tweet and a FC article from snopes.com.
22 PAPERS • 1 BENCHMARK
A dataset of fact-checking (FC) articles, consisting of pairs of a multimodal tweet and a FC article from politifact.com.
20 PAPERS • 1 BENCHMARK
AVeriTeC (Automated Verification of Textual Claims) is a dataset of 4,568 real-world claims covering fact-checks by 50 different organizations. Each claim is annotated with question-answer pairs supported by evidence available online, as well as textual justifications explaining how the evidence combines to produce a verdict. Claims in AVeriTeC are classified into four labels: "Supported", "Refuted", "Not Enough Evidence", and "Conflicting Evidence/Cherry-picking". The dataset also contains several metadata fields, such as the speaker of the claim, the publisher of the claim, the date the claim was published, and the location most relevant to the claim, which can be used to support the questions, answers, and justifications.
16 PAPERS • 1 BENCHMARK
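A minimal sketch of an AVeriTeC-style annotation combining question-answer evidence, a textual justification, one of the four labels, and the claim metadata; the field names and values are illustrative assumptions.

```python
# Hypothetical AVeriTeC-style record; field names are illustrative.
averitec_example = {
    "claim": "The city opened 100 new schools last year.",
    "label": "Refuted",  # Supported / Refuted / Not Enough Evidence / Conflicting Evidence/Cherry-picking
    "questions": [
        {"question": "How many schools did the city open last year?",
         "answer": "Official records list 12 new schools.",
         "source_url": "https://example.org/report"},  # placeholder URL
    ],
    "justification": "Official figures contradict the claimed number of schools.",
    "metadata": {"speaker": "Example Politician", "publisher": "example.org",
                 "date": "2021-05-01", "location": "Example City"},
}

print(averitec_example["label"], "-", averitec_example["justification"])
```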
FaVIQ (Fact Verification from Information-seeking Questions) is a challenging and realistic fact verification dataset that reflects confusions raised by real users. We exploit the ambiguity in information-seeking questions and their disambiguations, automatically converting them into true and false claims. These claims are natural and require a complete understanding of the evidence for verification. FaVIQ serves as a challenging benchmark for natural language understanding and improves performance in professional fact checking.
15 PAPERS • NO BENCHMARKS YET
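As a sketch of the conversion idea (not FaVIQ's actual pipeline), pairing an ambiguous question's interpretation with its own answer yields a true claim, while pairing it with another interpretation's answer yields a false one; all names below are hypothetical.

```python
# Hypothetical illustration of turning an ambiguous question plus its
# disambiguations into true and false claims (not FaVIQ's construction code).
# Ambiguous question: "When did Harry Potter come out?"
disambiguated_answers = {
    "The first Harry Potter book": "1997",
    "The first Harry Potter film": "2001",
}

def to_claims(topic: str, answers: dict) -> tuple:
    """Build a true claim from the matching answer and a false one from another."""
    true_claim = f"{topic} came out in {answers[topic]}."
    wrong_answer = next(value for key, value in answers.items() if key != topic)
    false_claim = f"{topic} came out in {wrong_answer}."
    return true_claim, false_claim

supported, refuted = to_claims("The first Harry Potter film", disambiguated_answers)
print("True claim: ", supported)
print("False claim:", refuted)
```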
X-FACT is a large publicly available multilingual dataset for factual verification of naturally occurring real-world claims. The dataset contains short statements in 25 languages, labeled for veracity by expert fact-checkers. It includes a multilingual evaluation benchmark that measures both out-of-domain generalization and the zero-shot capabilities of multilingual models.
13 PAPERS • 1 BENCHMARK
A large-scale dataset consisting of 21,184 claims, each assigned a truthfulness label and a ruling statement, together with 58,523 pieces of evidence in the form of text and images. It supports end-to-end multimodal fact-checking and explanation generation: the input is a claim and a large collection of web sources, including articles, images, videos, and tweets, and the goal is to assess the truthfulness of the claim by retrieving relevant evidence, predicting a truthfulness label (i.e., support, refute, or not enough information), and generating a rationalization statement that explains the reasoning and ruling process.
10 PAPERS • NO BENCHMARKS YET
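The end-to-end task naturally decomposes into three stages; the sketch below shows that pipeline shape with hypothetical function names and trivial placeholder logic, not the dataset's reference implementation.

```python
from typing import List

# Hypothetical three-stage pipeline for end-to-end multimodal fact-checking:
# retrieve evidence, predict a verdict, then generate a ruling explanation.
LABELS = ("support", "refute", "not enough information")

def retrieve_evidence(claim: str, web_sources: List[dict]) -> List[dict]:
    # Placeholder retriever: keep sources sharing any token with the claim.
    claim_tokens = set(claim.lower().split())
    return [s for s in web_sources if claim_tokens & set(s["text"].lower().split())]

def predict_label(claim: str, evidence: List[dict]) -> str:
    # Placeholder verdict: no evidence means "not enough information".
    return LABELS[2] if not evidence else LABELS[0]

def generate_explanation(claim: str, evidence: List[dict], label: str) -> str:
    return f"Verdict '{label}' based on {len(evidence)} retrieved evidence item(s)."

sources = [{"text": "City records show 12 new schools opened.", "modality": "article"}]
claim = "The city opened 100 new schools."
evidence = retrieve_evidence(claim, sources)
label = predict_label(claim, evidence)
print(generate_explanation(claim, evidence, label))
```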
FACTIFY is a dataset for multi-modal fact verification. Each instance contains a textual claim with an associated image and a reference textual document with an associated image. The task is to classify the claims into support, not-enough-evidence, and refute categories with the help of the supporting data. We aim to combat fake news in the social media era by providing this multi-modal dataset. Factify contains 50,000 claims accompanied by 100,000 images, split into training, validation, and test sets.
4 PAPERS • NO BENCHMARKS YET
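A minimal sketch of a Factify-style instance pairing a textual claim and its image with a reference document and its image; the field names and file names are assumptions.

```python
# Illustrative Factify-style multimodal instance; field and file names are assumptions.
factify_example = {
    "claim_text": "Example claim circulating on social media.",
    "claim_image": "claim_0001.jpg",        # image attached to the claim
    "document_text": "Reference article text used for verification.",
    "document_image": "document_0001.jpg",  # image from the reference document
    "label": "support",                     # support / not-enough-evidence / refute
}

# Two images per claim (claim image + reference image) is consistent with
# 50,000 claims being accompanied by 100,000 images.
print(sorted(key for key in factify_example if key.endswith("_image")))
```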
We present a dataset, DANFEVER, intended for claim verification in Danish. The dataset builds upon the task framing of the FEVER fact extraction and verification challenge. DANFEVER can be used for creating models for detecting mis- & disinformation in Danish as well as for verification in multilingual settings.
3 PAPERS • 1 BENCHMARK
The LIAR dataset has been widely used by fake news detection researchers since its release, and along with a great deal of research, the community has provided a variety of feedback to improve it. We incorporated this feedback and released LIAR2, a new benchmark dataset of ~23k statements manually labeled by professional fact-checkers for fake news detection tasks. We used an 8:1:1 split ratio for the training, test, and validation sets; details are provided in the paper "An Enhanced Fake News Detection System With Fuzzy Deep Learning". The LIAR2 dataset can be accessed on Hugging Face and GitHub, and statistical information for LIAR and LIAR2 is provided in the accompanying table.
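A hedged loading sketch using the Hugging Face `datasets` library; the repository id is a placeholder, since the description only states that LIAR2 is available on Hugging Face and GitHub.

```python
# Sketch of loading LIAR2 with the Hugging Face `datasets` library.
# "username/liar2" is a placeholder repository id, not the confirmed name.
from datasets import load_dataset

dataset = load_dataset("username/liar2")  # expects train / validation / test splits

# Check that the splits roughly follow the stated 8:1:1 ratio.
total = sum(len(split) for split in dataset.values())
for name, split in dataset.items():
    print(f"{name}: {len(split)} examples ({len(split) / total:.0%})")
```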
CFEVER is a Chinese Fact Extraction and VERification dataset published at AAAI 2024. CFEVER comprises 30,012 manually created claims based on content in Chinese Wikipedia. Inspired by the FEVER dataset (Thorne et al., 2018), we provide class labels (Supports, Refutes, or Not Enough Information) and evidence for each claim in the CFEVER dataset. Download the dataset: https://github.com/IKMLab/CFEVER-data
1 PAPER • NO BENCHMARKS YET
Intermediate annotations from the FEVER dataset that describe original facts extracted from Wikipedia and the mutations that were applied, yielding the claims in FEVER.
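To illustrate what these intermediate annotations pair together, here is a hypothetical record linking an original Wikipedia fact, the mutation applied, and the resulting FEVER claim; the mutation name and fields are illustrative assumptions.

```python
# Hypothetical intermediate-annotation record: original fact -> mutation -> claim.
intermediate_annotation = {
    "original_fact": "The Eiffel Tower was completed in 1889.",
    "mutation": "negation",  # illustrative mutation type
    "generated_claim": "The Eiffel Tower was not completed in 1889.",
    "resulting_label": "REFUTES",
}

print(intermediate_annotation["mutation"], "->", intermediate_annotation["generated_claim"])
```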