CANNOT (Compilation of ANnotated, Negation-Oriented Text-pairs)

Introduced by Anschütz et al. in This is not correct! Negation-aware Evaluation of Language Generation Systems

Dataset Summary

CANNOT is a dataset that focuses on negated textual pairs. It currently contains 77,376 samples, of which roughly half are negated pairs of sentences and the other half are not (they are paraphrased versions of each other).

The most frequent negation that appears in the dataset is verbal negation (e.g., will → won't), although it also contains pairs with antonyms (cold → hot).


Languages

CANNOT contains English texts only.


Dataset Structure

The dataset is given as a .tsv file with the following structure:

premise        hypothesis                                          label
A sentence.    An equivalent, non-negated sentence (paraphrased).  0
A sentence.    The sentence negated.                               1

The dataset can be easily loaded into a Pandas DataFrame by running:

import pandas as pd

dataset = pd.read_csv('negation_dataset_v1.0.tsv', sep='\t')
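As a quick sanity check on the label balance described in the summary, the same three-column layout can also be read without pandas. The following is a minimal sketch using only the standard library; the sample rows and the helper name `count_labels` are hypothetical and only illustrate the format.

```python
import csv
import io
from collections import Counter

# Hypothetical miniature sample in the documented three-column layout
# (illustrative rows, not taken from the real dataset).
SAMPLE_TSV = (
    "premise\thypothesis\tlabel\n"
    "It will rain.\tIt won't rain.\t1\n"
    "It will rain.\tRain is coming.\t0\n"
    "The soup is cold.\tThe soup is hot.\t1\n"
    "The soup is cold.\tThe soup is chilly.\t0\n"
)


def count_labels(tsv_file):
    """Count rows per label (0 = paraphrase, 1 = negated) in an open TSV file."""
    # QUOTE_NONE keeps quotation marks inside sentences as literal characters.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(int(row["label"]) for row in reader)


label_counts = count_labels(io.StringIO(SAMPLE_TSV))
print(label_counts)
```

For the real file, an open file handle can be passed instead of the in-memory sample, for example `open('negation_dataset_v1.0.tsv', newline='', encoding='utf-8')`; the UTF-8 encoding is an assumption here.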


Dataset Creation

The dataset has been created by cleaning up and merging the following datasets:

  1. Not another Negation Benchmark: The NaN-NLI Test Suite for Sub-clausal Negation (see datasets/nan-nli).

  2. GLUE Diagnostic Dataset (see datasets/glue-diagnostic).

  3. Automated Fact-Checking of Claims from Wikipedia (see datasets/wikifactcheck-english).

  4. From Group to Individual Labels Using Deep Features (see datasets/sentiment-labelled-sentences). In this case, the negated sentences were obtained by using the Python module negate.

  5. It Is Not Easy To Detect Paraphrases: Analysing Semantic Similarity With Antonyms and Negation Using the New SemAntoNeg Benchmark (see datasets/antonym-substitution).

Once processed, the number of samples remaining in each of the datasets above is:

Dataset                                                Samples
Not another Negation Benchmark                             118
GLUE Diagnostic Dataset                                    154
Automated Fact-Checking of Claims from Wikipedia        14,970
From Group to Individual Labels Using Deep Features      2,110
It Is Not Easy To Detect Paraphrases                     8,597
Total                                                   25,949

Additionally, for each of the negated samples, another pair of non-negated sentences has been added by paraphrasing them with the pre-trained model 🤗tuner007/pegasus_paraphrase.

Finally, the swapped version of each pair (premise ⇋ hypothesis) has also been included, and any duplicates have been removed.
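The swap-and-deduplicate step can be sketched as follows. This is an illustration, not the authors' actual pipeline: it assumes that the label is kept unchanged when premise and hypothesis are swapped and that duplicates are exact string matches. The function name and the sample rows are hypothetical.

```python
def add_swapped_and_deduplicate(pairs):
    """Append the swapped version of every pair, then drop exact duplicates.

    `pairs` is a list of (premise, hypothesis, label) tuples. The order of
    first occurrence is preserved.
    """
    # Assumption: swapping premise and hypothesis does not change the label.
    swapped = [(hypothesis, premise, label) for premise, hypothesis, label in pairs]
    seen = set()
    result = []
    for row in pairs + swapped:
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


pairs = [
    ("It will rain.", "It won't rain.", 1),
    ("It won't rain.", "It will rain.", 1),  # already the swap of the first row
    ("It will rain.", "Rain is coming.", 0),
]
augmented = add_swapped_and_deduplicate(pairs)
for row in augmented:
    print(row)
```

In this toy input the first two rows are swaps of each other, so only the third row contributes a new pair after swapping.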

With this, the number of premises/hypotheses in the CANNOT dataset that appear in the original datasets is:

Dataset                                                Sentences
Not another Negation Benchmark                             552    (0.36 %)
GLUE Diagnostic Dataset                                    586    (0.38 %)
Automated Fact-Checking of Claims from Wikipedia        89,728   (57.98 %)
From Group to Individual Labels Using Deep Features     12,626    (8.16 %)
It Is Not Easy To Detect Paraphrases                    17,198   (11.11 %)
Total                                                  120,690   (77.99 %)

The percentages above are relative to the total number of premises and hypotheses in the CANNOT dataset. The remaining 22.01 % (34,062 sentences) are the novel premises/hypotheses added through paraphrasing and rule-based negation.
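The totals can be cross-checked with a few lines of arithmetic. The sketch below assumes that every sample contributes exactly one premise and one hypothesis, so the denominator is twice the number of samples; the variable and function names are illustrative.

```python
TOTAL_SAMPLES = 77_376
# Assumption: one premise and one hypothesis per sample.
TOTAL_SENTENCES = 2 * TOTAL_SAMPLES

sentences_from_sources = 120_690
novel_sentences = TOTAL_SENTENCES - sentences_from_sources


def share(count, total=TOTAL_SENTENCES):
    """Percentage of `count` relative to `total`, rounded to two decimals."""
    return round(100 * count / total, 2)


print(TOTAL_SENTENCES, novel_sentences)
print(share(sentences_from_sources), share(novel_sentences))
```

Under this assumption the two shares add up to 100 %, which matches the split between sentences taken from the original datasets and novel ones.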


Additional Information


Licensing Information

The CANNOT dataset is released under CC BY-SA 4.0.



Citation

If you use our dataset, please cite our INLG 2023 paper. BibTeX:

@misc{anschütz2023correct,
      title={This is not correct! Negation-aware Evaluation of Language Generation Systems}, 
      author={Miriam Anschütz and Diego Miguel Lozano and Georg Groh},
      year={2023},
      eprint={2307.13989},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}


Contributions

Contributions to the dataset can be submitted through the project repository.
