ASSIN

ASSIN (Avaliação de Similaridade Semântica e INferência textual) is a dataset with semantic similarity score and entailment annotations. It was used in a shared task in the PROPOR 2016 conference.

The full dataset has 10,000 sentence pairs, half of which in Brazilian Portuguese and half in European Portuguese. Either language variant has 2,500 pairs for training, 500 for validation and 2,000 for testing. This is different from the split used in the shared task, in which the training set had 3,000 pairs and there was no validation set. The shared task training set can be reconstructed by simply merging both sets.

Source: ASSIN

Homepage