SICK (Sentences Involving Compositional Knowledge)

Introduced by Marelli et al. in "A SICK cure for the evaluation of compositional distributional semantic models"

The Sentences Involving Compositional Knowledge (SICK) dataset is a benchmark for compositional distributional semantics. It contains 9,840 sentence pairs that are rich in lexical, syntactic, and semantic phenomena. Each pair is annotated along two dimensions: relatedness and entailment. The relatedness score ranges from 1 to 5, and Pearson's r is used for evaluation; the entailment relation is categorical, with three labels: entailment, contradiction, and neutral. There are 4,439 pairs in the train split, 495 in the trial split (used for development), and 4,906 in the test split. The sentences are drawn from image and video caption datasets and then combined into pairs.

Source: Multi-Label Transfer Learning for Multi-Relational Semantic Similarity
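The two annotation dimensions described above lead to two evaluation routines. Below is a minimal sketch in plain Python. Relatedness predictions are scored with Pearson's r against the gold 1 to 5 scores. Entailment predictions are scored with accuracy over the three labels, which is an assumption on our part, since the description above names a metric only for relatedness. The `SickPair` record, its field names, the lowercase label strings, and the toy sentences are all hypothetical and are not taken from the released files.

```python
import math
from dataclasses import dataclass


# Hypothetical record for one SICK pair; field names and label casing are
# illustrative assumptions, not the official column names of the dataset files.
@dataclass
class SickPair:
    sentence_a: str
    sentence_b: str
    relatedness: float  # gold relatedness score in [1, 5]
    entailment: str     # "entailment", "contradiction" or "neutral"


def pearson_r(xs, ys):
    """Pearson correlation between predicted and gold relatedness scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def entailment_accuracy(predicted, gold):
    """Fraction of pairs whose predicted 3-way label matches the gold label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty, equal-length label sequences")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Toy pairs invented for illustration only; they are not SICK examples.
gold_pairs = [
    SickPair("A man is playing a guitar", "A person is playing an instrument", 4.5, "entailment"),
    SickPair("A dog is running", "Nobody is running", 1.0, "contradiction"),
    SickPair("A woman is cooking", "A woman is in a kitchen", 3.2, "neutral"),
]
predicted_scores = [4.2, 1.5, 3.0]
predicted_labels = ["entailment", "contradiction", "entailment"]

r = pearson_r(predicted_scores, [p.relatedness for p in gold_pairs])
acc = entailment_accuracy(predicted_labels, [p.entailment for p in gold_pairs])
print(f"Pearson r = {r:.4f}, entailment accuracy = {acc:.3f}")
```

Computing r by hand keeps the sketch dependency-free. In practice, `scipy.stats.pearsonr` or `statistics.correlation` (Python 3.10+) give the same coefficient.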