
SV-Ident (Survey Variable Identification)

Introduced by Tsereteli et al. in "Overview of the SV-Ident 2022 Shared Task on Survey Variable Identification in Social Science Publications"

SV-Ident comprises 4,248 sentences from social science publications in English and German. It is the official dataset of the SV-Ident 2022 shared task, "Survey Variable Identification in Social Science Publications". Each sentence is labeled with the survey variables it mentions, whether explicitly or implicitly.

The dataset supports the following tasks:

Variable Detection: deciding whether a sentence contains a variable mention.
Variable Disambiguation: identifying which variables from a given vocabulary are mentioned in a sentence.
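
To make the two tasks concrete, the sketch below frames Variable Disambiguation as ranking vocabulary entries by token overlap with a sentence, and Variable Detection as thresholding the best score. This is only an illustrative baseline under stated assumptions, not the shared task's method: the variable IDs, question texts, example sentences, function names, and the 0.2 threshold are all invented here and do not come from the dataset.

```python
import re


def tokens(text):
    """Return the set of lowercase word tokens in a string."""
    return set(re.findall(r"[a-z]+", text.lower()))


def rank_variables(sentence, vocabulary):
    """Rank variable IDs by Jaccard token overlap with the sentence, best first."""
    sent = tokens(sentence)
    scored = []
    for var_id, description in vocabulary.items():
        desc = tokens(description)
        union = sent | desc
        score = len(sent & desc) / len(union) if union else 0.0
        scored.append((var_id, score))
    # Sort by descending score; break ties by variable ID for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


def detect_variable(sentence, vocabulary, threshold=0.2):
    """Detection as a threshold on the best overlap score (threshold is arbitrary)."""
    ranking = rank_variables(sentence, vocabulary)
    return bool(ranking) and ranking[0][1] >= threshold


# Toy vocabulary with invented IDs and question texts (not taken from SV-Ident).
vocabulary = {
    "v1": "How much do you trust the national parliament?",
    "v2": "How satisfied are you with your life overall?",
}
sentence = "Respondents reported how much they trust the national parliament."

print(rank_variables(sentence, vocabulary))
print(detect_variable(sentence, vocabulary))
print(detect_variable("The paper is organized as follows.", vocabulary))
```

Lexical overlap only catches explicit mentions that reuse the wording of a variable. Implicit mentions, which the dataset also labels, would likely need a semantic matching model instead.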
