SV-Ident comprises 4,248 sentences from English and German social science publications. It is the official dataset for the 2022 Shared Task “Survey Variable Identification in Social Science Publications” (SV-Ident). Each sentence is labeled with the survey variables it mentions, whether explicitly or implicitly.
The dataset supports the following tasks:
Variable Detection: identifying whether a sentence contains a variable mention.
Variable Disambiguation: identifying which variable from a given vocabulary is mentioned in a sentence.
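
The two tasks can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Example` record, the variable IDs `v1` and `v2`, their descriptions, and the example sentence are all hypothetical and are not taken from the dataset. The token-overlap ranking is only a toy baseline to show the shape of the disambiguation task, not the method used in the shared task.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Example:
    """Hypothetical record: a sentence plus the IDs of the variables it mentions."""
    sentence: str
    variables: list[str] = field(default_factory=list)


def detection_label(example: Example) -> int:
    """Variable Detection: 1 if the sentence mentions any variable, else 0."""
    return int(bool(example.variables))


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def rank_variables(sentence: str, vocabulary: dict[str, str]) -> list[str]:
    """Variable Disambiguation (toy baseline): rank the variable IDs in the
    vocabulary by token overlap with the sentence, best match first.
    Ties are broken alphabetically by variable ID."""
    tokens = tokenize(sentence)
    scores = {vid: len(tokens & tokenize(desc)) for vid, desc in vocabulary.items()}
    return sorted(scores, key=lambda vid: (-scores[vid], vid))


# Hypothetical vocabulary mapping variable IDs to short descriptions.
vocabulary = {
    "v1": "trust in parliament",
    "v2": "age of respondent",
}

labeled = Example(
    "Respondents reported low trust in the national parliament.",
    variables=["v1"],
)
unlabeled = Example("The next section reviews related work.")

detection_labels = [detection_label(labeled), detection_label(unlabeled)]
ranking = rank_variables(labeled.sentence, vocabulary)
```

Here detection reduces to a binary label per sentence, while disambiguation returns a ranked list over the given vocabulary, so a sentence that mentions several variables can be evaluated against all of them.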