SODA is a high-quality social dialogue dataset. In contrast to most existing dialogue corpora, which are crowdsourced and small-scale, SODA distills 1.5M socially grounded dialogues from a pre-trained language model (InstructGPT; Ouyang et al.). The dialogues are distilled by contextualizing social commonsense knowledge from a knowledge graph (Atomic10x).
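As a rough illustration of what "contextualizing" a commonsense triple could look like, the sketch below turns a (head, relation, tail) triple into a short scene description and then into a prompt that a language model could continue as a dialogue. The templates, the `Triple` class, the function names, and the prompt wording are all hypothetical and chosen for illustration. They are not taken from the SODA code, and no language model is called.

```python
from dataclasses import dataclass

# Hypothetical templates mapping a relation to a sentence pattern.
# The wording actually used to build SODA is not stated in this description.
RELATION_TEMPLATES = {
    "xWant": "{head}. As a result, PersonX wants {tail}.",
    "xReact": "{head}. As a result, PersonX feels {tail}.",
}


@dataclass(frozen=True)
class Triple:
    """A commonsense triple in (head, relation, tail) form."""
    head: str
    relation: str
    tail: str


def triple_to_sentence(triple: Triple) -> str:
    """Render a triple as a plain sentence using the illustrative templates."""
    template = RELATION_TEMPLATES[triple.relation]
    return template.format(head=triple.head, tail=triple.tail)


def build_dialogue_prompt(triple: Triple, speaker: str = "Alex") -> str:
    """Replace the PersonX placeholder with a name and append a dialogue cue."""
    context = triple_to_sentence(triple).replace("PersonX", speaker)
    return f"{context}\nThe following is a conversation in this scene:\n{speaker}:"


example = Triple("PersonX moves to a new city", "xWant", "to make new friends")
print(build_dialogue_prompt(example))
```

The resulting prompt would then be passed to a language model, whose continuation becomes the dialogue. Grounding each prompt in a triple is what ties a dialogue to a specific piece of social commonsense.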
Source: SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization