The dataset consists of (i) five relevant responses for each context and (ii) five adversarially crafted irrelevant responses for each context.
Source: Improving Dialog Evaluation with a Multi-reference Adversarial Dataset and Large Scale Pretraining