RRS (Restoration-200k for Response Selection)

Introduced by Lan et al. in Exploring Dense Retrieval for Dialogue Response Selection
             Train    Validation    Test       Ranking Test
size         0.4M     50K           5K         800
pos:neg      1:1      1:9           1.2:8.8    -
avg turns    5.0      5.0           5.0        5.0

The ranking test set contains high-quality responses selected by baseline models, and the correlation of each response with its conversation context was carefully annotated by 8 professional annotators (the average annotation scores are saved for ranking). Because these graded correlation scores are provided, the ranking test set should be evaluated with NDCG@3 and NDCG@5. More details are available in the Appendix of the paper.
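A minimal sketch of how NDCG@3 and NDCG@5 could be computed on the ranking test set, assuming each context comes with a list of candidate responses, the model's score for each candidate, and the averaged annotator score for each candidate (the exact file format is not described here). It also assumes the common linear-gain, log2-discount form of DCG; the paper may use a different variant (e.g. exponential gain 2^rel - 1). The function names `dcg_at_k`, `ndcg_at_k` and `mean_ndcg` are hypothetical helpers, not part of any released RRS tooling.

```python
import math
from typing import Sequence, Tuple


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG over the first k items, assuming linear gain and a log2(rank + 1) discount."""
    # enumerate starts at 0, so position i (1-based) is discounted by log2(i + 1).
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(model_scores: Sequence[float], gold_scores: Sequence[float], k: int) -> float:
    """NDCG@k for one conversation context.

    model_scores: scores the response-selection model assigns to each candidate.
    gold_scores: averaged annotator scores for the same candidates, in the same order.
    """
    if len(model_scores) != len(gold_scores):
        raise ValueError("model_scores and gold_scores must have the same length")
    # Rank candidates by the model's score, highest first.
    order = sorted(range(len(model_scores)), key=lambda i: model_scores[i], reverse=True)
    ranked_gold = [gold_scores[i] for i in order]
    # The ideal ranking sorts the gold scores from best to worst.
    idcg = dcg_at_k(sorted(gold_scores, reverse=True), k)
    # If every candidate has gold score 0, NDCG is undefined; report 0.0.
    return dcg_at_k(ranked_gold, k) / idcg if idcg > 0 else 0.0


def mean_ndcg(samples: Sequence[Tuple[Sequence[float], Sequence[float]]], k: int) -> float:
    """Average NDCG@k over (model_scores, gold_scores) pairs, one pair per context."""
    if not samples:
        raise ValueError("samples must not be empty")
    return sum(ndcg_at_k(m, g, k) for m, g in samples) / len(samples)


if __name__ == "__main__":
    # Hypothetical toy context with 4 candidates (not real RRS data).
    gold = [3.0, 2.0, 1.0, 0.0]
    model = [0.1, 0.9, 0.5, 0.2]
    print(f"NDCG@3 = {ndcg_at_k(model, gold, 3):.4f}")
    print(f"NDCG@5 = {ndcg_at_k(model, gold, 5):.4f}")
```

Averaging per-context NDCG (rather than pooling all candidates) matches how ranking metrics are usually reported for response selection, but whether the original evaluation does exactly this is an assumption.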


