RRS (Restoration-200k for Response Selection)

Introduced by Lan et al. in Exploring Dense Retrieval for Dialogue Response Selection

	Train	Validation	Test	Ranking Test
size	0.4M	50K	5K	800
pos:neg	1:1	1:9	1.2:8.8	-
avg turns	5.0	5.0	5.0	5.0

The ranking test set contains high-quality responses selected by some baseline models. Their correlation with the conversation context was carefully annotated by 8 professional annotators, and the average annotation score is saved for ranking. Because these correlation scores are provided, the ranking test set should be evaluated with NDCG@3 and NDCG@5. More details are available in the Appendix of the paper.
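As a rough illustration of how NDCG@3 and NDCG@5 could be computed from the averaged annotation scores, here is a minimal Python sketch. It assumes the common formulation with linear gain and a log2 rank discount; the paper may use a different gain function, so check its Appendix before comparing numbers. The function names and the toy scores are hypothetical and are not taken from the dataset or its code.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items (linear gain, assumed)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(model_scores, annotated_scores, k):
    """NDCG@k for one context.

    model_scores: scores a response-selection model gives each candidate.
    annotated_scores: averaged annotator correlation scores for the same candidates.
    """
    # Order candidates by the model's score, highest first.
    order = sorted(range(len(model_scores)), key=lambda i: model_scores[i], reverse=True)
    ranked = [annotated_scores[i] for i in order]
    # The ideal ordering sorts candidates by the annotated scores themselves.
    ideal_dcg = dcg_at_k(sorted(annotated_scores, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(ranked, k) / ideal_dcg


# Toy example with made-up scores for five candidate responses.
annotated = [4.5, 1.0, 3.0, 2.0, 0.5]
model = [0.9, 0.8, 0.4, 0.3, 0.1]
ndcg3 = ndcg_at_k(model, annotated, 3)
ndcg5 = ndcg_at_k(model, annotated, 5)
```

In practice the per-context values would be averaged over all contexts in the ranking test set.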
