| | Train | Validation | Test | Ranking Test |
|---|---|---|---|---|
| size | 0.4M | 50K | 5K | 800 |
| pos:neg | 1:1 | 1:9 | 1.2:8.8 | - |
| avg turns | 5.0 | 5.0 | 5.0 | 5.0 |

The ranking test set contains high-quality responses selected by several baselines. Their correlation with the conversation context was carefully annotated by 8 professional annotators, and the average annotation scores are saved for ranking. Because these correlation scores are provided, the ranking test set should be evaluated with NDCG@3 and NDCG@5. More details are available in the Appendix of the paper.
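
As a rough illustration of how NDCG@3 and NDCG@5 could be computed from the averaged annotation scores, here is a minimal sketch. It assumes the common linear-gain form of DCG with a log2 position discount; the paper's Appendix may define the gain differently. The function names `dcg_at_k` and `ndcg_at_k` and the input layout (annotation scores listed in the order the model ranked the candidates) are hypothetical, not part of the released code.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k items (linear gain, assumed)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """NDCG@k for one context.

    `relevances` holds the averaged annotation scores of the candidate
    responses, listed in the order produced by the model's ranking.
    """
    # The ideal ranking sorts the same candidates by annotation score.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant candidate at all; define NDCG as 0
    return dcg_at_k(relevances, k) / ideal
```

With this sketch, a model that orders candidates exactly by their annotation scores gets `ndcg_at_k(scores, 3) == 1.0`; the corpus-level metric would be the mean over all contexts in the ranking test set.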