Progress in Machine Learning is often driven by the availability of large datasets, and consistent evaluation metrics for comparing modeling approaches. To this end, we present a repository of conversational datasets consisting of hundreds of millions of examples, and a standardised evaluation procedure for conversational response selection models using '1-of-100 accuracy'. The repository contains scripts that allow researchers to reproduce the standard datasets, or to adapt the pre-processing and data filtering steps to their needs. We introduce and evaluate several competitive baselines for conversational response selection, whose implementations are shared in the repository, as well as a neural encoder model that is trained on the entire training set.
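As a rough illustration of the '1-of-100 accuracy' protocol: each batch of 100 (context, response) pairs is scored so that every context is ranked against its own response plus the other 99 responses in the batch as distractors, and accuracy is the fraction of contexts whose true response ranks first. The sketch below is a minimal NumPy version of that computation, assuming a dual-encoder model that scores candidates by dot product; the function and variable names are illustrative, not the repository's actual API.

    import numpy as np

    def one_of_100_accuracy(context_enc, response_enc):
        # context_enc, response_enc: (100, dim) arrays, where row i of
        # response_enc is the ground-truth response to row i of context_enc,
        # and the remaining 99 rows serve as distractors.
        scores = context_enc @ response_enc.T       # (100, 100) similarity matrix
        predicted = np.argmax(scores, axis=1)       # highest-scoring candidate per context
        return float(np.mean(predicted == np.arange(len(scores))))

    # Example with random encodings (a real encoder would produce these vectors):
    rng = np.random.default_rng(0)
    contexts = rng.standard_normal((100, 512))
    responses = contexts + 0.1 * rng.standard_normal((100, 512))  # noisy "true" responses
    print(one_of_100_accuracy(contexts, responses))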


Datasets


Introduced in the Paper:

Reddit Corpus

Used in the Paper:

Reddit
OpenSubtitles
Task                                Dataset                Model            Metric              Value    Global Rank
Conversational Response Selection   PolyAI AmazonQA        PolyAI Encoder   1-of-100 Accuracy   71.3%    #2
Conversational Response Selection   PolyAI OpenSubtitles   PolyAI Encoder   1-of-100 Accuracy   30.6%    #1
Conversational Response Selection   PolyAI Reddit          PolyAI Encoder   1-of-100 Accuracy   61.3%    #3

Methods


No methods listed for this paper.