Training End-to-End Dialogue Systems with the Ubuntu Dialogue Corpus

1 Jan 2017 · Ryan Lowe, Nissan Pow, Iulian Vlad Serban, Laurent Charlin, Chia-Wei Liu, Joelle Pineau

In this paper, we analyze neural network-based dialogue systems trained in an end-to-end manner using an updated version of the recent Ubuntu Dialogue Corpus, a dataset containing almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words. This dataset is interesting because of its size, long context lengths, and technical nature; thus, it can be used to train large models directly from data with minimal feature engineering. We provide baselines in two different environments: one where models are trained to select the correct next response from a list of candidate responses, and one where models are trained to maximize the log-likelihood of a generated utterance conditioned on the context of the conversation. These are both evaluated on a recall task that we call next utterance classification (NUC), and with vector-based metrics that capture the topicality of the responses. We observe that current end-to-end models are unable to completely solve these tasks; thus, we provide a qualitative error analysis to determine the primary causes of error for end-to-end models evaluated on NUC, and examine sample utterances from the generative models. As a result of this analysis, we suggest some promising directions for future research on the Ubuntu Dialogue Corpus, which can also be applied to end-to-end dialogue systems in general.
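
As a concrete illustration of the two evaluation settings, the sketch below shows (a) Recall@k for next utterance classification and (b) an embedding-average cosine similarity of the kind used as a vector-based topicality metric. This is a minimal sketch, not the authors' implementation: the convention that the ground-truth response sits at index 0, the `word_vectors` lookup, and the function names are all assumptions made for illustration.

```python
import numpy as np

def recall_at_k(scores, k):
    """Recall@k for next utterance classification (NUC).

    `scores` holds one 1-D array per test example, containing the model's
    score for each candidate response. By (assumed) convention the
    ground-truth response is at index 0; an example counts as a hit when
    the true response ranks within the top k candidates.
    """
    hits = 0
    for s in scores:
        rank = int(np.sum(s > s[0]))  # candidates scored above the truth
        if rank < k:
            hits += 1
    return hits / len(scores)

def embedding_average_similarity(response, reference, word_vectors):
    """Vector-based topicality: cosine similarity between the averaged
    word embeddings of a generated response and a reference response.
    `word_vectors` is an assumed token -> vector lookup (e.g. word2vec);
    out-of-vocabulary tokens are skipped."""
    def mean_vec(tokens):
        vecs = [word_vectors[t] for t in tokens if t in word_vectors]
        return np.mean(vecs, axis=0) if vecs else None
    a = mean_vec(response.split())
    b = mean_vec(reference.split())
    if a is None or b is None:
        return 0.0
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy usage: 1 true response plus 9 distractors per example (random
# scores, so Recall@1 should land near 0.1 and Recall@5 near 0.5).
rng = np.random.default_rng(0)
fake_scores = [rng.random(10) for _ in range(100)]
print("Recall@1:", recall_at_k(fake_scores, 1))
print("Recall@5:", recall_at_k(fake_scores, 5))

toy_vecs = {"install": np.array([1.0, 0.0]), "ubuntu": np.array([0.0, 1.0])}
print(embedding_average_similarity("install ubuntu", "ubuntu install", toy_vecs))
```

Note that ranking with a strict `>` treats ties in the model's favor; a stricter variant would break ties against the ground-truth response.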

Datasets

Ubuntu Dialogue Corpus · Linux IRC (Ch2 Elsner) · Linux IRC (Ch2 Kummerfeld)

Results

Task                          Dataset                     Model      Metric    Value  Global Rank
Conversation Disentanglement  Linux IRC (Ch2 Elsner)      Heuristic  1-1       45.1   #3
Conversation Disentanglement  Linux IRC (Ch2 Elsner)      Heuristic  Local     73.8   #3
Conversation Disentanglement  Linux IRC (Ch2 Elsner)      Heuristic  Shen F-1  51.8   #3
Conversation Disentanglement  Linux IRC (Ch2 Kummerfeld)  Heuristic  1-1       43.4   #3
Conversation Disentanglement  Linux IRC (Ch2 Kummerfeld)  Heuristic  Local     67.9   #2
Conversation Disentanglement  Linux IRC (Ch2 Kummerfeld)  Heuristic  Shen F-1  50.7   #2

Methods


No methods listed for this paper.