MDIA: A Benchmark for Multilingual Dialogue Generation in 46 Languages

27 Aug 2022 · Qingyu Zhang, Xiaoyu Shen, Ernie Chang, Jidong Ge, Pengke Chen

Owing to the lack of corpora for low-resource languages, current work on dialogue generation has mainly focused on English. In this paper, we present mDIA, the first large-scale multilingual benchmark for dialogue generation across low- to high-resource languages. It covers real-life conversations in 46 languages across 19 language families. We present baseline results obtained by fine-tuning the multilingual, non-dialogue-focused pre-trained model mT5 as well as the English-centric, dialogue-focused pre-trained chatbot DialoGPT. The results show that mT5-based models perform better in terms of sacreBLEU and BERTScore, but worse in terms of diversity. Even though promising results are found in few-shot and zero-shot scenarios, there is a large gap between the generation quality in English and in other languages. We hope that the release of mDIA will encourage more work on multilingual dialogue generation to promote language diversity.
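For readers who want to reproduce the kind of baseline described above, the sketch below (not the authors' released code) shows one way to generate a response with an mT5 checkpoint and score it with the metrics named in the abstract, using the Hugging Face transformers, sacrebleu, and bert-score packages. The `google/mt5-small` checkpoint and the Spanish context/reference pair are illustrative placeholders, not part of the paper.

```python
# Minimal sketch of the evaluation setup implied by the abstract: generate a
# response with an mT5 seq2seq model and score it with sacreBLEU and BERTScore.
# This is NOT the authors' released code; the checkpoint name and the Spanish
# context/reference pair below are placeholders for illustration only.
from transformers import AutoTokenizer, MT5ForConditionalGeneration
import sacrebleu
from bert_score import score as bert_score

# Placeholder checkpoint; in practice the model would first be fine-tuned on
# the mDIA training split of the target language.
model_name = "google/mt5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = MT5ForConditionalGeneration.from_pretrained(model_name)

context = "¿Cómo estuvo tu fin de semana?"             # dialogue history (hypothetical)
reference = "Estuvo muy bien, gracias por preguntar."  # gold response (hypothetical)

# Encode the dialogue context and generate a candidate response.
inputs = tokenizer(context, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40, num_beams=4)
hypothesis = tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Corpus-level sacreBLEU over a single sentence pair, plus BERTScore F1.
bleu = sacrebleu.corpus_bleu([hypothesis], [[reference]])
P, R, F1 = bert_score([hypothesis], [reference], lang="es")
print(f"sacreBLEU: {bleu.score:.2f}  BERTScore F1: {F1.mean().item():.4f}")
```

A DialoGPT-style baseline would follow the same generate-then-score loop, but with a causal language model conditioned on the concatenated dialogue history instead of an encoder-decoder.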


Datasets

Introduced in the Paper: MDIA
