Empirical study of pretrained multilingual language models for zero-shot cross-lingual generation

15 Oct 2023  ·  Nadezhda Chirkova, Sheng Liang, Vassilina Nikoulina ·

Zero-shot cross-lingual generation assumes finetuning the multilingual pretrained language model (mPLM) on a generation task in one language and then using it to make predictions for this task in other languages. Previous works notice a frequent problem of generation in a wrong language and propose approaches to address it, usually using mT5 as a backbone model. In this work, we test alternative mPLMs, such as mBART and NLLB-200, and compare various approaches proposed in the literature in a unified setting. We first underline the importance of tuning learning rate used for finetuning, which helps to substantially alleviate the problem of generation in the wrong language. Then, we show that with careful learning rate tuning, the simple full finetuning of the model acts as a very strong baseline; other competitive approaches include parameter-efficient tuning with adapters and training on several source languages. Finally, we find that mBART performs similarly to mT5 of the same size, and NLLB-200 can be competitive in some cases.

PDF Abstract


Results from the Paper

  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.