Low Resource Chat Translation: A Benchmark for Hindi–English Language Pair

AMTA 2022 · Baban Gain, Ramakrishna Appicharla, Asif Ekbal, Muthusamy Chelliah, Soumya Chennabasavraj, Nikesh Garera

Chatbots are used in sectors such as banking, healthcare, and e-commerce, and are mainly available in English. Machine Translation (MT) could be an effective way to develop multilingual chatbots. This paper provides a benchmark setup for chat and QnA translation for English-Hindi, a relatively low-resource language pair. We create an English-Hindi parallel corpus consisting of both synthetic and gold-standard parallel sentences from the WMT20 Chat, MMD, and Flipkart QA corpora. We conduct experiments with transfer learning and domain adaptation techniques at both the sentence level and the context level. We achieve BLEU scores of 57.6 and 55.5 on the En-Hi and Hi-En subsets of the WMT20 Chat corpus, 51.5 and 80.4 on the En-Hi and Hi-En subsets of the MMD corpus, and 50.6 and 46.3 on the question and answer subsets of the QA corpus, respectively. We also carry out thorough quality testing with a well-known e-commerce company by deploying the models to real users.
