Can We Use Word Embeddings for Enhancing Guarani-Spanish Machine Translation?

ComputEL (ACL) 2022 · Santiago Góngora, Nicolás Giossa, Luis Chiruzzo

Machine translation for low-resource languages such as Guarani is challenging because of the scarcity of data. One way to tackle this is to initialize models with pretrained word embeddings. In this work we investigate whether currently available data is sufficient to train embeddings rich enough to enhance Guarani-Spanish MT, by building a set of word embedding collections and training MT systems with them. We found that the trained vectors are strong enough to slightly improve the performance of some of the translation models and to speed up training convergence.
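
As a rough illustration of what initializing a model with pretrained word embeddings can look like, the sketch below builds the initial embedding table for a vocabulary: words that have a pretrained vector reuse it, and the rest fall back to small random values. This is a minimal sketch and not the authors' implementation; the function name `build_embedding_matrix`, the toy tokens, the vector values, and the dimension are all assumptions made for the example.

```python
import random


def build_embedding_matrix(vocab, pretrained, dim, seed=0):
    """Return (matrix, coverage) for a vocabulary.

    matrix has one row per word in vocab: the pretrained vector when one of
    the right dimension exists, otherwise small random values.
    coverage is the fraction of vocab words found in the pretrained vectors.
    """
    rng = random.Random(seed)
    matrix = []
    hits = 0
    for word in vocab:
        vector = pretrained.get(word)
        if vector is not None and len(vector) == dim:
            matrix.append(list(vector))  # copy so the source stays untouched
            hits += 1
        else:
            # Out-of-vocabulary word: random initialization (an assumption;
            # other fallbacks such as zeros are equally plausible).
            matrix.append([rng.uniform(-0.1, 0.1) for _ in range(dim)])
    coverage = (hits / len(vocab)) if vocab else 0.0
    return matrix, coverage


# Toy data, invented purely for illustration.
vocab = ["jagua", "óga", "y"]
pretrained = {"jagua": [0.1, 0.2], "óga": [0.3, 0.4]}

matrix, coverage = build_embedding_matrix(vocab, pretrained, dim=2)
```

The resulting rows would typically be copied into the embedding layer of the translation model before training starts. The coverage value gives a quick sense of how much of the vocabulary actually benefits from the pretrained vectors, which matters when the embedding training data is small.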
