Advanced Language Model-based Translator for English-Vietnamese Translation

We introduce a new approach to English-Vietnamese translation that leverages the capabilities of the Gemma-7B-IT model (Gemma Team et al. 2024). Enhanced by the Advanced Language Model-based Translator (ALMA) methodology (Xu et al. 2023), our system goes significantly beyond conventional Transformer models in handling complex linguistic contexts. This work details our training framework, our experimental validation, and the evaluation process through which the system establishes a new state of the art for Vietnamese translation. Our results surpass those of well-known systems such as VinAI Translate (Nguyen et al. 2022) and Google Translate (Google 2024b), with an improvement of more than 12 BLEU points over the previously best-performing systems. These results highlight the flexibility and contextual understanding of Large Language Models (LLMs) (Zhao et al. 2023) within our ALMA framework, which adapt well to varied translation nuances and complexities. Building on these advances, we have also released a user-facing translation product, available at https://www.doctranslate.io (Doctranslate 2023). The tool aims to combine technological innovation with practical utility and to give users a seamless, high-quality translation experience.

Results from the Paper


Ranked #1 on the PhoMT translation benchmark (using extra training data)

Task         Dataset  Model                 Metric  Value  Global Rank
Translation  PhoMT    ALMA-Gemma-7B-IT-ST   BLEU    56.21  #1
Translation  PhoMT    ALMA-Gemma-7B-IT      BLEU    52.70  #2
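
The scores above are corpus-level BLEU on a 0-100 scale. To make the metric concrete, the following is a minimal sketch of standard corpus BLEU (clipped n-gram precisions up to 4-grams, geometric mean, brevity penalty). It is not the paper's evaluation script: it assumes a single reference per segment, plain whitespace tokenization, and no smoothing, and the function names `corpus_bleu` and `ngram_counts` are hypothetical. Published scores are usually computed with a dedicated tool and tokenizer, so this sketch should not be expected to reproduce the reported 56.21 or 52.70.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count every n-gram of order n in a token sequence (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Corpus-level BLEU on a 0-100 scale.

    Assumptions (not from the paper): one reference per segment,
    whitespace tokenization, no smoothing.
    """
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must have the same length")
    matches = [0] * max_n  # clipped n-gram matches, summed over the corpus, per order
    totals = [0] * max_n   # hypothesis n-gram counts, summed over the corpus, per order
    hyp_len = 0
    ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens = hyp.split()
        ref_tokens = ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp_tokens, n)
            ref_ngrams = ngram_counts(ref_tokens, n)
            # Clip each hypothesis n-gram count by how often it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += sum(hyp_ngrams.values())
    # Without smoothing, a zero precision at any order makes BLEU zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions, with uniform weights.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references overall.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    hyps = ["the cat sat on the mat"]
    refs = ["the cat sat on the mat"]
    print(corpus_bleu(hyps, refs))  # 100.0
```

Because Vietnamese orthography separates syllables with spaces, whitespace splitting of Vietnamese output yields syllable-level tokens. A difference such as "more than 12 BLEU points" refers to this 0-100 scale.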
