Machine Translation
2153 papers with code • 80 benchmarks • 77 datasets
Machine translation is the task of translating a sentence in a source language to a different target language.
Approaches to machine translation range from rule-based to statistical to neural. More recently, attention-based encoder-decoder architectures such as the Transformer have brought major improvements in machine translation.
One of the most popular datasets used to benchmark machine translation systems is the WMT family of datasets. Some of the most commonly used evaluation metrics for machine translation systems include BLEU, METEOR, NIST, and others.
(Image credit: Google seq2seq)
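As an illustration of the BLEU metric mentioned above, here is a minimal sketch of sentence-level BLEU: the geometric mean of clipped n-gram precisions multiplied by a brevity penalty. The function names (`ngrams`, `sentence_bleu`), the whitespace tokenization, the single reference, and the absence of smoothing are all simplifying assumptions made for this example. Benchmark results are normally reported as corpus-level scores with standardized tokenization, so this sketch is not a substitute for an established evaluation tool.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed BLEU for one candidate against one reference (token lists)."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            # Without smoothing, any zero precision makes the score zero.
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    # Uniform weights over the n-gram orders (geometric mean).
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    # Hypothetical example sentences, tokenized on whitespace.
    reference = "the cat sat on the mat".split()
    print(sentence_bleu("the cat sat on the mat".split(), reference))  # 1.0
    print(sentence_bleu("the cat sat on a mat".split(), reference))
```

An exact match scores 1.0, and a single substituted word lowers every n-gram precision at once, which is why unsmoothed sentence-level BLEU is quite sensitive on short sentences.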
Latest papers with no code
Fine-tuning Pre-trained Named Entity Recognition Models For Indian Languages
We present human-annotated named entity corpora of 40K sentences for 4 Indian languages from two of the major Indian language families.
Guylingo: The Republic of Guyana Creole Corpora
While major languages often enjoy substantial attention and resources, the linguistic diversity across the globe encompasses a multitude of smaller, indigenous, and regional languages that lack the same level of computational support.
Sentiment Analysis Across Languages: Evaluation Before and After Machine Translation to English
People communicate in more than 7,000 languages around the world, with around 780 languages spoken in India alone.
Relay Decoding: Concatenating Large Language Models for Machine Translation
Leveraging large language models for machine translation has demonstrated promising results.
The Call for Socially Aware Language Technologies
While NLP is getting better at solving the formal linguistic aspects, limited progress has been made in adding the social awareness required for language applications to work in all situations for all users.
Reinforcement Learning for Edit-Based Non-Autoregressive Neural Machine Translation
Non-autoregressive (NAR) language models are known for their low latency in neural machine translation (NMT).
The IgboAPI Dataset: Empowering Igbo Language Technologies through Multi-dialectal Enrichment
The Igbo language is facing a risk of becoming endangered, as indicated by a 2025 UNESCO study.
Efficient Sample-Specific Encoder Perturbations
Encoder-decoder foundation models have displayed state-of-the-art performance on a range of autoregressive sequence tasks.
Suvach -- Generated Hindi QA benchmark
Current evaluation benchmarks for question answering (QA) in Indic languages often rely on machine translation of existing English datasets.
Which Nigerian-Pidgin does Generative AI speak?: Issues about Representativeness and Bias for Multilingual and Low Resource Languages
Although it has mainly been a spoken language until recently, there are currently two written genres (BBC and Wikipedia) in Naija.