Machine Translation

2153 papers with code • 80 benchmarks • 77 datasets

Machine translation is the task of translating a sentence in a source language to a different target language.

Approaches to machine translation range from rule-based to statistical to neural. More recently, attention-based encoder-decoder architectures such as the Transformer have yielded major improvements in machine translation.

One of the most popular benchmarks for machine translation systems is the WMT family of datasets. Commonly used evaluation metrics include BLEU, METEOR, and NIST.
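As a rough illustration of how an n-gram overlap metric such as BLEU scores a translation, the sketch below computes a simplified sentence-level BLEU against a single reference. It assumes whitespace tokenization, uniform weights over 1- to 4-grams, and no smoothing; the function names `ngrams` and `sentence_bleu` are hypothetical and chosen for this example only. Published results are normally corpus-level scores computed with a standardized tool such as sacreBLEU, so numbers from this sketch are not comparable to benchmark tables.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Simplified single-reference, sentence-level BLEU in [0, 1].

    Assumptions (not a standard implementation): whitespace tokenization,
    uniform n-gram weights, and no smoothing.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Modified precision: clip each candidate n-gram count by the
        # number of times that n-gram occurs in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision gives 0
        log_precisions.append(math.log(overlap / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    print(sentence_bleu("the cat sat on the mat", reference))
    print(sentence_bleu("the cat sat on the", reference))
```

In the second call every candidate n-gram also appears in the reference, so the score is lowered only by the brevity penalty for the missing final word.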

(Image credit: Google seq2seq)

Latest papers with no code

Fine-tuning Pre-trained Named Entity Recognition Models For Indian Languages

no code yet • 8 May 2024

We present human-annotated named entity corpora of 40K sentences for 4 Indian languages from two of the major Indian language families.

Guylingo: The Republic of Guyana Creole Corpora

no code yet • 6 May 2024

While major languages often enjoy substantial attention and resources, the linguistic diversity across the globe encompasses a multitude of smaller, indigenous, and regional languages that lack the same level of computational support.

Sentiment Analysis Across Languages: Evaluation Before and After Machine Translation to English

no code yet • 5 May 2024

People communicate in more than 7,000 languages around the world, with around 780 languages spoken in India alone.

Relay Decoding: Concatenating Large Language Models for Machine Translation

no code yet • 5 May 2024

Leveraging large language models for machine translation has demonstrated promising results.

The Call for Socially Aware Language Technologies

no code yet • 3 May 2024

While NLP is getting better at solving the formal linguistic aspects, limited progress has been made in adding the social awareness required for language applications to work in all situations for all users.

Reinforcement Learning for Edit-Based Non-Autoregressive Neural Machine Translation

no code yet • 2 May 2024

Non-autoregressive (NAR) language models are known for their low latency in neural machine translation (NMT).

The IgboAPI Dataset: Empowering Igbo Language Technologies through Multi-dialectal Enrichment

no code yet • 2 May 2024

The Igbo language is facing a risk of becoming endangered, as indicated by a 2025 UNESCO study.

Efficient Sample-Specific Encoder Perturbations

no code yet • 1 May 2024

Encoder-decoder foundation models have displayed state-of-the-art performance on a range of autoregressive sequence tasks.

Suvach -- Generated Hindi QA benchmark

no code yet • 30 Apr 2024

Current evaluation benchmarks for question answering (QA) in Indic languages often rely on machine translation of existing English datasets.

Which Nigerian-Pidgin does Generative AI speak?: Issues about Representativeness and Bias for Multilingual and Low Resource Languages

no code yet • 30 Apr 2024

Although it has mainly been a spoken language until recently, there are currently two written genres (BBC and Wikipedia) in Naija.