Machine Translation

2153 papers with code • 80 benchmarks • 77 datasets

Machine translation is the task of translating a sentence from a source language into a different target language.

Approaches to machine translation range from rule-based to statistical to neural. More recently, attention-based encoder-decoder architectures such as the Transformer have brought major improvements in machine translation quality.
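
As a hedged illustration, a pretrained Transformer-based encoder-decoder model can be run with the Hugging Face transformers library; the MarianMT checkpoint named below is one assumption among many possible choices:

```python
# Minimal sketch: translating with a pretrained encoder-decoder Transformer.
# Assumes the Hugging Face `transformers` library (with PyTorch) is installed;
# the Helsinki-NLP/opus-mt-en-de checkpoint is an illustrative choice.
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-de"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

src = ["Machine translation maps a source sentence to a target language."]
batch = tokenizer(src, return_tensors="pt", padding=True)
generated = model.generate(**batch)  # beam search by default
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```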

The WMT family of datasets is among the most widely used benchmarks for machine translation systems. Commonly used evaluation metrics include BLEU, METEOR, and NIST, among others.
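
As a minimal sketch, corpus-level BLEU can be computed with the sacrebleu library (assuming it is installed; the hypothesis and reference below are illustrative):

```python
# Sketch: scoring MT output with corpus-level BLEU via sacrebleu (assumed installed).
import sacrebleu

hypotheses = ["the cat sat on the mat"]
references = [["the cat is sitting on the mat"]]  # one list per reference set

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```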

(Image credit: Google seq2seq)

Libraries

Use these libraries to find Machine Translation models and implementations
See all 14 libraries.

Latest papers with no code

Modeling Orthographic Variation Improves NLP Performance for Nigerian Pidgin

no code yet • 28 Apr 2024

We test the effect of this data augmentation on two critical NLP tasks: machine translation and sentiment analysis.

Usefulness of Emotional Prosody in Neural Machine Translation

no code yet • 27 Apr 2024

In this work, we propose to improve translation quality by adding another external source of information: the automatically recognized emotion in the voice.
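
The paper's code is not available here, but one common way to feed such side information to an NMT system is to prepend a control tag to the source sentence; the sketch below is a hypothetical illustration of that idea (tag names and recognizer output are assumptions):

```python
# Hypothetical sketch: conditioning an NMT system on recognized emotion by
# prepending a tag token to the source text (tag names are assumptions).
def tag_source(sentence: str, emotion: str) -> str:
    """Prepend an emotion control token, e.g. '<anger> I told you to stop.'"""
    allowed = {"anger", "joy", "sadness", "neutral"}
    tag = emotion if emotion in allowed else "neutral"
    return f"<{tag}> {sentence}"

print(tag_source("I told you to stop.", "anger"))
```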

Quality Estimation with $k$-nearest Neighbors and Automatic Evaluation for Model-specific Quality Estimation

no code yet • 27 Apr 2024

Providing quality scores along with Machine Translation (MT) output, so-called reference-free Quality Estimation (QE), is crucial to inform users about the reliability of the translation.
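
As a loose illustration of the $k$-nearest-neighbors idea in the title (not the authors' implementation), a reference-free quality score could be estimated by embedding a source-translation pair and averaging the known quality labels of its nearest neighbors in a labelled datastore:

```python
# Illustrative sketch only (not the paper's method): kNN quality estimation.
# The embeddings and scores below are dummy data; in practice the vectors would
# come from a multilingual sentence encoder applied to source/translation pairs.
import numpy as np
from sklearn.neighbors import NearestNeighbors

datastore_vecs = np.random.rand(1000, 512)   # embeddings of already-scored pairs
datastore_scores = np.random.rand(1000)      # their quality labels

knn = NearestNeighbors(n_neighbors=8).fit(datastore_vecs)

def estimate_quality(query_vec: np.ndarray) -> float:
    """Average the quality labels of the k closest labelled examples."""
    _, idx = knn.kneighbors(query_vec.reshape(1, -1))
    return float(datastore_scores[idx[0]].mean())

print(estimate_quality(np.random.rand(512)))
```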

I Have an Attention Bridge to Sell You: Generalization Capabilities of Modular Translation Architectures

no code yet • 27 Apr 2024

Modularity is a paradigm of machine translation with the potential of bringing forth models that are large at training time and small during inference.

Scaffold-BPE: Enhancing Byte Pair Encoding with Simple and Effective Scaffold Token Removal

no code yet • 27 Apr 2024

Due to their infrequent appearance in the text corpus, Scaffold Tokens pose a learning imbalance issue for language models.
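
To see how such low-frequency vocabulary entries arise, one can train a small BPE tokenizer and count how often each learned token is actually used when re-encoding the corpus; the sketch below assumes the Hugging Face tokenizers library and a toy corpus:

```python
# Sketch: spotting rarely used BPE tokens (assumes the `tokenizers` library).
from collections import Counter
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

corpus = ["machine translation of low resource languages",
          "translation quality estimation",
          "neural machine translation"] * 100

tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()
tokenizer.train_from_iterator(
    corpus, trainers.BpeTrainer(vocab_size=120, special_tokens=["[UNK]"]))

counts = Counter(tok for line in corpus for tok in tokenizer.encode(line).tokens)
unused = [t for t in tokenizer.get_vocab() if counts[t] == 0]  # merged-away entries
print(f"{len(unused)} vocabulary entries never appear in the encoded corpus")
```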

TIGQA: An Expert Annotated Question Answering Dataset in Tigrinya

no code yet • 26 Apr 2024

The absence of explicitly tailored, accessible annotated datasets for educational purposes presents a notable obstacle for NLP tasks in languages with limited resources. This study initially explores the feasibility of using machine translation (MT) to convert an existing dataset into a Tigrinya dataset in SQuAD format.
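
A minimal sketch of such an MT-based conversion step is given below; the checkpoint choice and the handling of answer spans are assumptions, and real SQuAD files nest contexts, questions, and answer offsets:

```python
# Hypothetical sketch: translating the text fields of a SQuAD-format example
# with a Hugging Face translation pipeline; answer-span re-alignment is omitted.
from transformers import pipeline

def translate_example(example: dict, translator) -> dict:
    """`translator` is a transformers translation pipeline, e.g.
    pipeline("translation", model=<an English-to-Tigrinya checkpoint>)."""
    return {
        "context": translator(example["context"])[0]["translation_text"],
        "question": translator(example["question"])[0]["translation_text"],
        # answer start offsets shift after translation and must be re-located
    }
```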

Prefix Text as a Yarn: Eliciting Non-English Alignment in Foundation Language Model

no code yet • 25 Apr 2024

While supervised fine-tuning (SFT) has been a straightforward approach for tailoring the output of a foundation large language model (LLM) to specific preferences, concerns have been raised about the depth of this alignment, with some critiques suggesting it is merely "superficial".

Translation of Multifaceted Data without Re-Training of Machine Translation Systems

no code yet • 25 Apr 2024

In our MT pipeline, all the components of a data point are concatenated to form a single translation sequence and subsequently reconstructed into their components after translation.
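
A hypothetical sketch of that concatenate-translate-reconstruct idea (the separator token and the `translate` callable are assumptions, not the authors' code):

```python
# Hypothetical sketch of the concatenate / translate / reconstruct idea.
# `translate` stands in for any sentence-level MT system; the separator token
# is an assumption and must survive translation unchanged to allow splitting.
SEP = " ||| "

def translate_record(fields: dict, translate) -> dict:
    keys = list(fields)
    joined = SEP.join(fields[k] for k in keys)        # one translation sequence
    parts = [p.strip() for p in translate(joined).split(SEP.strip())]
    if len(parts) != len(keys):                       # separator was damaged
        return {k: translate(fields[k]) for k in keys}  # fall back to per-field
    return dict(zip(keys, parts))
```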

Neural Proto-Language Reconstruction

no code yet • 24 Apr 2024

Proto-form reconstruction has been a painstaking process for linguists.

Automated Multi-Language to English Machine Translation Using Generative Pre-Trained Transformers

no code yet • 23 Apr 2024

Accurate and efficient language translation is an extremely important information processing task.