Neural Machine Translation by Jointly Learning to Align and Translate

1 Sep 2014  ·  Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio

Neural machine translation is a recently proposed approach to machine translation. Unlike traditional statistical machine translation, neural machine translation aims at building a single neural network that can be jointly tuned to maximize translation performance. The models recently proposed for neural machine translation often belong to a family of encoder-decoders and consist of an encoder that encodes a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and propose to extend it by allowing the model to automatically (soft-)search for parts of the source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly. With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.
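Below is a minimal NumPy sketch of the (soft-)search step the abstract describes, using the additive alignment score from the paper, e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j). The function name `soft_search` and the dimensions in the usage example are illustrative, not the authors' code.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def soft_search(s_prev, H, W_a, U_a, v_a):
    """One decoding step of the soft (alignment) search.

    s_prev : (n,)    previous decoder hidden state s_{i-1}
    H      : (T, m)  encoder annotations h_1..h_T (m = 2n for a bidirectional encoder)
    W_a    : (p, n)  learned projection of the decoder state (illustrative shapes)
    U_a    : (p, m)  learned projection of the annotations
    v_a    : (p,)    learned scoring vector
    """
    # Alignment scores: e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j)
    e = np.tanh(W_a @ s_prev + H @ U_a.T) @ v_a   # shape (T,)
    # Soft alignment over source positions; no hard segmentation is needed.
    alpha = softmax(e)                            # shape (T,)
    # Context vector: expected annotation under the alignment weights.
    c = alpha @ H                                 # shape (m,)
    return c, alpha

# Illustrative usage with arbitrary dimensions.
rng = np.random.default_rng(0)
T, n, m, p = 7, 4, 8, 5
c, alpha = soft_search(rng.standard_normal(n), rng.standard_normal((T, m)),
                       rng.standard_normal((p, n)), rng.standard_normal((p, m)),
                       rng.standard_normal(p))
assert alpha.shape == (T,) and abs(alpha.sum() - 1.0) < 1e-9
```

Because the weights alpha_i sum to one, the decoder conditions on a different convex combination of encoder annotations at every step, which is what removes the fixed-length-vector bottleneck of the basic encoder-decoder.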


Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank | Uses Extra Training Data |
|------|---------|-------|-------------|--------------|-------------|--------------------------|
| Bangla Spelling Error Correction | DPCSpell-Bangla-SEC-Corpus | GRUSeq2Seq | Exact Match Accuracy | 75.56% | #4 | |
| Machine Translation | IWSLT2015 German-English | Bi-GRU (MLE+SLE) | BLEU score | 28.53 | #11 | |
| Dialogue Generation | Persona-Chat | Seq2Seq + Attention | Avg F1 | 16.18 | #4 | Yes |
| Machine Translation | WMT2014 English-French | RNN-search50* | BLEU score | 36.2 | #43 | |

Methods