Multimodal machine translation is the task of performing machine translation with multiple data sources, for example translating the sentence "a bird is flying over water" together with an image of a bird over water into German text.
(Image credit: Findings of the Third Shared Task on Multimodal Machine Translation)
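To make the task definition above concrete, here is a minimal sketch of a multimodal translation model in PyTorch: a source sentence and a pooled image feature vector are fused and decoded into the target language. All layer sizes, the additive fusion, and the class name TinyMultimodalNMT are illustrative assumptions, not the architecture of any specific submitted system.

```python
# Hedged sketch: text encoder + projected image feature -> target decoder.
import torch
import torch.nn as nn

class TinyMultimodalNMT(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, d_model=256, d_img=2048):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d_model)
        self.encoder = nn.GRU(d_model, d_model, batch_first=True)
        self.img_proj = nn.Linear(d_img, d_model)   # project a pooled CNN image feature
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model)
        self.decoder = nn.GRU(d_model, d_model, batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab)

    def forward(self, src_ids, img_feat, tgt_ids):
        # Encode the English sentence.
        _, h_text = self.encoder(self.src_emb(src_ids))           # (1, B, d)
        # Fuse the textual summary with the projected image feature.
        h0 = h_text + self.img_proj(img_feat).unsqueeze(0)        # (1, B, d)
        # Decode the target sentence conditioned on the fused state.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), h0)
        return self.out(dec_out)                                  # (B, T, V)

# Example: a batch of 2 sentences with 2048-d pooled image vectors.
model = TinyMultimodalNMT(src_vocab=1000, tgt_vocab=1200)
logits = model(torch.randint(0, 1000, (2, 7)),
               torch.randn(2, 2048),
               torch.randint(0, 1200, (2, 9)))
print(logits.shape)  # torch.Size([2, 9, 1200])
```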
nmtpy has been used for LIUM's top-ranked submissions to the WMT Multimodal Machine Translation and News Translation tasks in 2016 and 2017.
This paper presents the systems developed by LIUM and CVC for the WMT16 Multimodal Machine Translation challenge.
In this task, a source sentence in English is supplemented by an image, and participating systems are required to translate the sentence into German, French or Czech.
In particular, we represent the input image with global and regional visual features and introduce two parallel DCCNs to model multimodal context vectors with visual features at different granularities.
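The following hedged sketch illustrates the idea of combining global and regional image features through two parallel context modules. Plain dot-product attention and a gating layer stand in for the paper's DCCNs; the module names RegionalContext and GlobalContext and all dimensions are assumptions for illustration only.

```python
# Hedged sketch: two parallel context modules over regional and global image features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionalContext(nn.Module):
    """Attend from a query state over region-level features (e.g. 36 x 2048)."""
    def __init__(self, d_model=256, d_img=2048):
        super().__init__()
        self.key = nn.Linear(d_img, d_model)
        self.val = nn.Linear(d_img, d_model)

    def forward(self, query, regions):             # query: (B, d), regions: (B, R, d_img)
        scores = torch.einsum('bd,brd->br', query, self.key(regions))
        attn = F.softmax(scores / query.size(-1) ** 0.5, dim=-1)
        return torch.einsum('br,brd->bd', attn, self.val(regions))

class GlobalContext(nn.Module):
    """Gate a single pooled image vector by the query state."""
    def __init__(self, d_model=256, d_img=2048):
        super().__init__()
        self.proj = nn.Linear(d_img, d_model)
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, query, global_feat):          # global_feat: (B, d_img)
        g = self.proj(global_feat)
        return torch.sigmoid(self.gate(torch.cat([query, g], -1))) * g

# The two context vectors, computed in parallel, are summed into one
# multimodal context that a decoder could consume at each step.
B, d = 2, 256
query = torch.randn(B, d)
ctx = RegionalContext()(query, torch.randn(B, 36, 2048)) + \
      GlobalContext()(query, torch.randn(B, 2048))
print(ctx.shape)  # torch.Size([2, 256])
```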
We introduce the Multi30K dataset to stimulate multilingual multimodal research.
In this work, we propose to model the interaction between visual and textual features for multi-modal neural machine translation (MMT) through a latent variable model.
Ranked #4 on Multimodal Machine Translation on Multi30K
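To make the latent-variable idea described above more concrete, here is a hedged sketch in which a Gaussian latent code z is inferred from fused text and image representations and can then condition the decoder. The Gaussian choice, layer sizes, and the concatenation-based inference network are illustrative assumptions rather than the published model.

```python
# Hedged sketch: a latent variable mediating the visual-textual interaction.
import torch
import torch.nn as nn

class LatentFusion(nn.Module):
    def __init__(self, d_text=256, d_img=2048, d_z=64):
        super().__init__()
        self.infer = nn.Sequential(nn.Linear(d_text + d_img, 256), nn.Tanh())
        self.mu = nn.Linear(256, d_z)
        self.logvar = nn.Linear(256, d_z)

    def forward(self, h_text, v_img):
        h = self.infer(torch.cat([h_text, v_img], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterisation trick: sample z while keeping gradients.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        # KL term against a standard normal prior, added to the translation loss.
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
        return z, kl.mean()

z, kl = LatentFusion()(torch.randn(2, 256), torch.randn(2, 2048))
print(z.shape, kl.item())  # torch.Size([2, 64]) and a scalar KL value
```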
The model leverages a visual attention grounding mechanism that links the visual semantics with the corresponding textual semantics.
Ranked #7 on Multimodal Machine Translation on Multi30K
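A hedged sketch of the grounding idea described above: sentence and image representations are projected into a shared space and pulled together with a max-margin ranking loss, linking textual semantics to visual semantics. The margin value, projection sizes, and the class name GroundingLoss are assumptions for illustration.

```python
# Hedged sketch: a visual-textual grounding objective in a shared embedding space.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroundingLoss(nn.Module):
    def __init__(self, d_text=256, d_img=2048, d_shared=128, margin=0.1):
        super().__init__()
        self.txt = nn.Linear(d_text, d_shared)
        self.img = nn.Linear(d_img, d_shared)
        self.margin = margin

    def forward(self, sent_repr, img_feat):
        t = F.normalize(self.txt(sent_repr), dim=-1)   # (B, d_shared)
        v = F.normalize(self.img(img_feat), dim=-1)    # (B, d_shared)
        sim = t @ v.t()                                # cosine similarities
        pos = sim.diag().unsqueeze(1)                  # matching sentence-image pairs
        # Push matched pairs above mismatched ones by a margin; ignore the diagonal.
        cost = F.relu(self.margin + sim - pos)
        cost = cost * (1 - torch.eye(sim.size(0), device=sim.device))
        return cost.mean()

loss = GroundingLoss()(torch.randn(4, 256), torch.randn(4, 2048))
print(loss.item())
```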
Previous work on multimodal machine translation has shown that visual information is only needed in very specific cases, for example in the presence of ambiguous words where the textual context is not sufficient.
Ranked #1 on Multimodal Machine Translation on Multi30K (BLEU EN-FR metric)
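One simple way to picture the observation above, that visual information is only needed in specific cases, is a learned gate that decides per example how much visual context to mix in, so the image can be ignored when the text alone is unambiguous. The gate formulation and sizes below are illustrative assumptions, not a specific published model.

```python
# Hedged sketch: a scalar gate modulating the visual contribution per example.
import torch
import torch.nn as nn

class GatedVisualContext(nn.Module):
    def __init__(self, d_model=256, d_img=2048):
        super().__init__()
        self.proj = nn.Linear(d_img, d_model)
        self.gate = nn.Linear(2 * d_model, 1)

    def forward(self, h_text, img_feat):
        v = self.proj(img_feat)                                   # (B, d)
        g = torch.sigmoid(self.gate(torch.cat([h_text, v], -1)))  # (B, 1), in [0, 1]
        return h_text + g * v                                     # g near 0 => text-only

fused = GatedVisualContext()(torch.randn(2, 256), torch.randn(2, 2048))
print(fused.shape)  # torch.Size([2, 256])
```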
This paper describes the UMONS solution for the Multimodal Machine Translation task presented at the Third Conference on Machine Translation (WMT18).
Multimodal machine translation is an attractive application of neural machine translation (NMT).