Multimodal Machine Translation
28 papers with code • 3 benchmarks • 4 datasets
Multimodal machine translation is the task of translating text with the help of additional data sources (modalities). For example, a system might translate the English sentence "a bird is flying over water", together with an image of a bird over water, into German text.
(Image credit: Findings of the Third Shared Task on Multimodal Machine Translation)
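To make the input/output shape of the task concrete, here is a minimal sketch assuming that the image has already been reduced to a single pooled feature vector (for example by a pretrained CNN). The `MMTExample` structure and the `fuse_by_concatenation` helper are hypothetical names for illustration. Concatenating one global image vector onto every source token embedding is just one simple fusion strategy, not the method of any particular paper listed here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MMTExample:
    """One multimodal translation example (hypothetical structure)."""
    source: str                  # source sentence, e.g. English
    image_features: List[float]  # pooled visual features, assumed precomputed
    target: str                  # reference translation, e.g. German


def fuse_by_concatenation(token_embeddings: List[List[float]],
                          image_features: List[float]) -> List[List[float]]:
    """Append the same global image vector to every source token embedding."""
    return [list(emb) + list(image_features) for emb in token_embeddings]


example = MMTExample(
    source="a bird is flying over water",
    image_features=[0.2, 0.7],  # toy 2-d stand-in for a real CNN feature vector
    target="ein Vogel fliegt über das Wasser",  # illustrative target only
)

# Toy 3-d embeddings for the first three source tokens (made-up values).
token_embeddings = [[0.1, 0.0, 0.3], [0.5, 0.2, 0.1], [0.0, 0.4, 0.4]]
fused = fuse_by_concatenation(token_embeddings, example.image_features)
print(len(fused), len(fused[0]))  # 3 tokens, each now 3 + 2 = 5 dimensions
```

An encoder would then consume `fused` in place of the text-only embeddings, so every source position can see the visual context.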
Most implemented papers
Attention Is All You Need
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration.
Multi30K: Multilingual English-German Image Descriptions
We introduce the Multi30K dataset to stimulate multilingual multimodal research.
Does Multimodality Help Human and Machine for Translation and Image Captioning?
This paper presents the systems developed by LIUM and CVC for the WMT16 Multimodal Machine Translation challenge.
NMTPY: A Flexible Toolkit for Advanced Neural Machine Translation Systems
nmtpy has been used for LIUM's top-ranked submissions to WMT Multimodal Machine Translation and News Translation tasks in 2016 and 2017.
A Visual Attention Grounding Neural Model for Multimodal Machine Translation
The model leverages a visual attention grounding mechanism that links the visual semantics with the corresponding textual semantics.
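As a rough illustration of how a decoder can "look at" an image, the sketch below implements generic scaled dot-product soft attention of a text query vector over a set of image region features. This is an assumption-labelled simplification: it is not the specific grounding mechanism of the paper above, and the function names `attend_to_regions` and `softmax` are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_to_regions(query: List[float],
                      regions: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Soft attention of a text query over image region vectors.

    Returns (weights, context): one weight per region, and the
    weighted average of the region vectors.
    """
    d = len(query)
    # Scaled dot-product score between the query and each region.
    scores = [sum(q * r for q, r in zip(query, region)) / math.sqrt(d)
              for region in regions]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of region features.
    context = [sum(w * region[i] for w, region in zip(weights, regions))
               for i in range(d)]
    return weights, context


# Toy example: the query points in the same direction as the first region.
query = [1.0, 0.0]
regions = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, context = attend_to_regions(query, regions)
print([round(w, 3) for w in weights], [round(c, 3) for c in context])
```

The region most aligned with the current text state receives the largest weight, which is the intuition behind linking textual and visual semantics.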
Findings of the Third Shared Task on Multimodal Machine Translation
In this task a source sentence in English is supplemented by an image and participating systems are required to generate a translation for such a sentence into German, French or Czech.
UMONS Submission for WMT18 Multimodal Translation Task
This paper describes the UMONS solution for the Multimodal Machine Translation Task presented at the Third Conference on Machine Translation (WMT18).
Latent Variable Model for Multi-modal Translation
In this work, we propose to model the interaction between visual and textual features for multi-modal neural machine translation (MMT) through a latent variable model.
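Latent variable models of this kind are commonly trained with two generic ingredients: sampling a latent vector via the reparameterization trick, and a KL-divergence term that regularizes the approximate posterior. The sketch below shows only those two standard pieces for a diagonal Gaussian against a standard normal prior. It is an assumption for illustration, not the paper's actual model, which may use different priors, conditioning on text and image, or additional losses; `sample_latent` and `kl_to_standard_normal` are hypothetical names.

```python
import math
import random
from typing import List


def sample_latent(mu: List[float], log_var: List[float],
                  rng: random.Random) -> List[float]:
    """Reparameterized sample: z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: List[float], log_var: List[float]) -> float:
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for a diagonal Gaussian."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


# Toy posterior parameters, e.g. as predicted from fused text+image features.
rng = random.Random(0)
mu, log_var = [1.0, 0.0], [0.0, 0.0]
z = sample_latent(mu, log_var, rng)
print(len(z), kl_to_standard_normal(mu, log_var))
```

Because the noise `eps` is sampled separately, gradients can flow through `mu` and `log_var`, which is what makes the latent variable trainable end to end alongside the translation loss.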
Multimodal Machine Translation with Embedding Prediction
Multimodal machine translation is an attractive application of neural machine translation (NMT).