About

Multimodal machine translation is the task of machine translation with multiple data sources: for example, translating the sentence "a bird is flying over water", together with an image of a bird over water, into German text.

(Image credit: Findings of the Third Shared Task on Multimodal Machine Translation)
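
To make the task's inputs and outputs concrete, here is a minimal Python sketch of one training example and of the simplest way to combine the two modalities: appending a single global image vector to every source-token vector. The `MMTExample` container, the `fuse` function, and the assumption that image features arrive as a precomputed vector are hypothetical and chosen for illustration. This is not the method of any paper listed on this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MMTExample:
    """One multimodal translation example (hypothetical container)."""
    source: str                  # source-language sentence, e.g. English
    image_features: List[float]  # assumed: precomputed global image vector
    target: str                  # reference translation, e.g. German


def fuse(token_vectors: List[List[float]], image_vector: List[float]) -> List[List[float]]:
    """Append the same global image vector to every source-token vector.

    Illustrative fusion only; builds new lists and leaves the inputs unchanged.
    """
    return [list(tok) + list(image_vector) for tok in token_vectors]


example = MMTExample(
    source="a bird is flying over water",
    image_features=[0.1, 0.7],  # toy 2-dimensional "image feature"
    target="ein Vogel fliegt über das Wasser",
)
fused = fuse([[1.0, 2.0], [3.0, 4.0]], example.image_features)
print(fused)  # [[1.0, 2.0, 0.1, 0.7], [3.0, 4.0, 0.1, 0.7]]
```

Real systems replace the toy token vectors with encoder states and usually fuse through attention rather than plain concatenation.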

Greatest papers with code

NMTPY: A Flexible Toolkit for Advanced Neural Machine Translation Systems

1 Jun 2017 lium-lst/nmtpy

nmtpy has been used for LIUM's top-ranked submissions to WMT Multimodal Machine Translation and News Translation tasks in 2016 and 2017.

MULTIMODAL MACHINE TRANSLATION

Does Multimodality Help Human and Machine for Translation and Image Captioning?

WS 2016 lium-lst/nmtpy

This paper presents the systems developed by LIUM and CVC for the WMT16 Multimodal Machine Translation challenge.

IMAGE CAPTIONING MULTIMODAL MACHINE TRANSLATION

Findings of the Third Shared Task on Multimodal Machine Translation

WS 2018 multi30k/dataset

In this task a source sentence in English is supplemented by an image and participating systems are required to generate a translation for such a sentence into German, French or Czech.

MULTIMODAL MACHINE TRANSLATION

Dynamic Context-guided Capsule Network for Multimodal Machine Translation

4 Sep 2020 DeepLearnXMU/MM-DCCN

In particular, we represent the input image with both global and regional visual features and introduce two parallel DCCNs to model multimodal context vectors from visual features at different granularities.

MULTIMODAL MACHINE TRANSLATION REPRESENTATION LEARNING
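
The DCCN summary distinguishes a single global image representation from a set of regional features. The sketch below illustrates only that two-granularity idea. It uses plain scaled dot-product attention over the regional vectors in place of DCCN's context-guided capsule routing, and the function names and the single `query` vector standing in for the decoder state are assumptions, not the authors' implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, regions: List[Vector]) -> Tuple[Vector, List[float]]:
    """Scaled dot-product attention of one query over regional feature vectors."""
    scale = math.sqrt(len(query))
    weights = softmax([sum(q * r for q, r in zip(query, reg)) / scale for reg in regions])
    dim = len(regions[0])
    context = [sum(w * reg[i] for w, reg in zip(weights, regions)) for i in range(dim)]
    return context, weights


def visual_contexts(query: Vector, global_feature: Vector,
                    regional_features: List[Vector]) -> Tuple[Vector, Vector]:
    """Return (global context, regional context) for one decoding step.

    The global feature is used as-is (coarse granularity); the regional
    context is a query-dependent mix of region vectors (fine granularity).
    """
    regional_context, _ = attend(query, regional_features)
    return global_feature, regional_context
```

Keeping the two contexts separate, rather than averaging them, mirrors the summary's point that each granularity gets its own model.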

Multi30K: Multilingual English-German Image Descriptions

WS 2016 lium-lst/wmt17-mmt

We introduce the Multi30K dataset to stimulate multilingual multimodal research.

MULTIMODAL MACHINE TRANSLATION
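
For the translation setting, each Multi30K image comes with an English description and a German translation of it, so a corpus of this kind can be handled as line-aligned text files plus a list of image names. The loader below is a minimal sketch under that assumption. The `Triple` type, the `load_triples` function and the file names in the commented usage are hypothetical, not the dataset's official tooling or layout.

```python
from typing import Iterable, List, NamedTuple


class Triple(NamedTuple):
    image: str   # image file name
    source: str  # English description
    target: str  # its translation


def load_triples(image_lines: Iterable[str],
                 source_lines: Iterable[str],
                 target_lines: Iterable[str]) -> List[Triple]:
    """Zip three line-aligned inputs into (image, source, target) triples."""
    images = [line.strip() for line in image_lines]
    sources = [line.strip() for line in source_lines]
    targets = [line.strip() for line in target_lines]
    if not (len(images) == len(sources) == len(targets)):
        raise ValueError("inputs are not line-aligned: "
                         f"{len(images)} images, {len(sources)} sources, {len(targets)} targets")
    return [Triple(i, s, t) for i, s, t in zip(images, sources, targets)]


# Hypothetical usage; the file names are placeholders:
# with open("train_images.txt", encoding="utf-8") as imgs, \
#      open("train.en", encoding="utf-8") as src, \
#      open("train.de", encoding="utf-8") as tgt:
#     triples = load_triples(imgs, src, tgt)
```

Checking the lengths up front catches the most common corpus bug, a dropped or extra line that silently shifts every later pair.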

Latent Variable Model for Multi-modal Translation

ACL 2019 iacercalixto/variational_mmt

In this work, we propose to model the interaction between visual and textual features for multi-modal neural machine translation (MMT) through a latent variable model.

MULTIMODAL MACHINE TRANSLATION MULTI-TASK LEARNING
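
Variational latent variable models of this kind are typically trained by maximizing an evidence lower bound, which relies on two generic ingredients: sampling a latent vector with the reparameterization trick, and a KL divergence between two diagonal Gaussians. The sketch below shows only those two pieces in plain Python. How the means and log-variances are computed from text and image features, and which distributions act as prior and posterior, are left as assumptions rather than taken from the paper.

```python
import math
import random
from typing import List

Vector = List[float]


def sample_latent(mu: Vector, log_var: Vector, rng: random.Random) -> Vector:
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def gaussian_kl(mu_q: Vector, log_var_q: Vector, mu_p: Vector, log_var_p: Vector) -> float:
    """KL(q || p) between diagonal Gaussians, summed over dimensions.

    Per dimension: 0.5 * (log var_p - log var_q + (var_q + (mu_q - mu_p)^2) / var_p - 1).
    """
    kl = 0.0
    for mq, lq, mp, lp in zip(mu_q, log_var_q, mu_p, log_var_p):
        kl += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return kl
```

Sampling through `mu + sigma * eps` keeps the randomness in `eps`, so gradients can flow to `mu` and `log_var` in an autodiff framework.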

A Visual Attention Grounding Neural Model for Multimodal Machine Translation

EMNLP 2018 Eurus-Holmes/VAG-NMT

The model leverages a visual attention grounding mechanism that links the visual semantics with the corresponding textual semantics.

MULTIMODAL MACHINE TRANSLATION
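
One common way to link visual semantics with textual semantics is to learn a shared embedding space in which an image lies closer to its own description than to other sentences. The sketch below shows a generic bidirectional hinge (ranking) loss of that kind. It is an assumption about how such grounding can be trained, not a reproduction of VAG-NMT, and the margin value is arbitrary.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def ranking_loss(images: List[Vector], sentences: List[Vector], margin: float = 0.1) -> float:
    """Bidirectional hinge loss: images[i] and sentences[i] are the matching pair,
    and every other pairing in the batch serves as a negative example."""
    loss = 0.0
    for i in range(len(images)):
        positive = cosine(images[i], sentences[i])
        for j in range(len(images)):
            if j == i:
                continue
            # wrong sentence for image i
            loss += max(0.0, margin - positive + cosine(images[i], sentences[j]))
            # wrong image for sentence i
            loss += max(0.0, margin - positive + cosine(images[j], sentences[i]))
    return loss
```

The loss is zero once every matching pair beats every mismatched pair by at least `margin`, so well-aligned batches contribute no gradient.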

Distilling Translations with Visual Awareness

ACL 2019 ImperialNLP/MMT-Delib

Previous work on multimodal machine translation has shown that visual information is only needed in very specific cases, for example in the presence of ambiguous words where the textual context is not sufficient.

Ranked #1 on Multimodal Machine Translation on Multi30K (BLEU (EN-FR) metric)

MULTIMODAL MACHINE TRANSLATION
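
The ranking above is stated in BLEU. Here is a compact sketch of corpus-level BLEU: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. It assumes pre-tokenized input, a single reference per sentence and no smoothing. Leaderboard numbers come from standard evaluation scripts with their own tokenization and settings, so scores from this sketch are not directly comparable.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[List[str]], references: List[List[str]], max_n: int = 4) -> float:
    """Corpus BLEU (0-100) with one reference per sentence and no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp, n)
            ref_ngrams = ngram_counts(ref, n)
            # Modified precision: clip each hypothesis n-gram count by its reference count.
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += sum(hyp_ngrams.values())
    if min(matches) == 0:
        return 0.0  # the geometric mean is zero if any n-gram order has no match
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references overall.
    bp = 1.0 if hyp_len > ref_len else math.exp(1.0 - ref_len / hyp_len)
    return 100.0 * bp * math.exp(log_precision)
```

Statistics are pooled over the whole corpus before taking the geometric mean, which is why corpus BLEU is not the average of sentence-level BLEU scores.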

UMONS Submission for WMT18 Multimodal Translation Task

15 Oct 2018 jbdel/WMT18_MNMT

This paper describes the UMONS solution for the Multimodal Machine Translation Task presented at the third conference on machine translation (WMT18).

IMAGE CAPTIONING MULTIMODAL MACHINE TRANSLATION

Multimodal Machine Translation with Embedding Prediction

NAACL 2019 toshohirasawa/nmtpytorch-emb-pred

Multimodal machine translation is an attractive application of neural machine translation (NMT).

MULTIMODAL MACHINE TRANSLATION WORD EMBEDDINGS
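
In embedding-prediction approaches, the decoder outputs a continuous vector at each step instead of a softmax distribution over the vocabulary, and that vector is then mapped back to a word. The sketch below illustrates the idea with a fixed table of target word embeddings: decoding picks the cosine nearest neighbour, and training uses a cosine distance to the gold embedding. The cosine-distance loss and the toy two-word vocabulary are assumptions for illustration, not necessarily what the NAACL 2019 paper uses.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def cosine_loss(predicted: Vector, gold: Vector) -> float:
    """Illustrative training signal: 1 - cos(predicted, gold embedding)."""
    return 1.0 - cosine(predicted, gold)


def nearest_word(predicted: Vector, embeddings: Dict[str, Vector]) -> str:
    """Decode by picking the vocabulary word whose embedding is most similar."""
    return max(embeddings, key=lambda word: cosine(predicted, embeddings[word]))


toy_embeddings = {"Vogel": [1.0, 0.0], "Wasser": [0.0, 1.0]}  # hypothetical vocabulary
print(nearest_word([0.9, 0.1], toy_embeddings))  # Vogel
```

Because the output layer no longer scales with vocabulary size, this design trades the softmax's exact probabilities for a cheaper nearest-neighbour lookup.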