Multimodal Machine Translation

28 papers with code • 3 benchmarks • 4 datasets

Multimodal machine translation is the task of translating text with the help of one or more additional data sources, typically an image. For example, a system might translate the English sentence "a bird is flying over water", paired with an image of a bird over water, into German.

(Image credit: Findings of the Third Shared Task on Multimodal Machine Translation)
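
As a rough illustration of the task definition above, the sketch below shows one way a single multimodal example could be represented and packaged for a model. The names `MultimodalExample`, `build_model_input`, and `image_features` are hypothetical and do not come from any of the listed papers or libraries, and the feature values are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MultimodalExample:
    """One item: a source sentence plus features of its accompanying image."""
    source_text: str                    # sentence in the source language
    image_features: List[float]         # hypothetical precomputed image vector
    target_text: Optional[str] = None   # reference translation, if available


def build_model_input(example: MultimodalExample) -> dict:
    """Bundle both modalities so a translation model can consume them together."""
    return {
        "tokens": example.source_text.lower().split(),  # naive whitespace tokenization
        "visual": example.image_features,
    }


example = MultimodalExample(
    source_text="a bird is flying over water",
    image_features=[0.12, 0.80, 0.33],  # placeholder values, not real image features
)
model_input = build_model_input(example)
print(model_input["tokens"])
```

A real system would replace the whitespace split with a proper tokenizer and the placeholder vector with features from an image encoder; the point here is only that the text and the image travel together as one input.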

Most implemented papers

Attention Is All You Need

tensorflow/tensor2tensor NeurIPS 2017

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration.

Multi30K: Multilingual English-German Image Descriptions

lium-lst/wmt17-mmt WS 2016

We introduce the Multi30K dataset to stimulate multilingual multimodal research.

Does Multimodality Help Human and Machine for Translation and Image Captioning?

lium-lst/nmtpy WS 2016

This paper presents the systems developed by LIUM and CVC for the WMT16 Multimodal Machine Translation challenge.

NMTPY: A Flexible Toolkit for Advanced Neural Machine Translation Systems

lium-lst/nmtpy 1 Jun 2017

nmtpy has been used for LIUM's top-ranked submissions to WMT Multimodal Machine Translation and News Translation tasks in 2016 and 2017.

A Visual Attention Grounding Neural Model for Multimodal Machine Translation

Eurus-Holmes/VAG-NMT EMNLP 2018

The model leverages a visual attention grounding mechanism that links the visual semantics with the corresponding textual semantics.
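
To make the idea of linking visual and textual semantics more concrete, here is a minimal sketch of generic dot-product attention, in which a text vector scores a set of image-region vectors. This is not the VAG-NMT architecture from the paper; the function names, the two-dimensional vectors, and the choice of dot-product scoring are all assumptions made for illustration.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: List[float],
           regions: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Score each image region against a text query and return
    (attention weights, weighted average of the region vectors)."""
    scores = [sum(q * r for q, r in zip(query, region)) for region in regions]
    weights = softmax(scores)
    dim = len(regions[0])
    context = [
        sum(w * region[i] for w, region in zip(weights, regions))
        for i in range(dim)
    ]
    return weights, context


# Hypothetical toy vectors: one text query and three image regions.
text_query = [1.0, 0.0]
image_regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = attend(text_query, image_regions)
print(weights, context)
```

The region whose vector aligns best with the text query receives the largest weight, so the resulting context vector is dominated by the image content most relevant to the sentence.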

Findings of the Third Shared Task on Multimodal Machine Translation

multi30k/dataset WS 2018

In this task a source sentence in English is supplemented by an image and participating systems are required to generate a translation for such a sentence into German, French or Czech.

UMONS Submission for WMT18 Multimodal Translation Task

jbdel/WMT18_MNMT 15 Oct 2018

This paper describes the UMONS solution for the Multimodal Machine Translation Task presented at the Third Conference on Machine Translation (WMT18).

Latent Variable Model for Multi-modal Translation

iacercalixto/variational_mmt ACL 2019

In this work, we propose to model the interaction between visual and textual features for multi-modal neural machine translation (MMT) through a latent variable model.

Multimodal Machine Translation with Embedding Prediction

toshohirasawa/nmtpytorch-emb-pred NAACL 2019

Multimodal machine translation is an attractive application of neural machine translation (NMT).