Multimodal Machine Translation

34 papers with code • 3 benchmarks • 5 datasets

Multimodal machine translation is the task of performing machine translation with multiple data sources - for example, translating "a bird is flying over water" together with an image of a bird over water into German text.

(Image credit: Findings of the Third Shared Task on Multimodal Machine Translation)
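
As a minimal sketch of what this setup looks like in practice (all module names and sizes below are illustrative, not from any particular system): a toy PyTorch model that conditions a sequence decoder on both the source tokens and a precomputed image feature vector.

```python
import torch
import torch.nn as nn

class TinyMMT(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, d=256, img_dim=2048):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d)
        self.tgt_emb = nn.Embedding(tgt_vocab, d)
        self.img_proj = nn.Linear(img_dim, d)            # map CNN image features into model space
        self.encoder = nn.GRU(d, d, batch_first=True)
        self.decoder = nn.GRU(d, d, batch_first=True)
        self.out = nn.Linear(d, tgt_vocab)

    def forward(self, src_ids, img_feat, tgt_ids):
        txt = self.src_emb(src_ids)                      # (B, S, d)
        img = self.img_proj(img_feat).unsqueeze(1)       # image as one pseudo-token: (B, 1, d)
        _, h = self.encoder(torch.cat([img, txt], dim=1))
        dec, _ = self.decoder(self.tgt_emb(tgt_ids), h)  # teacher forcing
        return self.out(dec)                             # logits over the target (e.g. German) vocab

model = TinyMMT(src_vocab=1000, tgt_vocab=1200)
logits = model(torch.randint(0, 1000, (2, 7)),           # source token ids
               torch.randn(2, 2048),                     # precomputed image features
               torch.randint(0, 1200, (2, 6)))           # target token ids
print(logits.shape)  # torch.Size([2, 6, 1200])
```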

Most implemented papers

Latent Variable Model for Multi-modal Translation

iacercalixto/variational_mmt ACL 2019

In this work, we propose to model the interaction between visual and textual features for multi-modal neural machine translation (MMT) through a latent variable model.
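
A hedged sketch of the general latent-variable recipe (a VAE-style formulation with placeholder dimensions; the paper's exact parameterization differs): a prior network sees only the text, an approximate posterior also sees the image, and a KL term ties the two together.

```python
import torch
import torch.nn as nn

class LatentFusion(nn.Module):
    def __init__(self, d_txt=256, d_img=2048, d_z=64):
        super().__init__()
        self.prior = nn.Linear(d_txt, 2 * d_z)               # mu, logvar from text alone
        self.posterior = nn.Linear(d_txt + d_img, 2 * d_z)   # mu, logvar from text + image

    def forward(self, txt_vec, img_vec):
        p_mu, p_lv = self.prior(txt_vec).chunk(2, dim=-1)
        q_mu, q_lv = self.posterior(torch.cat([txt_vec, img_vec], dim=-1)).chunk(2, dim=-1)
        z = q_mu + torch.randn_like(q_mu) * (0.5 * q_lv).exp()   # reparameterization trick
        # KL(q || p) between diagonal Gaussians, summed over latent dims
        kl = 0.5 * ((q_lv - p_lv).exp() + (q_mu - p_mu).pow(2) / p_lv.exp()
                    - 1.0 + p_lv - q_lv).sum(dim=-1)
        return z, kl.mean()   # z conditions the decoder; kl is added to the NLL loss
```

Because the prior sees only the text, translation at test time can proceed without an image.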

Multimodal Machine Translation with Embedding Prediction

toshohirasawa/nmtpytorch-emb-pred NAACL 2019

Multimodal machine translation is an attractive application of neural machine translation (NMT).

Distilling Translations with Visual Awareness

ImperialNLP/MMT-Delib ACL 2019

Previous work on multimodal machine translation has shown that visual information is only needed in very specific cases, for example in the presence of ambiguous words where the textual context is not sufficient.
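
The paper's own mechanism is a translate-then-refine (deliberation) model; purely as an illustration of the observation above, here is a simpler, hypothetical gating module that lets a model suppress the visual signal when the text alone is sufficient.

```python
import torch
import torch.nn as nn

class GatedVisualFusion(nn.Module):
    def __init__(self, d=256, img_dim=2048):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, d)
        self.gate = nn.Linear(2 * d, 1)

    def forward(self, txt_state, img_feat):   # txt_state: (B, d) pooled text; img_feat: (B, img_dim)
        img = self.img_proj(img_feat)
        g = torch.sigmoid(self.gate(torch.cat([txt_state, img], dim=-1)))  # scalar gate in (0, 1)
        return txt_state + g * img             # g near 0: the image is effectively ignored
```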

M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training

microsoft/M3P CVPR 2021

We present M3P, a Multitask Multilingual Multimodal Pre-trained model that combines multilingual pre-training and multimodal pre-training into a unified framework via multitask pre-training.
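
A hedged sketch of the multitask idea (the method names on `model` are placeholders, not M3P's API): alternate multilingual text-only batches with image-text batches so one shared model is trained under both objectives.

```python
import random

def pretrain_step(model, text_batches, image_text_batches, optimizer):
    # Alternate objectives so one shared encoder sees both kinds of supervision.
    if random.random() < 0.5:
        loss = model.masked_lm_loss(next(text_batches))          # multilingual text-only MLM
    else:
        loss = model.image_text_loss(next(image_text_batches))   # masked modeling with image regions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```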

Self-Knowledge Distillation with Progressive Refinement of Targets

lgcnsai/ps-kd-pytorch ICCV 2021

Hence, it can be interpreted within a knowledge-distillation framework in which the student progressively becomes its own teacher.
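
The core update can be sketched directly (hyper-parameter names are mine, not the repo's): the hard labels are mixed with the model's own predictions from the previous epoch, with the mixing weight growing over training.

```python
import torch
import torch.nn.functional as F

def ps_kd_loss(logits, targets, past_probs, epoch, total_epochs, alpha_max=0.8):
    alpha = alpha_max * (epoch / total_epochs)          # progressively trust the "student as teacher"
    hard = F.one_hot(targets, logits.size(-1)).float()
    soft = (1.0 - alpha) * hard + alpha * past_probs    # progressively refined targets
    return -(soft * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

# past_probs are the model's softmax outputs saved for the same examples in the
# previous epoch; in the first epoch (alpha = 0) this is plain cross-entropy.
```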

Multimodal Transformer for Multimodal Machine Translation

QAQ-v/MMT ACL 2020

Multimodal Machine Translation (MMT) aims to introduce information from another modality, typically static images, to improve translation quality.
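
A minimal sketch of the standard way such information enters a Transformer (the layer layout here is hypothetical): text states attend to detected image-region features through multi-head cross-attention.

```python
import torch
import torch.nn as nn

class VisualCrossAttention(nn.Module):
    def __init__(self, d=256, img_dim=2048, heads=8):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, d)
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.norm = nn.LayerNorm(d)

    def forward(self, txt, regions):          # txt: (B, S, d); regions: (B, R, img_dim)
        img = self.img_proj(regions)
        ctx, _ = self.attn(query=txt, key=img, value=img)
        return self.norm(txt + ctx)           # residual fusion of visual context

layer = VisualCrossAttention()
out = layer(torch.randn(2, 7, 256), torch.randn(2, 36, 2048))
print(out.shape)  # torch.Size([2, 7, 256])
```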

Dynamic Context-guided Capsule Network for Multimodal Machine Translation

DeepLearnXMU/MM-DCCN 4 Sep 2020

In particular, we represent the input image with global and regional visual features, and we introduce two parallel DCCNs to model multimodal context vectors with visual features at different granularities.
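
A hedged sketch of the two-granularity layout only, with the capsule layers and context-guided dynamic routing omitted: one branch uses the global image vector, the other attends over regional features, and both context vectors are merged.

```python
import torch
import torch.nn as nn

class TwoGranularityContext(nn.Module):
    def __init__(self, d=256, img_dim=2048):
        super().__init__()
        self.glob = nn.Linear(img_dim, d)
        self.reg_proj = nn.Linear(img_dim, d)
        self.attn = nn.MultiheadAttention(d, 4, batch_first=True)

    def forward(self, txt, global_feat, regional_feats):
        g = self.glob(global_feat).unsqueeze(1).expand_as(txt)   # global branch: (B, S, d)
        r_kv = self.reg_proj(regional_feats)
        r, _ = self.attn(txt, r_kv, r_kv)                        # regional branch: (B, S, d)
        return txt + g + r                                        # merged multimodal context
```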

Cross-lingual Visual Pre-training for Multimodal Machine Translation

imperialnlp/vtlm EACL 2021

Pre-trained language models have been shown to improve performance in many natural language tasks substantially.
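
A hedged sketch of the joint input construction behind this kind of pre-training (special token ids and masking rate are illustrative): the source and target sentences form one stream, random tokens are masked, and the model predicts them from the bilingual plus visual context.

```python
import torch

def mask_tokens(token_ids, mask_id, p=0.15):
    ids = token_ids.clone()
    mask = torch.rand_like(ids, dtype=torch.float) < p
    labels = torch.where(mask, ids, torch.full_like(ids, -100))  # -100: ignored by CE loss
    ids[mask] = mask_id
    return ids, labels

src = torch.randint(5, 1000, (2, 8))
tgt = torch.randint(5, 1000, (2, 9))
pair = torch.cat([src, tgt], dim=1)          # translation pair in one stream
masked, labels = mask_tokens(pair, mask_id=4)
# `masked` plus projected image-region features would feed the Transformer;
# cross-entropy against `labels` trains the cross-lingual visual LM.
```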

ViTA: Visual-Linguistic Translation by Aligning Object Tags

kshitij98/vita Workshop on Asian Translation 2021

Multimodal Machine Translation (MMT) enriches the source text with visual information for translation.
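
The tag-alignment idea reduces to a simple preprocessing step, sketched here with a placeholder separator and detector: object tags detected in the image are appended to the source sentence, so even a text-only MT model sees the visual grounding.

```python
def augment_with_tags(source: str, object_tags: list[str], sep: str = "##") -> str:
    """Append object tags (e.g. from an off-the-shelf detector run on the
    image) to the source sentence behind a separator token."""
    if not object_tags:
        return source
    return f"{source} {sep} {' '.join(object_tags)}"

print(augment_with_tags("a bird is flying over water", ["bird", "water"]))
# a bird is flying over water ## bird water
```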

Cultural and Geographical Influences on Image Translatability of Words across Languages

nikzadkhani/MMID-CNN-Analysis NAACL 2021

We find that images of words are not always invariant across languages, and that language pairs with a shared culture (a common language family, ethnicity, or religion) have better image translatability (i.e., more similar images for similar words) than pairs without one, regardless of geographic proximity.
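
A hedged sketch of one way to quantify image translatability (the paper's exact metric over its image dataset may differ): compare CNN features of the images retrieved for a word and for its translation.

```python
import torch
import torch.nn.functional as F

def image_translatability(feats_lang_a, feats_lang_b):
    """feats_*: (N, D) CNN features of images retrieved for the same word in
    two languages; higher cosine similarity of the mean embeddings means the
    word is more 'image translatable'."""
    a = feats_lang_a.mean(dim=0)
    b = feats_lang_b.mean(dim=0)
    return F.cosine_similarity(a, b, dim=0).item()

print(image_translatability(torch.randn(10, 512), torch.randn(12, 512)))
```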