Multimodal Machine Translation

34 papers with code • 3 benchmarks • 5 datasets

Multimodal machine translation is the task of performing machine translation with multiple data sources - for example, translating "a bird is flying over water" together with an image of a bird over water into German text.

(Image credit: Findings of the Third Shared Task on Multimodal Machine Translation)
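
As a minimal sketch of what this setup looks like in practice (all module names and sizes below are illustrative, not from any particular system): a toy PyTorch model that conditions a sequence decoder on both the source tokens and a precomputed image feature vector.

```python
import torch
import torch.nn as nn

class TinyMMT(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, d=256, img_dim=2048):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d)
        self.tgt_emb = nn.Embedding(tgt_vocab, d)
        self.img_proj = nn.Linear(img_dim, d)            # map CNN image features into model space
        self.encoder = nn.GRU(d, d, batch_first=True)
        self.decoder = nn.GRU(d, d, batch_first=True)
        self.out = nn.Linear(d, tgt_vocab)

    def forward(self, src_ids, img_feat, tgt_ids):
        txt = self.src_emb(src_ids)                      # (B, S, d)
        img = self.img_proj(img_feat).unsqueeze(1)       # image as one pseudo-token: (B, 1, d)
        _, h = self.encoder(torch.cat([img, txt], dim=1))
        dec, _ = self.decoder(self.tgt_emb(tgt_ids), h)  # teacher forcing
        return self.out(dec)                             # logits over the target (e.g. German) vocab

model = TinyMMT(src_vocab=1000, tgt_vocab=1200)
logits = model(torch.randint(0, 1000, (2, 7)),           # source token ids
               torch.randn(2, 2048),                     # precomputed image features
               torch.randint(0, 1200, (2, 6)))           # target token ids
print(logits.shape)  # torch.Size([2, 6, 1200])
```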

Most implemented papers

Latent Variable Model for Multi-modal Translation

iacercalixto/variational_mmt ACL 2019

In this work, we propose to model the interaction between visual and textual features for multi-modal neural machine translation (MMT) through a latent variable model.
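
A hedged sketch of the general latent-variable recipe (a VAE-style formulation with placeholder dimensions; the paper's exact parameterization differs): a prior network sees only the text, an approximate posterior also sees the image, and a KL term ties the two together.

```python
import torch
import torch.nn as nn

class LatentFusion(nn.Module):
    def __init__(self, d_txt=256, d_img=2048, d_z=64):
        super().__init__()
        self.prior = nn.Linear(d_txt, 2 * d_z)               # mu, logvar from text alone
        self.posterior = nn.Linear(d_txt + d_img, 2 * d_z)   # mu, logvar from text + image

    def forward(self, txt_vec, img_vec):
        p_mu, p_lv = self.prior(txt_vec).chunk(2, dim=-1)
        q_mu, q_lv = self.posterior(torch.cat([txt_vec, img_vec], dim=-1)).chunk(2, dim=-1)
        z = q_mu + torch.randn_like(q_mu) * (0.5 * q_lv).exp()   # reparameterization trick
        # KL(q || p) between diagonal Gaussians, summed over latent dims
        kl = 0.5 * ((q_lv - p_lv).exp() + (q_mu - p_mu).pow(2) / p_lv.exp()
                    - 1.0 + p_lv - q_lv).sum(dim=-1)
        return z, kl.mean()   # z conditions the decoder; kl is added to the NLL loss
```

Because the prior sees only the text, translation at test time can proceed without an image.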

Multimodal Machine Translation with Embedding Prediction

toshohirasawa/nmtpytorch-emb-pred NAACL 2019

Multimodal machine translation is an attractive application of neural machine translation (NMT).

Distilling Translations with Visual Awareness

ImperialNLP/MMT-Delib ACL 2019

Previous work on multimodal machine translation has shown that visual information is only needed in very specific cases, for example in the presence of ambiguous words where the textual context is not sufficient.
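
The paper's own mechanism is a translate-then-refine (deliberation) model; purely as an illustration of the observation above, here is a simpler, hypothetical gating module that lets a model suppress the visual signal when the text alone is sufficient.

```python
import torch
import torch.nn as nn

class GatedVisualFusion(nn.Module):
    def __init__(self, d=256, img_dim=2048):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, d)
        self.gate = nn.Linear(2 * d, 1)

    def forward(self, txt_state, img_feat):   # txt_state: (B, d) pooled text; img_feat: (B, img_dim)
        img = self.img_proj(img_feat)
        g = torch.sigmoid(self.gate(torch.cat([txt_state, img], dim=-1)))  # scalar gate in (0, 1)
        return txt_state + g * img             # g near 0: the image is effectively ignored
```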

M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training

microsoft/M3P CVPR 2021

We present M3P, a Multitask Multilingual Multimodal Pre-trained model that combines multilingual pre-training and multimodal pre-training into a unified framework via multitask pre-training.
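
A hedged sketch of the multitask idea (the method names on `model` are placeholders, not M3P's API): alternate multilingual text-only batches with image-text batches so one shared model is trained under both objectives.

```python
import random

def pretrain_step(model, text_batches, image_text_batches, optimizer):
    # Alternate objectives so one shared encoder sees both kinds of supervision.
    if random.random() < 0.5:
        loss = model.masked_lm_loss(next(text_batches))          # multilingual text-only MLM
    else:
        loss = model.image_text_loss(next(image_text_batches))   # masked modeling with image regions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```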

Self-Knowledge Distillation with Progressive Refinement of Targets

lgcnsai/ps-kd-pytorch ICCV 2021

Hence, it can be interpreted within a knowledge-distillation framework in which the student progressively becomes its own teacher.
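
The core update can be sketched directly (hyper-parameter names are mine, not the repo's): the hard labels are mixed with the model's own predictions from the previous epoch, with the mixing weight growing over training.

```python
import torch
import torch.nn.functional as F

def ps_kd_loss(logits, targets, past_probs, epoch, total_epochs, alpha_max=0.8):
    alpha = alpha_max * (epoch / total_epochs)          # progressively trust the "student as teacher"
    hard = F.one_hot(targets, logits.size(-1)).float()
    soft = (1.0 - alpha) * hard + alpha * past_probs    # progressively refined targets
    return -(soft * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

# past_probs are the model's softmax outputs saved for the same examples in the
# previous epoch; in the first epoch (alpha = 0) this is plain cross-entropy.
```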

Multimodal Transformer for Multimodal Machine Translation

QAQ-v/MMT ACL 2020

Multimodal Machine Translation (MMT) aims to introduce information from another modality, typically static images, to improve translation quality.
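
A minimal sketch of the standard way such information enters a Transformer (the layer layout here is hypothetical): text states attend to detected image-region features through multi-head cross-attention.

```python
import torch
import torch.nn as nn

class VisualCrossAttention(nn.Module):
    def __init__(self, d=256, img_dim=2048, heads=8):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, d)
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.norm = nn.LayerNorm(d)

    def forward(self, txt, regions):          # txt: (B, S, d); regions: (B, R, img_dim)
        img = self.img_proj(regions)
        ctx, _ = self.attn(query=txt, key=img, value=img)
        return self.norm(txt + ctx)           # residual fusion of visual context

layer = VisualCrossAttention()
out = layer(torch.randn(2, 7, 256), torch.randn(2, 36, 2048))
print(out.shape)  # torch.Size([2, 7, 256])
```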

Dynamic Context-guided Capsule Network for Multimodal Machine Translation

DeepLearnXMU/MM-DCCN 4 Sep 2020

In particular, we represent the input image with global and regional visual features, and we introduce two parallel DCCNs to model multimodal context vectors with visual features at different granularities.
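
A hedged sketch of the two-granularity layout only, with the capsule layers and context-guided dynamic routing omitted: one branch uses the global image vector, the other attends over regional features, and both context vectors are merged.

```python
import torch
import torch.nn as nn

class TwoGranularityContext(nn.Module):
    def __init__(self, d=256, img_dim=2048):
        super().__init__()
        self.glob = nn.Linear(img_dim, d)
        self.reg_proj = nn.Linear(img_dim, d)
        self.attn = nn.MultiheadAttention(d, 4, batch_first=True)

    def forward(self, txt, global_feat, regional_feats):
        g = self.glob(global_feat).unsqueeze(1).expand_as(txt)   # global branch: (B, S, d)
        r_kv = self.reg_proj(regional_feats)
        r, _ = self.attn(txt, r_kv, r_kv)                        # regional branch: (B, S, d)
        return txt + g + r                                        # merged multimodal context
```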

Cross-lingual Visual Pre-training for Multimodal Machine Translation

imperialnlp/vtlm EACL 2021

Pre-trained language models have been shown to improve performance in many natural language tasks substantially.
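
A hedged sketch of the joint input construction behind this kind of pre-training (special token ids and masking rate are illustrative): the source and target sentences form one stream, random tokens are masked, and the model predicts them from the bilingual plus visual context.

```python
import torch

def mask_tokens(token_ids, mask_id, p=0.15):
    ids = token_ids.clone()
    mask = torch.rand_like(ids, dtype=torch.float) < p
    labels = torch.where(mask, ids, torch.full_like(ids, -100))  # -100: ignored by CE loss
    ids[mask] = mask_id
    return ids, labels

src = torch.randint(5, 1000, (2, 8))
tgt = torch.randint(5, 1000, (2, 9))
pair = torch.cat([src, tgt], dim=1)          # translation pair in one stream
masked, labels = mask_tokens(pair, mask_id=4)
# `masked` plus projected image-region features would feed the Transformer;
# cross-entropy against `labels` trains the cross-lingual visual LM.
```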

ViTA: Visual-Linguistic Translation by Aligning Object Tags

kshitij98/vita Workshop on Asian Translation 2021

Multimodal Machine Translation (MMT) enriches the source text with visual information for translation.
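
The tag-alignment idea reduces to a simple preprocessing step, sketched here with a placeholder separator and detector: object tags detected in the image are appended to the source sentence, so even a text-only MT model sees the visual grounding.

```python
def augment_with_tags(source: str, object_tags: list[str], sep: str = "##") -> str:
    """Append object tags (e.g. from an off-the-shelf detector run on the
    image) to the source sentence behind a separator token."""
    if not object_tags:
        return source
    return f"{source} {sep} {' '.join(object_tags)}"

print(augment_with_tags("a bird is flying over water", ["bird", "water"]))
# a bird is flying over water ## bird water
```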

Cultural and Geographical Influences on Image Translatability of Words across Languages

nikzadkhani/MMID-CNN-Analysis NAACL 2021

We find that images of words are not always invariant across languages, and that language pairs with a shared culture (a common language family, ethnicity, or religion) have better image translatability (i.e., more similar images for similar words) than pairs without one, regardless of geographic proximity.
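
A hedged sketch of one way to quantify image translatability (the paper's exact metric over its image dataset may differ): compare CNN features of the images retrieved for a word and for its translation.

```python
import torch
import torch.nn.functional as F

def image_translatability(feats_lang_a, feats_lang_b):
    """feats_*: (N, D) CNN features of images retrieved for the same word in
    two languages; higher cosine similarity of the mean embeddings means the
    word is more 'image translatable'."""
    a = feats_lang_a.mean(dim=0)
    b = feats_lang_b.mean(dim=0)
    return F.cosine_similarity(a, b, dim=0).item()

print(image_translatability(torch.randn(10, 512), torch.randn(12, 512)))
```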