Multimodal Machine Translation
35 papers with code • 3 benchmarks • 5 datasets
Multimodal machine translation is the task of performing machine translation with multiple sources of input data - for example, translating the English sentence "a bird is flying over water", together with an image of a bird over water, into German text.
(Image credit: Findings of the Third Shared Task on Multimodal Machine Translation)
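The task definition above can be illustrated with a minimal sketch of the simplest fusion strategy, concatenating a text embedding with an image feature vector to form one multimodal input. This is a hypothetical toy example, not the method of any paper listed below; the function name `fuse` and the toy vectors are assumptions for illustration.

```python
# Minimal sketch of multimodal input fusion (hypothetical illustration):
# concatenate a pooled sentence embedding with pooled image features so a
# translation model can condition on both modalities.

def fuse(text_embedding, image_features):
    """Concatenation fusion: joins two feature vectors into one input."""
    return text_embedding + image_features  # plain list concatenation

text_emb = [0.1, 0.2, 0.3]   # e.g. a pooled embedding of the source sentence
img_feat = [0.9, 0.8]        # e.g. pooled CNN features of the paired image

fused = fuse(text_emb, img_feat)
print(len(fused))  # 5
```

Real systems typically fuse inside the encoder or via attention rather than by raw concatenation, but the principle is the same: the model receives information from both modalities.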
Latest papers with no code
A Visually-Grounded Parallel Corpus with Phrase-to-Region Linking
To verify our dataset, we performed phrase localization experiments in both languages and investigated the effectiveness of our Japanese annotations as well as multilingual learning realized by our dataset.
Investigating the Decoders of Maximum Likelihood Sequence Models: A Look-ahead Approach
We evaluate our look-ahead module on three datasets of varying difficulty: IM2LATEX-100k (OCR from images to LaTeX), WMT16 multimodal machine translation, and WMT14 machine translation.
Multimodal Machine Translation through Visuals and Speech
Multimodal machine translation involves drawing information from more than one modality, based on the assumption that the additional modalities will contain useful alternative views of the input data.
Adaptive Fusion Techniques for Multimodal Data
Effective fusion of data from multiple modalities, such as video, speech, and text, is challenging due to the heterogeneous nature of multimodal data.
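One common family of adaptive fusion techniques uses a learned gate that weights each modality per example instead of mixing them with fixed proportions. The sketch below is a hypothetical illustration of that idea under simplifying assumptions (same-length vectors, a scalar gate), not the specific method of the paper above; `gated_fusion` and `gate_score` are invented names.

```python
import math

# Hypothetical sketch of gated (adaptive) fusion: a sigmoid gate decides,
# per example, how much to trust the text features versus the image features.

def gated_fusion(text_vec, image_vec, gate_score):
    """Blend two same-length feature vectors with a scalar sigmoid gate.

    gate_score would normally be produced by a small learned network;
    here it is passed in directly for illustration.
    """
    g = 1.0 / (1.0 + math.exp(-gate_score))  # gate value in (0, 1)
    return [g * t + (1.0 - g) * v for t, v in zip(text_vec, image_vec)]

# With gate_score = 0.0 the gate is exactly 0.5, so both modalities
# contribute equally.
fused = gated_fusion([1.0, 0.0], [0.0, 1.0], gate_score=0.0)
print(fused)  # [0.5, 0.5]
```

Because the gate depends on the input, the model can lean on the image when the text is ambiguous and ignore it otherwise, which is what makes such fusion "adaptive".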
Understanding the Effect of Textual Adversaries in Multimodal Machine Translation
It is assumed that multimodal machine translation systems are better than text-only systems at translating phrases that have a direct correspondence in the image.
Transformer-based Cascaded Multimodal Speech Translation
Upon conducting extensive experiments, we found that (i) the explored visual integration schemes often harm the translation performance for the transformer and additive deliberation, but considerably improve the cascade deliberation; (ii) the transformer and cascade deliberation integrate the visual modality better than the additive deliberation, as shown by the incongruence analysis.
On Leveraging the Visual Modality for Neural Machine Translation
Leveraging the visual modality effectively for Neural Machine Translation (NMT) remains an open problem in computational linguistics.
Probing Representations Learned by Multimodal Recurrent and Transformer Models
In this paper, we present a meta-study assessing the representational quality of models where the training signal is obtained from different modalities, in particular, language modeling, image features prediction, and both textual and multimodal machine translation.
Hindi Visual Genome: A Dataset for Multimodal English-to-Hindi Machine Translation
We present "Hindi Visual Genome", a multimodal dataset consisting of text and images suitable for the English-to-Hindi multimodal machine translation task and for multimodal research.