Multimodal Machine Translation

35 papers with code • 3 benchmarks • 5 datasets

Multimodal machine translation is the task of performing machine translation with multiple data sources: for example, translating the sentence "a bird is flying over water", together with an image of the bird over water, into German text.

(Image credit: Findings of the Third Shared Task on Multimodal Machine Translation)
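
To make the setting concrete, here is a minimal sketch of a multimodal translation model: pooled image features are projected and fused into the initial state of a text encoder-decoder. All module names and sizes are illustrative assumptions, not taken from any particular system.

```python
# Toy multimodal MT model (illustrative sketch, not a real system):
# pooled image features are added into the encoder's final state
# before decoding begins.
import torch
import torch.nn as nn

class ToyMultimodalMT(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, d_model=256, img_dim=2048):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, d_model)
        self.encoder = nn.GRU(d_model, d_model, batch_first=True)
        self.img_proj = nn.Linear(img_dim, d_model)   # project CNN image features
        self.tgt_embed = nn.Embedding(tgt_vocab, d_model)
        self.decoder = nn.GRU(d_model, d_model, batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab)

    def forward(self, src_ids, img_feats, tgt_ids):
        _, h = self.encoder(self.src_embed(src_ids))       # h: (1, B, d_model)
        h = h + self.img_proj(img_feats).unsqueeze(0)      # fuse image into state
        dec_out, _ = self.decoder(self.tgt_embed(tgt_ids), h)
        return self.out(dec_out)                           # (B, T, tgt_vocab)

model = ToyMultimodalMT(src_vocab=1000, tgt_vocab=1000)
logits = model(torch.randint(0, 1000, (2, 7)),   # source token ids
               torch.randn(2, 2048),             # pooled image features
               torch.randint(0, 1000, (2, 9)))   # target token ids
print(logits.shape)  # torch.Size([2, 9, 1000])
```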

Latest papers with no code

A Visually-Grounded Parallel Corpus with Phrase-to-Region Linking

no code yet • LREC 2020

To verify our dataset, we performed phrase-localization experiments in both languages and investigated the effectiveness of our Japanese annotations, as well as the multilingual learning enabled by our dataset.
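
As context for the phrase-localization evaluation mentioned above, here is a hedged sketch of the standard metric used in such experiments: a phrase counts as correctly localized when the predicted region overlaps the gold region with intersection-over-union of at least 0.5. The box format and threshold follow the common convention, not details taken from this paper.

```python
# Phrase-localization accuracy under the usual IoU >= 0.5 criterion.
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def localization_accuracy(predicted, gold, threshold=0.5):
    """Fraction of phrases whose predicted region matches the gold region."""
    hits = sum(iou(p, g) >= threshold for p, g in zip(predicted, gold))
    return hits / len(gold)

print(localization_accuracy([(0, 0, 10, 10)], [(2, 2, 12, 12)]))  # IoU ~0.47 -> 0.0
```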

Investigating the Decoders of Maximum Likelihood Sequence Models: A Look-ahead Approach

no code yet • 8 Mar 2020

We evaluate our look-ahead module on three datasets of varying difficulty: IM2LATEX-100k (OCR image-to-LaTeX), WMT16 multimodal machine translation, and WMT14 machine translation.
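
For readers unfamiliar with the idea, below is a rough sketch of one-step look-ahead decoding in general (illustrative, not the paper's exact method): a candidate token is scored not only by its immediate log-probability but also by the best log-probability reachable at the next step. The `model_logprobs` API is a hypothetical stand-in.

```python
# One-step look-ahead over the top-k candidates (generic sketch).
import torch

def lookahead_step(model_logprobs, prefix, k=5):
    """Pick the next token by combining immediate and best next-step scores.

    `model_logprobs(prefix)` is assumed to return a 1-D tensor of next-token
    log-probabilities given the prefix (hypothetical API).
    """
    scores = model_logprobs(prefix)
    top = torch.topk(scores, k)
    best_tok, best_score = None, float("-inf")
    for logp, tok in zip(top.values, top.indices):
        future = model_logprobs(prefix + [tok.item()])   # peek one step ahead
        total = logp.item() + future.max().item()        # immediate + best next
        if total > best_score:
            best_tok, best_score = tok.item(), total
    return best_tok

# Toy model: token 1 looks best immediately, but token 2 leads to a
# far more confident next step, so look-ahead prefers it.
def toy_logprobs(prefix):
    p = torch.full((5,), -5.0)
    if prefix == [0]:
        p[1], p[2] = -1.0, -1.5
    elif prefix == [0, 2]:
        p[3] = 0.0
    return p

print(lookahead_step(toy_logprobs, prefix=[0], k=2))  # 2 (greedy would pick 1)
```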

Multimodal Machine Translation through Visuals and Speech

no code yet • 28 Nov 2019

Multimodal machine translation involves drawing information from more than one modality, based on the assumption that the additional modalities will contain useful alternative views of the input data.

Adaptive Fusion Techniques for Multimodal Data

no code yet • EACL 2021

Effective fusion of data from multiple modalities, such as video, speech, and text, is challenging due to the heterogeneous nature of multimodal data.
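
One widely used family of adaptive fusion is gated fusion, sketched below as an illustration (not necessarily the scheme this paper proposes): a learned sigmoid gate decides, per input, how much of each modality to let through.

```python
# Gated fusion of two modality vectors (illustrative sketch).
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Fuse two modality vectors with a learned, input-dependent gate."""
    def __init__(self, d_text, d_vision, d_out):
        super().__init__()
        self.text_proj = nn.Linear(d_text, d_out)
        self.vision_proj = nn.Linear(d_vision, d_out)
        self.gate = nn.Linear(2 * d_out, d_out)

    def forward(self, text, vision):
        t, v = self.text_proj(text), self.vision_proj(vision)
        g = torch.sigmoid(self.gate(torch.cat([t, v], dim=-1)))  # gate in (0, 1)
        return g * t + (1 - g) * v   # per-dimension mix of the two modalities

fusion = GatedFusion(d_text=512, d_vision=2048, d_out=256)
fused = fusion(torch.randn(4, 512), torch.randn(4, 2048))
print(fused.shape)  # torch.Size([4, 256])
```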

Understanding the Effect of Textual Adversaries in Multimodal Machine Translation

no code yet • WS 2019

It is assumed that multimodal machine translation systems are better than text-only systems at translating phrases that have a direct correspondence in the image.
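
Here is a sketch of one simple textual adversary of the kind such studies consider (the specific perturbation is an assumption, not necessarily the paper's): randomly deleting source words, under the reasoning that a system which truly exploits the image should degrade less than a text-only one.

```python
# Word-deletion adversary for source sentences (illustrative sketch).
import random

def drop_words(sentence, p=0.3, seed=0):
    """Randomly delete each word with probability p (never return empty)."""
    rng = random.Random(seed)
    tokens = sentence.split()
    kept = [t for t in tokens if rng.random() > p]
    return " ".join(kept) if kept else tokens[0]

# With this seed, "flying" is dropped:
print(drop_words("a bird is flying over water"))  # "a bird is over water"
```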

Transformer-based Cascaded Multimodal Speech Translation

no code yet • EMNLP (IWSLT) 2019

Extensive experiments show that (i) the explored visual integration schemes often harm translation performance for the transformer and additive deliberation, but considerably improve cascade deliberation; (ii) the transformer and cascade deliberation integrate the visual modality better than additive deliberation, as shown by the incongruence analysis.
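
The incongruence analysis mentioned above is commonly implemented by translating the test set twice, once with the correct images and once with deliberately mismatched ones; the quality gap indicates how much the model relies on the visual modality. A hedged sketch, where `translate` and `bleu` are hypothetical stand-ins for a real system and metric:

```python
# Incongruence analysis: quality gap between correct and shuffled images.
import random

def incongruence_gap(translate, bleu, sentences, images, references, seed=0):
    """Return BLEU(congruent) - BLEU(incongruent); larger gap = more visual use."""
    congruent = translate(sentences, images)
    shuffled = images[:]                      # break the image-sentence pairing
    random.Random(seed).shuffle(shuffled)
    incongruent = translate(sentences, shuffled)
    return bleu(congruent, references) - bleu(incongruent, references)
```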

On Leveraging the Visual Modality for Neural Machine Translation

no code yet • WS 2019

Leveraging the visual modality effectively for Neural Machine Translation (NMT) remains an open problem in computational linguistics.

Probing Representations Learned by Multimodal Recurrent and Transformer Models

no code yet • 29 Aug 2019

In this paper, we present a meta-study assessing the representational quality of models whose training signal comes from different modalities: language modeling, image feature prediction, and both text-only and multimodal machine translation.
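
Below is a minimal sketch of a probing classifier in the spirit of such meta-studies: freeze the learned representations and train a simple linear classifier on top, reading probe accuracy as a proxy for how much of the probed property the representation encodes. The features and labels are random placeholders.

```python
# Linear probe over frozen representations (illustrative sketch).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 256))   # frozen encoder representations
labels = rng.integers(0, 2, size=1000)    # property being probed (placeholder)

X_tr, X_te, y_tr, y_te = train_test_split(features, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"probe accuracy: {probe.score(X_te, y_te):.3f}")  # ~0.5 on random data
```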

Hindi Visual Genome: A Dataset for Multimodal English-to-Hindi Machine Translation

no code yet • 21 Jul 2019

We present "Hindi Visual Genome", a multimodal dataset consisting of text and images suitable for the English-Hindi multimodal machine translation task and for multimodal research.
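
For illustration, one plausible record layout for such a text-and-image parallel corpus (the field names are assumptions for this sketch, not the dataset's actual schema):

```python
# Hypothetical record structure for a text+image parallel corpus.
from dataclasses import dataclass

@dataclass
class MultimodalExample:
    image_path: str   # the image paired with the sentence
    source_text: str  # English source sentence
    target_text: str  # Hindi reference translation

example = MultimodalExample(
    image_path="images/000001.jpg",
    source_text="a bird is flying over water",
    target_text="एक पक्षी पानी के ऊपर उड़ रहा है",
)
print(example.source_text, "->", example.target_text)
```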