Multimodal Machine Translation

7 papers with code · Natural Language Processing
Subtask of Machine Translation

Multimodal machine translation is the task of translating with input from more than one modality: for example, translating the sentence "a bird is flying over water", together with an image of a bird over water, into German text.

(Image credit: Findings of the Third Shared Task on Multimodal Machine Translation)

Leaderboards

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Latest papers without code

A Visually-Grounded Parallel Corpus with Phrase-to-Region Linking

LREC 2020

To verify our dataset, we performed phrase localization experiments in both languages and investigated the effectiveness of our Japanese annotations as well as multilingual learning realized by our dataset.

IMAGE CAPTIONING · MULTIMODAL MACHINE TRANSLATION

Investigating the Decoders of Maximum Likelihood Sequence Models: A Look-ahead Approach

8 Mar 2020

We evaluate our look-ahead module on three datasets of varying difficulties: IM2LATEX-100k OCR image to LaTeX, WMT16 multimodal machine translation, and WMT14 machine translation.

MULTIMODAL MACHINE TRANSLATION · OPTICAL CHARACTER RECOGNITION

Multimodal Machine Translation through Visuals and Speech

28 Nov 2019

Multimodal machine translation involves drawing information from more than one modality, based on the assumption that the additional modalities will contain useful alternative views of the input data.

IMAGE CAPTIONING · MULTIMODAL MACHINE TRANSLATION · SPEECH RECOGNITION · VIDEO CAPTIONING

Understanding the Effect of Textual Adversaries in Multimodal Machine Translation

WS 2019

It is assumed that multimodal machine translation systems are better than text-only systems at translating phrases that have a direct correspondence in the image.

MULTIMODAL MACHINE TRANSLATION

Transformer-based Cascaded Multimodal Speech Translation

29 Oct 2019

While the ASR component is identical across the experiments, the MMT model varies in terms of the way of integrating the visual context (simple conditioning vs. attention), the type of visual features exploited (pooled, convolutional, action categories) and the underlying architecture.

MULTIMODAL MACHINE TRANSLATION · SPEECH RECOGNITION

On Leveraging the Visual Modality for Neural Machine Translation

WS 2019

Leveraging the visual modality effectively for Neural Machine Translation (NMT) remains an open problem in computational linguistics.

MULTIMODAL MACHINE TRANSLATION

Probing Representations Learned by Multimodal Recurrent and Transformer Models

29 Aug 2019

In this paper, we present a meta-study assessing the representational quality of models where the training signal is obtained from different modalities, in particular, language modeling, image features prediction, and both textual and multimodal machine translation.

IMAGE RETRIEVAL · LANGUAGE MODELLING · MULTIMODAL MACHINE TRANSLATION · SEMANTIC TEXTUAL SIMILARITY