Multimodal Machine Translation

34 papers with code • 3 benchmarks • 5 datasets

Multimodal machine translation is the task of performing machine translation with multiple data sources - for example, translating "a bird is flying over water" together with an image of a bird over water into German.
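One common fusion strategy - not tied to any specific paper listed below - is to project a global image feature into the text embedding space and prepend it to the source token embeddings before encoding. The sketch below illustrates this with random arrays; the shapes, the projection matrix `W`, and the function name are illustrative assumptions, not a real system's API.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_image_and_text(token_embeddings, image_feature, proj):
    """Prepend a projected image feature to the source token embeddings.

    token_embeddings: (seq_len, d_model) source-sentence embeddings
    image_feature:    (d_image,) global image descriptor (e.g. a pooled CNN feature)
    proj:             (d_image, d_model) learned projection (random here for illustration)
    """
    img_token = image_feature @ proj                  # (d_model,)
    return np.vstack([img_token, token_embeddings])   # (seq_len + 1, d_model)

d_model, d_image, seq_len = 8, 16, 5
tokens = rng.standard_normal((seq_len, d_model))  # e.g. "a bird is flying over water"
image = rng.standard_normal(d_image)              # e.g. an image of a bird over water
W = rng.standard_normal((d_image, d_model))

fused = fuse_image_and_text(tokens, image, W)
print(fused.shape)  # the image acts as an extra "token" seen by the encoder
```

A translation encoder (e.g. a Transformer) would then attend over the fused sequence, letting the decoder ground ambiguous words in the visual context.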

(Image credit: Findings of the Third Shared Task on Multimodal Machine Translation)


Latest papers with no code

LVP-M3: Language-aware Visual Prompt for Multilingual Multimodal Machine Translation

no code yet • 19 Oct 2022

To this end, we first propose the Multilingual MMT task by establishing two new Multilingual MMT benchmark datasets covering seven languages.

Increasing Visual Awareness in Multimodal Neural Machine Translation from an Information Theoretic Perspective

no code yet • 16 Oct 2022

Multimodal machine translation (MMT) aims to improve translation quality by equipping the source sentence with its corresponding image.

Supervised Visual Attention for Simultaneous Multimodal Machine Translation

no code yet • 23 Jan 2022

A particular use for such multimodal systems is the task of simultaneous machine translation, where visual context has been shown to complement the partial information provided by the source sentence, especially in the early phases of translation.

On Vision Features in Multimodal Machine Translation

no code yet • ACL ARR November 2021

Previous work on multimodal machine translation (MMT) has focused on how to incorporate vision features into translation, but little attention has been paid to the quality of the vision models themselves.

Good for Misconceived Reasons: An Empirical Revisiting on the Need for Visual Context in Multimodal Machine Translation

no code yet • ACL 2021

A neural multimodal machine translation (MMT) system is one that aims to perform better translation by extending conventional text-only translation models with multimodal information.

Gumbel-Attention for Multi-modal Machine Translation

no code yet • 16 Mar 2021

Multi-modal machine translation (MMT) improves translation quality by introducing visual information.

Good for Misconceived Reasons: Revisiting Neural Multimodal Machine Translation

no code yet • 1 Jan 2021

A neural multimodal machine translation (MMT) system is one that aims to perform better translation by extending conventional text-only translation models with multimodal information.

Efficient Object-Level Visual Context Modeling for Multimodal Machine Translation: Masking Irrelevant Objects Helps Grounding

no code yet • 18 Dec 2020

In this paper, we propose an object-level visual context modeling framework (OVC) to efficiently capture and explore visual information for multimodal machine translation.

MSVD-Turkish: A Comprehensive Multimodal Dataset for Integrated Vision and Language Research in Turkish

no code yet • 13 Dec 2020

We hope that the MSVD-Turkish dataset and the results reported in this work will lead to better video captioning and multimodal machine translation models for Turkish and other morphologically rich and agglutinative languages.

Generative Imagination Elevates Machine Translation

no code yet • NAACL 2021

Given a sentence in a source language, does depicting the visual scene help translation into a target language?