Multimodal Machine Translation

34 papers with code • 3 benchmarks • 5 datasets

Multimodal machine translation is the task of performing machine translation with multiple data sources - for example, translating "a bird is flying over water" together with an image of a bird over water into German.
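One common fusion strategy - not tied to any specific paper listed below - is to project a global image feature into the text embedding space and prepend it to the source token embeddings before encoding. The sketch below illustrates this with random arrays; the shapes, the projection matrix `W`, and the function name are illustrative assumptions, not a real system's API.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_image_and_text(token_embeddings, image_feature, proj):
    """Prepend a projected image feature to the source token embeddings.

    token_embeddings: (seq_len, d_model) source-sentence embeddings
    image_feature:    (d_image,) global image descriptor (e.g. a pooled CNN feature)
    proj:             (d_image, d_model) learned projection (random here for illustration)
    """
    img_token = image_feature @ proj                  # (d_model,)
    return np.vstack([img_token, token_embeddings])   # (seq_len + 1, d_model)

d_model, d_image, seq_len = 8, 16, 5
tokens = rng.standard_normal((seq_len, d_model))  # e.g. "a bird is flying over water"
image = rng.standard_normal(d_image)              # e.g. an image of a bird over water
W = rng.standard_normal((d_image, d_model))

fused = fuse_image_and_text(tokens, image, W)
print(fused.shape)  # the image acts as an extra "token" seen by the encoder
```

A translation encoder (e.g. a Transformer) would then attend over the fused sequence, letting the decoder ground ambiguous words in the visual context.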

(Image credit: Findings of the Third Shared Task on Multimodal Machine Translation)


Latest papers with no code

LVP-M3: Language-aware Visual Prompt for Multilingual Multimodal Machine Translation

no code yet • 19 Oct 2022

To this end, we first propose the Multilingual MMT task by establishing two new Multilingual MMT benchmark datasets covering seven languages.

Increasing Visual Awareness in Multimodal Neural Machine Translation from an Information Theoretic Perspective

no code yet • 16 Oct 2022

Multimodal machine translation (MMT) aims to improve translation quality by equipping the source sentence with its corresponding image.

Supervised Visual Attention for Simultaneous Multimodal Machine Translation

no code yet • 23 Jan 2022

A particular use for such multimodal systems is the task of simultaneous machine translation, where visual context has been shown to complement the partial information provided by the source sentence, especially in the early phases of translation.

On Vision Features in Multimodal Machine Translation

no code yet • ACL ARR November 2021

Previous work on multimodal machine translation (MMT) has focused on how to incorporate vision features into translation, but little attention has been paid to the quality of the vision models themselves.

Good for Misconceived Reasons: An Empirical Revisiting on the Need for Visual Context in Multimodal Machine Translation

no code yet • ACL 2021

A neural multimodal machine translation (MMT) system is one that aims to perform better translation by extending conventional text-only translation models with multimodal information.

Gumbel-Attention for Multi-modal Machine Translation

no code yet • 16 Mar 2021

Multi-modal machine translation (MMT) improves translation quality by introducing visual information.

Good for Misconceived Reasons: Revisiting Neural Multimodal Machine Translation

no code yet • 1 Jan 2021

A neural multimodal machine translation (MMT) system is one that aims to perform better translation by extending conventional text-only translation models with multimodal information.

Efficient Object-Level Visual Context Modeling for Multimodal Machine Translation: Masking Irrelevant Objects Helps Grounding

no code yet • 18 Dec 2020

In this paper, we propose an object-level visual context modeling framework (OVC) to efficiently capture and explore visual information for multimodal machine translation.

MSVD-Turkish: A Comprehensive Multimodal Dataset for Integrated Vision and Language Research in Turkish

no code yet • 13 Dec 2020

We hope that the MSVD-Turkish dataset and the results reported in this work will lead to better video captioning and multimodal machine translation models for Turkish and other morphologically rich and agglutinative languages.

Generative Imagination Elevates Machine Translation

no code yet • NAACL 2021

Given a sentence in a source language, does depicting the visual scene help translation into a target language?