Multimodal Machine Translation
35 papers with code • 3 benchmarks • 5 datasets
Multimodal machine translation is the task of performing machine translation with multiple sources of input data - for example, translating the English sentence "a bird is flying over water", together with an image of a bird over water, into German text.
(Image credit: Findings of the Third Shared Task on Multimodal Machine Translation)
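The task definition above can be illustrated with a minimal sketch of the simplest fusion strategy, concatenating a text embedding with an image feature vector to form one multimodal input. This is a hypothetical toy example, not the method of any paper listed below; the function name `fuse` and the toy vectors are assumptions for illustration.

```python
# Minimal sketch of multimodal input fusion (hypothetical illustration):
# concatenate a pooled sentence embedding with pooled image features so a
# translation model can condition on both modalities.

def fuse(text_embedding, image_features):
    """Concatenation fusion: joins two feature vectors into one input."""
    return text_embedding + image_features  # plain list concatenation

text_emb = [0.1, 0.2, 0.3]   # e.g. a pooled embedding of the source sentence
img_feat = [0.9, 0.8]        # e.g. pooled CNN features of the paired image

fused = fuse(text_emb, img_feat)
print(len(fused))  # 5
```

Real systems typically fuse inside the encoder or via attention rather than by raw concatenation, but the principle is the same: the model receives information from both modalities.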
Latest papers with no code
A Visually-Grounded Parallel Corpus with Phrase-to-Region Linking
To verify our dataset, we performed phrase localization experiments in both languages and investigated the effectiveness of our Japanese annotations as well as multilingual learning realized by our dataset.
Investigating the Decoders of Maximum Likelihood Sequence Models: A Look-ahead Approach
We evaluate our look-ahead module on three datasets of varying difficulty: IM2LATEX-100k (OCR from images to LaTeX), WMT16 multimodal machine translation, and WMT14 machine translation.
Multimodal Machine Translation through Visuals and Speech
Multimodal machine translation involves drawing information from more than one modality, based on the assumption that the additional modalities will contain useful alternative views of the input data.
Adaptive Fusion Techniques for Multimodal Data
Effective fusion of data from multiple modalities, such as video, speech, and text, is challenging due to the heterogeneous nature of multimodal data.
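One common family of adaptive fusion techniques uses a learned gate that weights each modality per example instead of mixing them with fixed proportions. The sketch below is a hypothetical illustration of that idea under simplifying assumptions (same-length vectors, a scalar gate), not the specific method of the paper above; `gated_fusion` and `gate_score` are invented names.

```python
import math

# Hypothetical sketch of gated (adaptive) fusion: a sigmoid gate decides,
# per example, how much to trust the text features versus the image features.

def gated_fusion(text_vec, image_vec, gate_score):
    """Blend two same-length feature vectors with a scalar sigmoid gate.

    gate_score would normally be produced by a small learned network;
    here it is passed in directly for illustration.
    """
    g = 1.0 / (1.0 + math.exp(-gate_score))  # gate value in (0, 1)
    return [g * t + (1.0 - g) * v for t, v in zip(text_vec, image_vec)]

# With gate_score = 0.0 the gate is exactly 0.5, so both modalities
# contribute equally.
fused = gated_fusion([1.0, 0.0], [0.0, 1.0], gate_score=0.0)
print(fused)  # [0.5, 0.5]
```

Because the gate depends on the input, the model can lean on the image when the text is ambiguous and ignore it otherwise, which is what makes such fusion "adaptive".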
Understanding the Effect of Textual Adversaries in Multimodal Machine Translation
It is assumed that multimodal machine translation systems are better than text-only systems at translating phrases that have a direct correspondence in the image.
Transformer-based Cascaded Multimodal Speech Translation
Upon conducting extensive experiments, we found that (i) the explored visual integration schemes often harm the translation performance for the transformer and additive deliberation, but considerably improve the cascade deliberation; (ii) the transformer and cascade deliberation integrate the visual modality better than the additive deliberation, as shown by the incongruence analysis.
On Leveraging the Visual Modality for Neural Machine Translation
Leveraging the visual modality effectively for Neural Machine Translation (NMT) remains an open problem in computational linguistics.
Probing Representations Learned by Multimodal Recurrent and Transformer Models
In this paper, we present a meta-study assessing the representational quality of models where the training signal is obtained from different modalities, in particular, language modeling, image features prediction, and both textual and multimodal machine translation.
Hindi Visual Genome: A Dataset for Multimodal English-to-Hindi Machine Translation
We present "Hindi Visual Genome", a multimodal dataset consisting of text and images suitable for the English-to-Hindi multimodal machine translation task and for multimodal research.