MultiSubs (MultiSubs: A Large-scale Multimodal and Multilingual Dataset)

Introduced by Wang et al. in MultiSubs: A Large-scale Multimodal and Multilingual Dataset

MultiSubs is a dataset of multilingual subtitles gathered from the OPUS OpenSubtitles dataset, which in turn was sourced from opensubtitles.org. We have supplemented some text fragments within the subtitles (visually salient nouns in this release) with web images, where the word sense of each fragment has been disambiguated using a cross-lingual approach. We have introduced a fill-in-the-blank task and a lexical translation task to demonstrate the utility of the dataset. Please refer to our paper for a more detailed description of the dataset and tasks. MultiSubs will benefit research on the visual grounding of words, especially in the context of free-form sentences.
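To make the fill-in-the-blank task concrete, here is a minimal sketch of how one instance could be built: a subtitle sentence has its image-grounded noun replaced by a blank, and a model must recover the word from the remaining text and the attached images. The class name `FillInTheBlankInstance`, the helper `make_fill_in_the_blank`, the `<BLANK>` token and the example URL are all hypothetical and do not reflect the dataset's actual file format. Scoring by case-insensitive exact-match accuracy is likewise an assumption; see the paper for the evaluation actually used.

```python
import re
from dataclasses import dataclass, field
from typing import List

BLANK = "<BLANK>"  # hypothetical placeholder token, not the dataset's own


@dataclass
class FillInTheBlankInstance:
    """Hypothetical container for one fill-in-the-blank example."""
    masked_sentence: str  # subtitle with the target word blanked out
    answer: str           # the word that was removed, as it appeared in the subtitle
    image_urls: List[str] = field(default_factory=list)  # web images for the word


def make_fill_in_the_blank(sentence: str, target: str,
                           image_urls: List[str]) -> FillInTheBlankInstance:
    """Blank out the first whole-word, case-insensitive occurrence of `target`."""
    pattern = re.compile(r"\b" + re.escape(target) + r"\b", re.IGNORECASE)
    match = pattern.search(sentence)
    if match is None:
        raise ValueError(f"{target!r} does not occur as a word in {sentence!r}")
    masked = sentence[:match.start()] + BLANK + sentence[match.end():]
    return FillInTheBlankInstance(masked, match.group(0), list(image_urls))


def accuracy(predictions: List[str],
             instances: List[FillInTheBlankInstance]) -> float:
    """Assumed metric: fraction of case-insensitive exact matches."""
    if len(predictions) != len(instances):
        raise ValueError("need exactly one prediction per instance")
    if not instances:
        return 0.0
    correct = sum(p.lower() == inst.answer.lower()
                  for p, inst in zip(predictions, instances))
    return correct / len(instances)
```

For example, `make_fill_in_the_blank("Put the Dog outside.", "dog", ["https://example.org/dog.jpg"])` yields the masked sentence `"Put the <BLANK> outside."` with answer `"Dog"`. Word boundaries keep a target such as "dog" from matching inside "hotdog".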

Josiah Wang, Pranava Madhyastha, Josiel Figueiredo, Chiraag Lala, Lucia Specia (2021). MultiSubs: A Large-scale Multimodal and Multilingual Dataset. CoRR, abs/2103.01910. Available at: https://arxiv.org/abs/2103.01910

Benchmarks

Task	Dataset Variant	Best Model
Multimodal Text Prediction	MultiSubs	9-gram LM with back-off
Multimodal Lexical Translation	MultiSubs English-Spanish	Multimodal BRNN
Multimodal Lexical Translation	MultiSubs English-Portuguese	Multimodal BRNN
Multimodal Lexical Translation	MultiSubs English-French	Multimodal BRNN
Multimodal Lexical Translation	MultiSubs English-German	Multimodal BRNN

Similar Datasets

PhotoBook

CLEVR-Dialog

ImageCoDe

Iconary
