Multi30k

Introduced by Elliott et al. in Multi30K: Multilingual English-German Image Descriptions

Multi30K is a dataset intended to stimulate multilingual multimodal research on English and German. It is based on the Flickr30K dataset, which contains 31,014 images sourced from online photo-sharing websites, each paired with five English descriptions collected through Amazon Mechanical Turk. The dataset contains 145,000 training, 5,070 development, and 5,000 test descriptions. Multi30K extends Flickr30K with two kinds of German sentences: translations of the English descriptions, and German descriptions written independently of them.
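
The description counts quoted above can be cross-checked against the image total with a little arithmetic. The following is a minimal sketch, not part of any official Multi30K tooling: the variable and function names are hypothetical, and it assumes that every image in every split carries exactly five descriptions, as the summary states for Flickr30K.

```python
# Description counts as quoted in the dataset summary.
DESCRIPTIONS = {"train": 145_000, "dev": 5_070, "test": 5_000}

# Assumption: each image has exactly five descriptions in every split.
DESCRIPTIONS_PER_IMAGE = 5


def images_per_split(descriptions, per_image):
    """Derive the number of images in each split from its description count."""
    images = {}
    for split, count in descriptions.items():
        if count % per_image != 0:
            raise ValueError(f"{split}: {count} is not a multiple of {per_image}")
        images[split] = count // per_image
    return images


IMAGES = images_per_split(DESCRIPTIONS, DESCRIPTIONS_PER_IMAGE)
TOTAL_IMAGES = sum(IMAGES.values())

print(IMAGES)
print(TOTAL_IMAGES)
```

Under that assumption the three splits work out to 29,000, 1,014, and 1,000 images, which sum to the 31,014 images reported for Flickr30K.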

Source: Multi30K: Multilingual English-German Image Descriptions