VATEX (Video And TEXt)

VATEX is a large, multilingual, linguistically complex, and diverse dataset in terms of both video and natural language descriptions. It supports two tasks for video-and-language research: (1) Multilingual Video Captioning, which aims to describe a video in various languages with a compact unified captioning model, and (2) Video-guided Machine Translation, which translates a source-language description into the target language using the video information as additional spatiotemporal context.

Source: https://arxiv.org/pdf/1904.03493.pdf