VATEX (Video And TEXt)

Introduced by Wang et al. in VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research

VATEX is a large, multilingual, linguistically complex, and diverse dataset in terms of both video and natural language descriptions. It supports two tasks for video-and-language research: (1) Multilingual Video Captioning, which aims to describe a video in various languages with a compact unified captioning model, and (2) Video-guided Machine Translation, which translates a source-language description into the target language using the video information as additional spatiotemporal context.

Source: https://arxiv.org/pdf/1904.03493.pdf
