VALUE (Video-And-Language Understanding Evaluation)

Introduced by Li et al. in VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation

VALUE is a Video-And-Language Understanding Evaluation benchmark for testing how well models generalize across diverse tasks, domains, and datasets. It assembles 11 VidL (video-and-language) datasets covering 3 popular tasks: (i) text-to-video retrieval; (ii) video question answering; and (iii) video captioning. The VALUE benchmark aims to cover a broad range of video genres, video lengths, data volumes, and task difficulty levels. Rather than focusing on single-channel videos with visual information only, VALUE promotes models that leverage information from both video frames and their associated subtitles, as well as models that share knowledge across multiple tasks.

The datasets used for the VALUE benchmark are: TVQA, TVR, TVC, How2R, How2QA, VIOLIN, VLEP, YouCook2 (YC2C, YC2R), and VATEX (VATEX-EN-R, VATEX-EN-C).
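
As a rough illustration of how the 11 datasets spread over the 3 tasks, the grouping can be written as a small lookup table. This is a sketch, not part of any official VALUE tooling: the names `VALUE_TASKS`, `task_of`, and `total_datasets` are hypothetical, and treating VATEX as one retrieval and one captioning dataset (VATEX-EN-R, VATEX-EN-C) is an assumption based on the VALUE paper's naming rather than on the list above.

```python
# Hypothetical grouping of the VALUE datasets by task (illustrative only).
# Assumption: VATEX contributes a retrieval and a captioning variant,
# named VATEX-EN-R and VATEX-EN-C here.
VALUE_TASKS = {
    "text-to-video retrieval": ["TVR", "How2R", "YC2R", "VATEX-EN-R"],
    "video question answering": ["TVQA", "How2QA", "VIOLIN", "VLEP"],
    "video captioning": ["TVC", "YC2C", "VATEX-EN-C"],
}


def task_of(dataset: str) -> str:
    """Return the task a dataset belongs to, or raise KeyError if unknown."""
    for task, names in VALUE_TASKS.items():
        if dataset in names:
            return task
    raise KeyError(dataset)


def total_datasets() -> int:
    """Count the datasets across all tasks."""
    return sum(len(names) for names in VALUE_TASKS.values())


if __name__ == "__main__":
    for task, names in VALUE_TASKS.items():
        print(f"{task}: {', '.join(names)}")
    print("total:", total_datasets())
```

A table like this makes it easy to check that the per-task lists add up to the 11 datasets the benchmark describes.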



License: Multiple licenses