ActivityNet Captions

Introduced by Krishna et al. in Dense-Captioning Events in Videos

The ActivityNet Captions dataset is built on ActivityNet v1.3, which includes 20k untrimmed YouTube videos with 100k caption annotations. The videos are 120 seconds long on average. Most videos contain more than 3 annotated events, each with a start/end time and a human-written sentence; the sentences contain 13.5 words on average. The train/validation/test splits contain 10024/4926/5044 videos, respectively.

Source: Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning
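
The description above implies a simple per-video record: a list of events, each with a start time, an end time, and one sentence. The Python sketch below shows one way to model that record and to recompute the summary statistics quoted in the text. The field names `timestamps` and `sentences`, the `Event` class, the helper functions, and the sample entry are all assumptions made for illustration, not a confirmed description of the released files; only the split sizes are taken from the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    """One annotated event: a temporal segment plus its caption."""
    start: float  # seconds from the beginning of the video
    end: float    # seconds from the beginning of the video
    sentence: str


def parse_video(entry: dict) -> List[Event]:
    """Pair each [start, end] segment with its sentence.

    Assumes (hypothetically) that `entry` holds parallel lists under
    the keys "timestamps" and "sentences".
    """
    return [
        Event(float(start), float(end), sentence.strip())
        for (start, end), sentence in zip(entry["timestamps"], entry["sentences"])
    ]


def mean_words_per_sentence(videos: Dict[str, List[Event]]) -> float:
    """Average whitespace-separated word count over all captions."""
    counts = [len(ev.sentence.split()) for evs in videos.values() for ev in evs]
    return sum(counts) / len(counts)


def split_fractions(sizes: Dict[str, int]) -> Dict[str, float]:
    """Share of the videos that falls into each split."""
    total = sum(sizes.values())
    return {name: n / total for name, n in sizes.items()}


# Split sizes as quoted in the description above.
SPLIT_SIZES = {"train": 10024, "validation": 4926, "test": 5044}

# Made-up sample entry, for illustration only.
SAMPLE = {
    "v_example": {
        "timestamps": [[0.0, 12.5], [12.5, 40.0]],
        "sentences": [
            "A man is seen speaking to the camera.",
            "He then begins playing the drums.",
        ],
    }
}

videos = {vid: parse_video(entry) for vid, entry in SAMPLE.items()}
fractions = split_fractions(SPLIT_SIZES)
```

Summing the three split sizes gives 19994 videos, which matches the "20k" figure in the description. Running `mean_words_per_sentence` over a full set of captions would be one way to check the quoted average of 13.5 words per sentence.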

License

  • Unknown
