3 dataset results for Temporal Localization

Charades-STA

Charades-STA extends the Charades dataset with sentence-level temporal annotations.

184 PAPERS • 4 BENCHMARKS

AVE

AVE (Audio-Visual Event Localization)

The AVE dataset is used to investigate three temporal localization tasks: supervised audio-visual event localization, weakly-supervised audio-visual event localization, and cross-modality localization.

84 PAPERS • NO BENCHMARKS YET

VidSTG

The VidSTG dataset is a spatio-temporal video grounding dataset built on the video relation dataset VidOR. VidOR contains 7,000 training, 835 validation, and 2,165 test videos. The goal of the Spatio-Temporal Video Grounding (STVG) task is to localize the spatio-temporal section of an untrimmed video that matches a given sentence depicting an object.

20 PAPERS • 1 BENCHMARK
