VidSTG

Introduced by Zhang et al. in Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentences

VidSTG is a spatio-temporal video grounding dataset built on the video relation dataset VidOR. VidOR contains 7,000, 835, and 2,165 videos for training, validation, and testing, respectively. The goal of the Spatio-Temporal Video Grounding (STVG) task is to localize the spatio-temporal section of an untrimmed video that matches a given sentence depicting an object.
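
To make the task output concrete, here is a minimal Python sketch of how a localized spatio-temporal section could be represented: a frame span plus one bounding box per frame. The names `Tube`, `Box`, `temporal_iou`, and `VIDOR_SPLITS` are hypothetical and are not taken from the VidSTG repository. The `(x1, y1, x2, y2)` box convention and the temporal overlap measure are assumptions chosen for illustration, not the dataset's official format or evaluation protocol.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Assumed box convention: (x1, y1, x2, y2) in pixels. Not taken from the dataset.
Box = Tuple[float, float, float, float]


@dataclass
class Tube:
    """Hypothetical container for one spatio-temporal section of a video."""
    start_frame: int  # first frame of the section (inclusive)
    end_frame: int    # last frame of the section (inclusive)
    boxes: Dict[int, Box] = field(default_factory=dict)  # frame index -> box

    def num_frames(self) -> int:
        """Length of the temporal span in frames."""
        return self.end_frame - self.start_frame + 1


def temporal_iou(a: Tube, b: Tube) -> float:
    """Intersection over union of two frame spans (illustrative only)."""
    inter = min(a.end_frame, b.end_frame) - max(a.start_frame, b.start_frame) + 1
    if inter <= 0:
        return 0.0  # the spans do not overlap
    union = a.num_frames() + b.num_frames() - inter
    return inter / union


# VidOR split sizes as quoted in the description above.
VIDOR_SPLITS = {"train": 7000, "val": 835, "test": 2165}

if __name__ == "__main__":
    predicted = Tube(start_frame=10, end_frame=29)
    annotated = Tube(start_frame=20, end_frame=39)
    print(temporal_iou(predicted, annotated))
    print(sum(VIDOR_SPLITS.values()))
```

A grounding model would fill `boxes` for every frame in the predicted span. The overlap function only compares the temporal extent and ignores the boxes, so it is a partial check at best.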

Source: https://github.com/Guaranteer/VidSTG-Dataset

Benchmarks

Task	Dataset Variant	Best Model
Spatio-Temporal Video Grounding	VidSTG	CG-STVG

Dataset Loaders

Guaranteer/VidSTG-Dataset

Tasks

Spatio-Temporal Video Grounding
Temporal Localization

Similar Datasets

Video Localized Narratives

V-HICO

HC-STVG1

HC-STVG2
