SK-VG is a dataset for Scene Knowledge-guided Visual Grounding, where the image content and referring expressions are not sufficient to ground the target objects, forcing the models to have a reasoning ability on the long-form scene knowledge. To perform this task, SK-VG is the first dataset of the fourth type, where for each image, we provide human-written knowledge to describe its content.

Papers


Paper Code Results Date Stars

Dataset Loaders


No data loaders found. You can submit your data loader here.

Tasks


Similar Datasets


License


  • Unknown

Modalities


Languages