4 dataset results for Visual Grounding AND Images

RefCOCO

RefCOCO is a referring expression generation (REG) dataset used for tasks that require understanding natural language expressions that refer to specific objects in images.

301 PAPERS • 19 BENCHMARKS

DIOR-RSVG

DIOR-RSVG is a large-scale benchmark dataset for remote sensing visual grounding (RSVG), a task that aims to localize referred objects in remote sensing (RS) images under the guidance of natural language. The dataset provides image/expression/box triplets for training and evaluating visual grounding models.

7 PAPERS • NO BENCHMARKS YET

A Game Of Sorts

A Game Of Sorts is a collaborative image ranking task. Players are asked to rank a set of images based on a given sorting criterion. The game provides a framework for the evaluation of visually grounded language understanding and generation of referring expressions in multimodal dialogue settings.

2 PAPERS • NO BENCHMARKS YET

SK-VG

SK-VG is a dataset for Scene Knowledge-guided Visual Grounding, in which the image content and referring expressions alone are not sufficient to ground the target objects, so models must reason over long-form scene knowledge. To support this task, each image in SK-VG is accompanied by human-written knowledge describing its content; the dataset is presented as the first of this kind.

2 PAPERS • NO BENCHMARKS YET
