3 dataset results for Referring Expression AND Texts

SQA3D (Situated Question Answering in 3D Scenes)

SQA3D is a dataset for embodied scene understanding, where an agent needs to comprehend the scene it is situated in from a first-person perspective and answer questions. The questions are designed to be situated, embodied and knowledge-intensive. We offer three different modalities to represent a 3D scene: 3D scan, egocentric video and bird's-eye-view (BEV) picture.

16 PAPERS • 2 BENCHMARKS

A Game Of Sorts

A Game Of Sorts is a collaborative image ranking task. Players are asked to rank a set of images based on a given sorting criterion. The game provides a framework for the evaluation of visually grounded language understanding and generation of referring expressions in multimodal dialogue settings.

2 PAPERS • NO BENCHMARKS YET

GRIT (General Robust Image Task Benchmark)

The General Robust Image Task (GRIT) Benchmark is an evaluation-only benchmark that measures the performance and robustness of vision systems across multiple image prediction tasks, concepts, and data sources. GRIT hopes to encourage our research community to pursue the following research directions:

13 PAPERS • 8 BENCHMARKS
