SQA3D is a dataset for embodied scene understanding, in which an agent must comprehend the scene it is situated in from a first-person perspective and answer questions. The questions are designed to be situated, embodied, and knowledge-intensive. We offer three modalities to represent a 3D scene: 3D scan, egocentric video, and bird's-eye-view (BEV) picture.
16 PAPERS • 2 BENCHMARKS
A Game Of Sorts is a collaborative image ranking task. Players are asked to rank a set of images based on a given sorting criterion. The game provides a framework for the evaluation of visually grounded language understanding and generation of referring expressions in multimodal dialogue settings.
2 PAPERS • NO BENCHMARKS YET
The General Robust Image Task (GRIT) Benchmark is an evaluation-only benchmark that measures the performance and robustness of vision systems across multiple image prediction tasks, concepts, and data sources. GRIT hopes to encourage our research community to pursue the following research directions:
13 PAPERS • 8 BENCHMARKS