SQA3D: Situated Question Answering in 3D Scenes

14 Oct 2022  ·  Xiaojian Ma, Silong Yong, Zilong Zheng, Qing Li, Yitao Liang, Song-Chun Zhu, Siyuan Huang

We propose a new task to benchmark scene understanding of embodied agents: Situated Question Answering in 3D Scenes (SQA3D). Given a scene context (e.g., a 3D scan), SQA3D requires the tested agent to first understand its situation (position, orientation, etc.) in the 3D scene as described by text, then reason about its surrounding environment and answer a question under that situation. Based upon 650 scenes from ScanNet, we provide a dataset centered around 6.8k unique situations, along with 20.4k descriptions and 33.4k diverse reasoning questions for these situations. These questions examine a wide spectrum of reasoning capabilities for an intelligent agent, ranging from spatial relation comprehension to commonsense understanding, navigation, and multi-hop reasoning. SQA3D poses a significant challenge to current multi-modal models, especially 3D reasoning models. We evaluate various state-of-the-art approaches and find that the best one achieves an overall score of only 47.20%, while amateur human participants reach 90.06%. We believe SQA3D could facilitate future embodied AI research with stronger situation understanding and reasoning capabilities.
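As a rough illustration of the task interface described above, the sketch below pairs a ScanNet scene with a textual situation description and a question, and scores predictions by answer exact match. This is a minimal sketch, not the official data schema or evaluation code; the field names (scene_id, situation, question, answer) and the helper exact_match_score are assumptions.

```python
# Minimal sketch of the SQA3D task interface (hypothetical field names, not the
# official data schema): an agent receives a 3D scene reference, a textual
# situation description, and a question, and must return a short answer string.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SQA3DExample:
    scene_id: str   # ScanNet scene the question is grounded in, e.g. "scene0000_00"
    situation: str  # text describing the agent's position/orientation in the scene
    question: str   # question to answer under that situation
    answer: str     # ground-truth short answer


def exact_match_score(model: Callable[[SQA3DExample], str],
                      examples: List[SQA3DExample]) -> float:
    """Fraction of questions whose predicted answer exactly matches the ground truth."""
    correct = sum(
        model(ex).strip().lower() == ex.answer.strip().lower() for ex in examples
    )
    return correct / max(len(examples), 1)


if __name__ == "__main__":
    # Toy usage with a single made-up example and a dummy model that always answers "left".
    data = [
        SQA3DExample("scene0000_00",
                     "I am sitting on the sofa facing the TV.",
                     "Is the door on my left or right?",
                     "left"),
    ]
    print(exact_match_score(lambda ex: "left", data))  # 1.0 on this toy example
```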


Datasets

Introduced in the Paper: SQA3D
Used in the Paper: ScanNet

Results from the Paper


Task                 | Dataset | Model                       | Metric           | Value | Global Rank
Question Answering   | SQA3D   | ScanQA                      | AnswerExactMatch | 46.58 | #4
Question Answering   | SQA3D   | ScanQA (w/ auxiliary loss)  | AnswerExactMatch | 47.20 | #3
Referring Expression | SQA3D   | Random                      | Acc@0.5m         | 14.60 | #1
Referring Expression | SQA3D   | Random                      | Acc@1.0m         | 34.21 | #1
Referring Expression | SQA3D   | Random                      | Acc@15°          | 22.39 | #1
Referring Expression | SQA3D   | Random                      | Acc@30°          | 42.28 | #1
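The localization metrics in the last four rows can be read as threshold accuracies: Acc@0.5m and Acc@1.0m count predictions whose estimated position lies within 0.5 m or 1.0 m of the ground truth, and Acc@15° and Acc@30° count predictions whose estimated facing direction is within 15° or 30° of the ground truth. The following is a minimal sketch of how such threshold metrics could be computed; the function name and arguments are assumptions, not the paper's evaluation code.

```python
import numpy as np


def localization_accuracy(pred_pos, gt_pos, pred_yaw_deg, gt_yaw_deg,
                          dist_thresh_m=0.5, angle_thresh_deg=15.0):
    """Threshold accuracies in the spirit of Acc@0.5m / Acc@15° (hypothetical helper).

    pred_pos, gt_pos: (N, 2) or (N, 3) arrays of predicted / ground-truth positions in meters.
    pred_yaw_deg, gt_yaw_deg: (N,) arrays of predicted / ground-truth headings in degrees.
    Returns (position accuracy, orientation accuracy) as fractions in [0, 1].
    """
    pred_pos = np.asarray(pred_pos, dtype=float)
    gt_pos = np.asarray(gt_pos, dtype=float)
    dist = np.linalg.norm(pred_pos - gt_pos, axis=-1)
    pos_acc = float(np.mean(dist <= dist_thresh_m))

    # Wrap angular error into [0, 180] degrees before thresholding.
    ang_err = np.abs(
        (np.asarray(pred_yaw_deg, dtype=float) - np.asarray(gt_yaw_deg, dtype=float) + 180.0)
        % 360.0 - 180.0
    )
    ang_acc = float(np.mean(ang_err <= angle_thresh_deg))
    return pos_acc, ang_acc


# Toy usage: one prediction 0.3 m and 10° off, one 2 m and 50° off.
pos_acc, ang_acc = localization_accuracy(
    pred_pos=[[0.3, 0.0], [2.0, 0.0]], gt_pos=[[0.0, 0.0], [0.0, 0.0]],
    pred_yaw_deg=[10.0, 50.0], gt_yaw_deg=[0.0, 0.0],
)
print(pos_acc, ang_acc)  # 0.5 0.5
```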

Methods


No methods listed for this paper.