TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
3D Question Answering (3D-QA)	3D MM-Vet	ShapeLLM-13B	Overall Accuracy	53.1	# 1
3D Question Answering (3D-QA)	3D MM-Vet	ShapeLLM-7B	Overall Accuracy	47.4	# 2
3D Point Cloud Linear Classification	ModelNet40	ReCon++	Overall Accuracy	93.6	# 1
Generative 3D Object Classification	ModelNet40	ShapeLLM-13B	ModelNet40 (Average)	52.96	# 2
Generative 3D Object Classification	ModelNet40	ShapeLLM-13B	LLM Size	13B	# 1
Generative 3D Object Classification	ModelNet40	ShapeLLM-7B	ModelNet40 (Average)	53.08	# 1
Generative 3D Object Classification	ModelNet40	ShapeLLM-7B	LLM Size	7B	# 1
3D Point Cloud Classification	ModelNet40	ReCon++	Overall Accuracy	95.0	# 4
Zero-Shot Transfer 3D Point Cloud Classification	ModelNet40	ReCon++	Accuracy (%)	87.3	# 3
Few-Shot 3D Point Cloud Classification	ModelNet40 10-way (10-shot)	ReCon++	Overall Accuracy	94.5	# 1
Few-Shot 3D Point Cloud Classification	ModelNet40 10-way (10-shot)	ReCon++	Standard Deviation	4.1	# 13
Few-Shot 3D Point Cloud Classification	ModelNet40 10-way (20-shot)	ReCon++	Overall Accuracy	96.5	# 1
Few-Shot 3D Point Cloud Classification	ModelNet40 10-way (20-shot)	ReCon++	Standard Deviation	3.0	# 10
Few-Shot 3D Point Cloud Classification	ModelNet40 5-way (10-shot)	ReCon++	Overall Accuracy	98.0	# 1
Few-Shot 3D Point Cloud Classification	ModelNet40 5-way (10-shot)	ReCon++	Standard Deviation	2.3	# 11
Few-Shot 3D Point Cloud Classification	ModelNet40 5-way (20-shot)	ReCon++	Overall Accuracy	99.5	# 1
Few-Shot 3D Point Cloud Classification	ModelNet40 5-way (20-shot)	ReCon++	Standard Deviation	0.8	# 1
3D Object Captioning	Objaverse	ShapeLLM-13B	GPT-4	48.94	# 1
3D Object Captioning	Objaverse	ShapeLLM-13B	Sentence-BERT	48.52	# 1
3D Object Captioning	Objaverse	ShapeLLM-13B	SimCSE	49.98	# 1
3D Object Captioning	Objaverse	ShapeLLM-13B	LLM Size (B)	13	# 3
3D Object Captioning	Objaverse	ShapeLLM-7B	GPT-4	46.92	# 3
3D Object Captioning	Objaverse	ShapeLLM-7B	Sentence-BERT	48.20	# 2
3D Object Captioning	Objaverse	ShapeLLM-7B	SimCSE	49.23	# 2
3D Object Captioning	Objaverse	ShapeLLM-7B	LLM Size (B)	7	# 1
Generative 3D Object Classification	Objaverse	ShapeLLM-7B	Objaverse (Average)	54.50	# 1
Generative 3D Object Classification	Objaverse	ShapeLLM-13B	Objaverse (Average)	54.00	# 2
Zero-shot 3D classification	Objaverse LVIS	ReCon++	Top 1 Accuracy	53.7	# 3
Zero-Shot Transfer 3D Point Cloud Classification	ScanObjectNN	ReCon++	OBJ_ONLY Accuracy(%)	65.4	# 1
3D Point Cloud Classification	ScanObjectNN	ReCon++	Overall Accuracy	95.25	# 2
3D Point Cloud Classification	ScanObjectNN	ReCon++	OBJ-BG (OA)	97.59	# 1
3D Point Cloud Classification	ScanObjectNN	ReCon++	OBJ-ONLY (OA)	98.80	# 1

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/shapellm-universal-3d-object-understanding/3d-question-answering-3d-qa-on-3d-mm-vet)](https://paperswithcode.com/sota/3d-question-answering-3d-qa-on-3d-mm-vet?p=shapellm-universal-3d-object-understanding)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/shapellm-universal-3d-object-understanding/3d-point-cloud-linear-classification-on)](https://paperswithcode.com/sota/3d-point-cloud-linear-classification-on?p=shapellm-universal-3d-object-understanding)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/shapellm-universal-3d-object-understanding/generative-3d-object-classification-on)](https://paperswithcode.com/sota/generative-3d-object-classification-on?p=shapellm-universal-3d-object-understanding)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/shapellm-universal-3d-object-understanding/few-shot-3d-point-cloud-classification-on-3)](https://paperswithcode.com/sota/few-shot-3d-point-cloud-classification-on-3?p=shapellm-universal-3d-object-understanding)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/shapellm-universal-3d-object-understanding/few-shot-3d-point-cloud-classification-on-4)](https://paperswithcode.com/sota/few-shot-3d-point-cloud-classification-on-4?p=shapellm-universal-3d-object-understanding)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/shapellm-universal-3d-object-understanding/few-shot-3d-point-cloud-classification-on-1)](https://paperswithcode.com/sota/few-shot-3d-point-cloud-classification-on-1?p=shapellm-universal-3d-object-understanding)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/shapellm-universal-3d-object-understanding/few-shot-3d-point-cloud-classification-on-2)](https://paperswithcode.com/sota/few-shot-3d-point-cloud-classification-on-2?p=shapellm-universal-3d-object-understanding)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/shapellm-universal-3d-object-understanding/3d-object-captioning-on-objaverse)](https://paperswithcode.com/sota/3d-object-captioning-on-objaverse?p=shapellm-universal-3d-object-understanding)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/shapellm-universal-3d-object-understanding/generative-3d-object-classification-on-1)](https://paperswithcode.com/sota/generative-3d-object-classification-on-1?p=shapellm-universal-3d-object-understanding)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/shapellm-universal-3d-object-understanding/zero-shot-transfer-3d-point-cloud-2)](https://paperswithcode.com/sota/zero-shot-transfer-3d-point-cloud-2?p=shapellm-universal-3d-object-understanding)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/shapellm-universal-3d-object-understanding/3d-point-cloud-classification-on-scanobjectnn)](https://paperswithcode.com/sota/3d-point-cloud-classification-on-scanobjectnn?p=shapellm-universal-3d-object-understanding)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/shapellm-universal-3d-object-understanding/zero-shot-transfer-3d-point-cloud)](https://paperswithcode.com/sota/zero-shot-transfer-3d-point-cloud?p=shapellm-universal-3d-object-understanding)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/shapellm-universal-3d-object-understanding/zero-shot-3d-classification-on-objaverse-lvis)](https://paperswithcode.com/sota/zero-shot-3d-classification-on-objaverse-lvis?p=shapellm-universal-3d-object-understanding)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/shapellm-universal-3d-object-understanding/3d-point-cloud-classification-on-modelnet40)](https://paperswithcode.com/sota/3d-point-cloud-classification-on-modelnet40?p=shapellm-universal-3d-object-understanding)`
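
The badge rows above all share one URL pattern: a shields.io endpoint image that wraps a paperswithcode.com badge URL, linked to the matching leaderboard page. The following is a minimal sketch of that pattern, inferred only from the rows listed here. The helper name `pwc_badge` and its parameters are hypothetical and not part of any official API. Task slugs such as `few-shot-3d-point-cloud-classification-on-3` are assigned by the site and cannot be derived from the dataset name, so copy them from the rows above rather than constructing them.

```python
def pwc_badge(paper_slug: str, task_slug: str) -> str:
    """Build badge Markdown using the URL pattern seen in the rows above.

    Both slugs are assumed to be copied verbatim from an existing badge row;
    this helper (a hypothetical name) only assembles the string.
    """
    base = "https://paperswithcode.com"
    # Badge image: shields.io endpoint pointing at the site's badge URL.
    image = (
        "https://img.shields.io/endpoint.svg?url="
        f"{base}/badge/{paper_slug}/{task_slug}"
    )
    # Click target: the leaderboard page, with the paper passed as ?p=.
    link = f"{base}/sota/{task_slug}?p={paper_slug}"
    return f"[![PWC]({image})]({link})"


PAPER = "shapellm-universal-3d-object-understanding"
print(pwc_badge(PAPER, "3d-object-captioning-on-objaverse"))
```

Called with the paper slug and the `3d-object-captioning-on-objaverse` task slug, this reproduces the Objaverse captioning badge row listed above.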

ShapeLLM: Universal 3D Object Understanding for Embodied Interaction

27 Feb 2024 · Zekun Qi, Runpei Dong, Shaochen Zhang, Haoran Geng, Chunrui Han, Zheng Ge, He Wang, Li Yi, Kaisheng Ma

This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM) designed for embodied interaction, which explores universal 3D object understanding from 3D point clouds and language. ShapeLLM builds on an improved 3D encoder, ReCon++, which extends ReCon with multi-view image distillation for enhanced geometry understanding. With ReCon++ as the 3D point cloud input encoder for the LLM, ShapeLLM is trained on constructed instruction-following data and tested on our newly human-curated evaluation benchmark, 3D MM-Vet. ReCon++ and ShapeLLM achieve state-of-the-art performance in 3D geometry understanding and language-unified 3D interaction tasks, such as embodied visual grounding.

Code

qizekun/ShapeLLM official

qizekun/ReCon

runpeidong/act

Tasks

3D Object Captioning

3D Point Cloud Classification

3D Point Cloud Linear Classification

3D Question Answering (3D-QA)

Few-Shot 3D Point Cloud Classification

Generative 3D Object Classification

Instruction Following

Language Modelling

Large Language Model

Object

Visual Grounding

Zero-shot 3D classification

Zero-Shot Transfer 3D Point Cloud Classification

Datasets

Introduced in the Paper:

3D MM-Vet

Used in the Paper:

ShapeNet

ModelNet

LVIS

ScanObjectNN

Objaverse

Results from the Paper

Ranked #1 on 3D Object Captioning on Objaverse
