TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
3D Object Detection	S3DIS	Swin3D-L+FCAF3D	mAP@0.5	54.0	# 2
3D Object Detection	S3DIS	Swin3D-L+FCAF3D	mAP@0.25	72.1	# 3
Semantic Segmentation	S3DIS	Swin3D-L	Mean IoU	79.8	# 3
Semantic Segmentation	S3DIS	Swin3D-L	mAcc	88.0	# 1
Semantic Segmentation	S3DIS	Swin3D-L	oAcc	92.4	# 3
Semantic Segmentation	S3DIS	Swin3D-L	Number of params	N/A	# 1
Semantic Segmentation	S3DIS Area5	Swin3D-L	mIoU	74.5	# 3
Semantic Segmentation	S3DIS Area5	Swin3D-L	oAcc	92.7	# 1
Semantic Segmentation	S3DIS Area5	Swin3D-L	mAcc	80.5	# 1
Semantic Segmentation	S3DIS Area5	Swin3D-L	Number of params	N/A	# 2
Semantic Segmentation	ScanNet	Swin3D-L	test mIoU	77.9	# 4
Semantic Segmentation	ScanNet	Swin3D-L	val mIoU	77.5	# 2
3D Object Detection	ScanNetV2	Swin3D-L+CAGroup3D	mAP@0.25	76.4	# 4
3D Object Detection	ScanNetV2	Swin3D-L+CAGroup3D	mAP@0.5	63.2	# 4
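
The segmentation rows above report Mean IoU (mIoU), mean class accuracy (mAcc), and overall accuracy (oAcc). As a reading aid, here is a minimal sketch of how these three numbers are commonly derived from a confusion matrix. It is not taken from the Swin3D code; the function name `segmentation_metrics` and the toy matrix are hypothetical, and the actual benchmark evaluators may differ in details such as ignored labels or per-fold averaging.

```python
def segmentation_metrics(conf):
    """Compute mIoU, mAcc and oAcc (in percent) from a square confusion matrix.

    conf[i][j] is the number of points whose ground-truth class is i
    and whose predicted class is j (assumed convention for this sketch).
    """
    n = len(conf)
    ious, accs = [], []
    correct = total = 0
    for c in range(n):
        tp = conf[c][c]                           # correctly labeled points of class c
        gt = sum(conf[c])                         # all ground-truth points of class c
        pred = sum(conf[r][c] for r in range(n))  # all points predicted as class c
        correct += tp
        total += gt
        if gt == 0:
            continue                              # skip classes absent from the ground truth
        ious.append(tp / (gt + pred - tp))        # intersection over union
        accs.append(tp / gt)                      # per-class accuracy (recall)
    return {
        "mIoU": 100.0 * sum(ious) / len(ious),
        "mAcc": 100.0 * sum(accs) / len(accs),
        "oAcc": 100.0 * correct / total,
    }


# Hypothetical two-class example, not data from the paper.
toy = [[8, 2],
       [1, 9]]
print({k: round(v, 2) for k, v in segmentation_metrics(toy).items()})
# {'mIoU': 73.86, 'mAcc': 85.0, 'oAcc': 85.0}
```

Because mIoU penalizes false positives as well as false negatives, it is usually lower than mAcc for the same predictions, which matches the pattern in the table.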

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin3d-a-pretrained-transformer-backbone-for/3d-object-detection-on-s3dis)](https://paperswithcode.com/sota/3d-object-detection-on-s3dis?p=swin3d-a-pretrained-transformer-backbone-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin3d-a-pretrained-transformer-backbone-for/semantic-segmentation-on-s3dis)](https://paperswithcode.com/sota/semantic-segmentation-on-s3dis?p=swin3d-a-pretrained-transformer-backbone-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin3d-a-pretrained-transformer-backbone-for/semantic-segmentation-on-s3dis-area5)](https://paperswithcode.com/sota/semantic-segmentation-on-s3dis-area5?p=swin3d-a-pretrained-transformer-backbone-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin3d-a-pretrained-transformer-backbone-for/semantic-segmentation-on-scannet)](https://paperswithcode.com/sota/semantic-segmentation-on-scannet?p=swin3d-a-pretrained-transformer-backbone-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin3d-a-pretrained-transformer-backbone-for/3d-object-detection-on-scannetv2)](https://paperswithcode.com/sota/3d-object-detection-on-scannetv2?p=swin3d-a-pretrained-transformer-backbone-for)`

Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding

14 Apr 2023 · Yu-Qi Yang, Yu-Xiao Guo, Jian-Yu Xiong, Yang Liu, Hao Pan, Peng-Shuai Wang, Xin Tong, Baining Guo

The use of pretrained backbones with fine-tuning has been successful for 2D vision and natural language processing tasks, showing advantages over task-specific networks. In this work, we introduce a pretrained 3D backbone, called Swin3D, for 3D indoor scene understanding. We design a 3D Swin transformer as our backbone network, which enables efficient self-attention on sparse voxels with linear memory complexity, making the backbone scalable to large models and datasets. We also introduce a generalized contextual relative positional embedding scheme to capture various irregularities of point signals for improved network performance. We pretrained a large Swin3D model on a synthetic Structured3D dataset, which is an order of magnitude larger than the ScanNet dataset. Our model pretrained on the synthetic dataset not only generalizes well to downstream segmentation and detection on real 3D point datasets, but also outperforms state-of-the-art methods on downstream tasks with +2.3 mIoU and +2.2 mIoU on S3DIS Area5 and 6-fold semantic segmentation, +1.8 mIoU on ScanNet segmentation (val), +1.9 mAP@0.5 on ScanNet detection, and +8.1 mAP@0.5 on S3DIS detection. A series of extensive ablation studies further validate the scalability, generality, and superior performance enabled by our approach. The code and models are available at https://github.com/microsoft/Swin3D.

Code

microsoft/swin3d official

167

Pointcept/Pointcept

1,103

Tasks

3D Object Detection

Scene Understanding

Segmentation

Semantic Segmentation

Datasets

ShapeNet

ScanNet

S3DIS

Structured3D

Results from the Paper

Ranked #2 on 3D Object Detection on S3DIS (using extra training data)

Methods

Dense Connections • Layer Normalization • Linear Layer • Multi-Head Attention • Residual Connection • Scaled Dot-Product Attention • Softmax • Stochastic Depth • Swin Transformer
