PVT: Point-Voxel Transformer for Point Cloud Learning

13 Aug 2021 · Cheng Zhang, Haocheng Wan, Xinyi Shen, Zizhao Wu

The recently developed pure Transformer architectures have attained promising accuracy on point cloud learning benchmarks compared to convolutional neural networks. However, existing point cloud Transformers are computationally expensive because they spend a significant amount of time structuring the irregular data. To address this shortcoming, we present the Sparse Window Attention (SWA) module, which gathers coarse-grained local features from non-empty voxels. SWA not only bypasses the expensive irregular data structuring and the wasted computation on empty voxels, but also achieves linear computational complexity with respect to voxel resolution. Meanwhile, to gather fine-grained features about the global shape, we introduce the relative attention (RA) module, a self-attention variant that is more robust to rigid transformations of objects. Equipped with SWA and RA, we construct a neural architecture called PVT that integrates both modules into a joint framework for point cloud learning. Compared with previous Transformer-based and attention-based models, our method attains a top accuracy of 94.0% on the classification benchmark and a 10x inference speedup on average. Extensive experiments also validate the effectiveness of PVT on part and semantic segmentation benchmarks (86.6% and 69.2% mIoU, respectively).
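The idea of restricting computation to non-empty voxels can be illustrated with a minimal sketch. This is not the authors' SWA implementation: the function names `voxelize` and `voxel_centroids` are hypothetical, the points are assumed to be normalized to the unit cube, and the attention step itself is omitted. The sketch only shows how grouping points by voxel yields a sparse set of occupied cells that is much smaller than the full grid.

```python
from collections import defaultdict


def voxelize(points, resolution):
    """Group points by the voxel that contains them.

    Assumes every coordinate lies in [0, 1]. Returns a dict that maps
    (i, j, k) voxel indices to lists of point indices. Only occupied
    voxels appear as keys, so empty voxels cost nothing downstream.
    """
    voxels = defaultdict(list)
    for idx, point in enumerate(points):
        # Clamp so a coordinate of exactly 1.0 falls in the last voxel.
        key = tuple(min(int(c * resolution), resolution - 1) for c in point)
        voxels[key].append(idx)
    return dict(voxels)


def voxel_centroids(points, voxels):
    """Average the points in each occupied voxel (a simple coarse feature)."""
    centroids = {}
    for key, members in voxels.items():
        n = len(members)
        centroids[key] = tuple(
            sum(points[i][d] for i in members) / n for d in range(3)
        )
    return centroids


# Toy point cloud (hypothetical data, for illustration only).
points = [
    (0.10, 0.10, 0.10),
    (0.12, 0.11, 0.13),
    (0.90, 0.90, 0.90),
    (0.55, 0.20, 0.70),
]
resolution = 4
voxels = voxelize(points, resolution)
centroids = voxel_centroids(points, voxels)

occupied = len(voxels)        # number of non-empty voxels
total = resolution ** 3       # size of the dense grid
print(f"{occupied} of {total} voxels are occupied")
```

In this toy case only 3 of the 64 voxels hold any points, so a module that iterates over the dictionary keys touches 3 cells rather than 64.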

Code

HaochengWan/PVT official

Mind23-2/MindCode-109

Tasks

3D Object Detection

3D Part Segmentation

3D Point Cloud Classification

Object Detection

Semantic Segmentation

valid

Datasets

KITTI

ShapeNet

ModelNet

SemanticKITTI

S3DIS

Results from the Paper

Ranked #18 on 3D Part Segmentation on ShapeNet-Part

Task	Dataset	Model	Metric Name	Metric Value (%)	Global Rank
3D Point Cloud Classification	ModelNet40	Point Voxel Transformer	Overall Accuracy	94.0	#22
3D Part Segmentation	ShapeNet-Part	Point Voxel Transformer	Instance Average IoU	86.5	#18

Methods

Absolute Position Encodings • Adam • BPE • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • PointNet • Position-Wise Feed-Forward Layer • PVT • Residual Connection • Scaled Dot-Product Attention • Softmax • Spatial-Reduction Attention • Transformer
