Point Transformer

Self-attention networks have revolutionized natural language processing and are making impressive strides in image analysis tasks such as image classification and object detection. Inspired by this success, we investigate the application of self-attention networks to 3D point cloud processing. We design self-attention layers for point clouds and use these to construct self-attention networks for tasks such as semantic scene segmentation, object part segmentation, and object classification. Our Point Transformer design improves upon prior work across domains and tasks. For example, on the challenging S3DIS dataset for large-scale semantic scene segmentation, the Point Transformer attains an mIoU of 70.4% on Area 5, outperforming the strongest prior model by 3.3 absolute percentage points and crossing the 70% mIoU threshold for the first time.
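The paper's core building block is a self-attention layer that operates on local point neighborhoods rather than the full cloud. As a rough illustration of the idea (vector self-attention over k-nearest neighbors with a relative positional encoding), here is a minimal NumPy sketch; the projection weights, the neighborhood size `k`, and the linear positional map are stand-in assumptions, not the paper's learned parameters or exact formulation.

```python
import numpy as np

def point_transformer_layer(x, pos, k=4, rng=None):
    """Sketch of a vector self-attention layer over local point neighborhoods.

    x:   (n, d) per-point features
    pos: (n, 3) point coordinates
    k:   neighborhood size for local attention (hypothetical choice)

    All weight matrices below are random stand-ins; a real layer learns them.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n, d = x.shape

    # Query / key / value projections (random placeholders for learned weights).
    Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
    q, kmat, v = x @ Wq, x @ Wk, x @ Wv

    # k-nearest neighbors in 3D space (brute-force pairwise distances, for clarity).
    d2 = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)
    nbr = np.argsort(d2, axis=1)[:, :k]                   # (n, k) neighbor indices

    # Relative positional encoding from coordinate differences p_i - p_j,
    # mapped to feature space with a simple linear map (an assumption here).
    Wp = rng.standard_normal((3, d)) * 0.1
    delta = (pos[:, None, :] - pos[nbr]) @ Wp             # (n, k, d)

    # Vector attention: per-channel weights from the subtraction relation
    # q_i - k_j plus the positional encoding, softmax-normalized over neighbors.
    rel = q[:, None, :] - kmat[nbr] + delta               # (n, k, d)
    a = np.exp(rel - rel.max(axis=1, keepdims=True))
    a = a / a.sum(axis=1, keepdims=True)

    # Aggregate values (plus positional encoding) with elementwise weighting.
    return (a * (v[nbr] + delta)).sum(axis=1)             # (n, d)
```

For example, calling `point_transformer_layer(x, pos)` on 16 points with 8-dimensional features returns a new `(16, 8)` feature array, one refined feature vector per point. The per-channel (vector) attention weights, rather than a single scalar weight per neighbor, are what distinguish this style of attention from standard scalar dot-product attention.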

PDF Abstract ICCV 2021
| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| 3D Point Cloud Classification | ModelNet40 | PointTransformer | Overall Accuracy | 93.7 | #23 |
| 3D Point Cloud Classification | ModelNet40 | PointTransformer | Mean Accuracy | 90.6 | #17 |
| Point Cloud Segmentation | PointCloud-C | PointTransformers | mean Corruption Error (mCE) | 1.049 | #7 |
| Semantic Segmentation | S3DIS | KPConv | Mean IoU | 70.6 | #13 |
| Semantic Segmentation | S3DIS | PointCNN | Mean IoU | 65.4 | #27 |
| Semantic Segmentation | S3DIS | PointTransformer | Mean IoU | 73.5 | #7 |
| Semantic Segmentation | S3DIS | PointTransformer | mAcc | 81.9 | #10 |
| Semantic Segmentation | S3DIS | PointTransformer | oAcc | 90.2 | #5 |
| Semantic Segmentation | S3DIS | SPGraph | Mean IoU | 62.1 | #32 |
| Semantic Segmentation | S3DIS | PointNet | Mean IoU | 47.6 | #41 |
| Semantic Segmentation | S3DIS Area5 | PointNet | mIoU | 41.1 | #23 |
| Semantic Segmentation | S3DIS Area5 | PointTransformer | mIoU | 70.4 | #6 |
| Semantic Segmentation | S3DIS Area5 | PointTransformer | oAcc | 90.8 | #4 |
| Semantic Segmentation | S3DIS Area5 | PointTransformer | mAcc | 76.5 | #5 |
| Semantic Segmentation | S3DIS Area5 | PointCNN | mIoU | 57.3 | #20 |
| 3D Part Segmentation | ShapeNet-Part | PointTransformer | Class Average IoU | 83.7 | #13 |
| 3D Part Segmentation | ShapeNet-Part | PointTransformer | Instance Average IoU | 86.6 | #11 |
| 3D Semantic Segmentation | STPLS3D | Point transformer | mIoU | 47.64 | #3 |
