Point-LGMask: Local and Global Contexts Embedding for Point Cloud Pre-training with Multi-Ratio Masking

Self-supervised learning has achieved great success in both natural language processing and 2D vision, where masked modeling is a popular pre-training scheme. However, extending masking to 3D point cloud understanding, which must combine local and global features, poses a new challenge. In our work, we present Point-LGMask, a novel method that embeds both local and global contexts with multi-ratio masking. Embedding both contexts is effective for self-supervised feature learning on point clouds but has been overlooked by existing pre-training works. Specifically, to avoid fitting to a fixed masking ratio, we first propose multi-ratio masking, which prompts the encoder to fully explore representative features by solving tasks of different difficulties. Next, to encourage the embedding of both local and global features, we formulate a compound loss, which consists of (i) a global representation contrastive loss that encourages the cluster assignments of the masked point clouds to be consistent with those of the complete input, and (ii) a local point cloud prediction loss that encourages accurate prediction of masked points. Equipped with Point-LGMask, our learned representations transfer well to various downstream tasks, including few-shot classification, shape classification, object part segmentation, as well as real-world scene-based 3D object detection and 3D semantic segmentation. In particular, on the difficult few-shot classification task using the real-captured ScanObjectNN dataset, our model surpasses the second-best pre-training method by over 4%. Point-LGMask also achieves gains of 0.4% AP25 and 0.8% AP50 over the second-best method on the 3D object detection task, and gains of 0.4% mAcc and 0.5% mIoU on 3D semantic segmentation. Code has been released at https://github.com/TangYuan96/Point-LGMask
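
The two ideas in the abstract, multi-ratio masking and the compound loss, can be illustrated with a minimal sketch. It is not the released implementation. The masking ratios, the use of a cross-entropy between softmax cluster assignments for the global term, the use of a Chamfer distance for the local term, the `weight` parameter, and all function names are assumptions made for illustration only.

```python
import math
import random


def multi_ratio_masks(num_patches, ratios, rng):
    """Draw one random patch mask per ratio; True marks a masked patch."""
    masks = []
    for ratio in ratios:
        num_masked = int(round(ratio * num_patches))
        masked = set(rng.sample(range(num_patches), num_masked))
        masks.append([i in masked for i in range(num_patches)])
    return masks


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def assignment_consistency(masked_logits, full_logits):
    """Global term (assumed form): cross-entropy between the cluster
    assignment of the complete input (target) and of the masked view."""
    target = softmax(full_logits)
    pred = softmax(masked_logits)
    return -sum(t * math.log(p) for t, p in zip(target, pred))


def chamfer_distance(pred_points, true_points):
    """Local term (assumed form): symmetric squared Chamfer distance."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)
    return one_way(pred_points, true_points) + one_way(true_points, pred_points)


def compound_loss(masked_logits, full_logits, pred_points, true_points, weight=1.0):
    """Sum of the global and local terms; `weight` is a hypothetical knob."""
    global_term = assignment_consistency(masked_logits, full_logits)
    local_term = chamfer_distance(pred_points, true_points)
    return global_term + weight * local_term


# Usage: three masks of increasing difficulty for one cloud of 64 patches.
rng = random.Random(0)
masks = multi_ratio_masks(num_patches=64, ratios=(0.4, 0.6, 0.9), rng=rng)
loss = compound_loss(
    masked_logits=[0.2, 1.1, -0.3],
    full_logits=[0.0, 1.5, -0.5],
    pred_points=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    true_points=[(0.0, 0.1, 0.0), (1.0, 0.0, 0.1)],
)
```

In a training loop under these assumptions, each masked view would be encoded separately and the loss accumulated over all ratios, so the encoder is never tuned to a single masking difficulty.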

Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Few-Shot 3D Point Cloud Classification | ModelNet40 10-way (10-shot) | Point-LGMask | Overall Accuracy | 92.6 | #9 |
| | | | Standard Deviation | 4.3 | #16 |
| Few-Shot 3D Point Cloud Classification | ModelNet40 10-way (20-shot) | Point-LGMask | Overall Accuracy | 95.1 | #10 |
| | | | Standard Deviation | 3.4 | #17 |
| Few-Shot 3D Point Cloud Classification | ModelNet40 5-way (10-shot) | Point-LGMask | Overall Accuracy | 97.4 | #3 |
| | | | Standard Deviation | 2.0 | #9 |
| Few-Shot 3D Point Cloud Classification | ModelNet40 5-way (20-shot) | Point-LGMask | Overall Accuracy | 98.1 | #8 |
| | | | Standard Deviation | 1.4 | #9 |
| 3D Point Cloud Classification | ScanObjectNN | Point-LGMask | Overall Accuracy | 85.3 | #40 |
| | | | OBJ-BG (OA) | 89.8 | #17 |
| | | | OBJ-ONLY (OA) | 89.3 | #14 |

Methods