Point-LGMask: Local and Global Contexts Embedding for Point Cloud Pre-training with Multi-Ratio Masking
Self-supervised learning has achieved great success in both natural language processing and 2D vision, where masked modeling is a popular pre-training scheme. However, extending masking to 3D point cloud understanding, which requires combining local and global features, poses a new challenge. In our work, we present Point-LGMask, a novel method that embeds both local and global contexts with multi-ratio masking, an approach that is effective for self-supervised feature learning of point clouds but has been ignored by existing pre-training works. Specifically, to avoid fitting to a fixed masking ratio, we first propose multi-ratio masking, which prompts the encoder to fully explore representative features through tasks of different difficulties. Next, to encourage the embedding of both local and global features, we formulate a compound loss, which consists of (i) a global representation contrastive loss that encourages the cluster assignments of the masked point clouds to be consistent with those of the complete input, and (ii) a local point cloud prediction loss that encourages accurate prediction of masked points. Equipped with Point-LGMask, we show that our learned representations transfer well to various downstream tasks, including few-shot classification, shape classification, object part segmentation, as well as real-world scene-based 3D object detection and 3D semantic segmentation. In particular, our model substantially advances existing pre-training methods on the difficult few-shot classification task using the real-captured ScanObjectNN dataset, surpassing the second-best method by over 4%. Also, Point-LGMask achieves gains of 0.4% AP25 and 0.8% AP50 on the 3D object detection task over the second-best method. For semantic segmentation, Point-LGMask surpasses the second-best method by 0.4% mAcc and 0.5% mIoU. Code has been released at https://github.com/TangYuan96/Point-LGMask
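The multi-ratio masking idea described in the abstract can be illustrated with a minimal sketch. It assumes the point cloud has already been grouped into a fixed number of patches, and it draws one random mask per ratio so the encoder sees reconstruction tasks of different difficulty. The function name `multi_ratio_masks`, the patch count, and the ratio values are hypothetical choices for illustration; the abstract does not state the authors' actual ratios or masking strategy.

```python
import random


def multi_ratio_masks(num_patches, ratios, seed=0):
    """Return one boolean mask per ratio; True marks a masked patch.

    Hypothetical helper: uniform random masking is an assumption,
    not a detail taken from the paper.
    """
    rng = random.Random(seed)
    masks = []
    for ratio in ratios:
        num_masked = round(ratio * num_patches)
        # Sample distinct patch indices to hide at this ratio.
        masked_idx = set(rng.sample(range(num_patches), num_masked))
        masks.append([i in masked_idx for i in range(num_patches)])
    return masks


# Illustrative settings only: 64 patches and three example ratios.
masks = multi_ratio_masks(num_patches=64, ratios=(0.25, 0.5, 0.75))
masked_counts = [sum(m) for m in masks]
```

Each mask would then select which patches are hidden from the encoder, with the same input yielding several masked views of increasing difficulty.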
Results from the Paper
Ranked #3 on Few-Shot 3D Point Cloud Classification on ModelNet40 5-way (10-shot) (using extra training data)
| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Few-Shot 3D Point Cloud Classification | ModelNet40 10-way (10-shot) | Point-LGMask | Overall Accuracy | 92.6 | # 9 |
| Few-Shot 3D Point Cloud Classification | ModelNet40 10-way (10-shot) | Point-LGMask | Standard Deviation | 4.3 | # 16 |
| Few-Shot 3D Point Cloud Classification | ModelNet40 10-way (20-shot) | Point-LGMask | Overall Accuracy | 95.1 | # 10 |
| Few-Shot 3D Point Cloud Classification | ModelNet40 10-way (20-shot) | Point-LGMask | Standard Deviation | 3.4 | # 17 |
| Few-Shot 3D Point Cloud Classification | ModelNet40 5-way (10-shot) | Point-LGMask | Overall Accuracy | 97.4 | # 3 |
| Few-Shot 3D Point Cloud Classification | ModelNet40 5-way (10-shot) | Point-LGMask | Standard Deviation | 2.0 | # 9 |
| Few-Shot 3D Point Cloud Classification | ModelNet40 5-way (20-shot) | Point-LGMask | Overall Accuracy | 98.1 | # 8 |
| Few-Shot 3D Point Cloud Classification | ModelNet40 5-way (20-shot) | Point-LGMask | Standard Deviation | 1.4 | # 9 |
| 3D Point Cloud Classification | ScanObjectNN | Point-LGMask | Overall Accuracy | 85.3 | # 40 |
| 3D Point Cloud Classification | ScanObjectNN | Point-LGMask | OBJ-BG (OA) | 89.8 | # 17 |
| 3D Point Cloud Classification | ScanObjectNN | Point-LGMask | OBJ-ONLY (OA) | 89.3 | # 14 |
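The "N-way (K-shot)" settings in the results above refer to few-shot episodes: N classes are drawn, with K labeled support samples per class. The sketch below shows one plausible way to sample such an episode and to summarize accuracies over repeated episodes as a mean and standard deviation. The helper names, the number of query samples, the toy label list, and the use of the sample standard deviation are all assumptions for illustration, not the paper's evaluation code.

```python
import random
import statistics


def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Pick n_way classes, then k_shot support and n_query query indices per class."""
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    classes = rng.sample(sorted(by_class), n_way)
    support, query = [], []
    for c in classes:
        # Draw disjoint support and query indices for this class.
        picked = rng.sample(by_class[c], k_shot + n_query)
        support.extend(picked[:k_shot])
        query.extend(picked[k_shot:])
    return classes, support, query


def summarize(accuracies):
    """Mean and sample standard deviation over per-episode accuracies."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)


rng = random.Random(0)
# Toy placeholder labels: 40 classes with 50 samples each.
labels = [c for c in range(40) for _ in range(50)]
classes, support, query = sample_episode(
    labels, n_way=5, k_shot=10, n_query=20, rng=rng
)
```

In a real evaluation, the pre-trained encoder would be fine-tuned or probed on the support set, scored on the query set, and the per-episode accuracies passed to `summarize`.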