PiCIE: Unsupervised Semantic Segmentation using Invariance and Equivariance in Clustering
We present a new framework for semantic segmentation without annotations via clustering. Off-the-shelf clustering methods are limited to curated, single-label, and object-centric images, yet real-world data are predominantly uncurated, multi-label, and scene-centric. We extend clustering from images to pixels and assign separate cluster membership to different instances within each image. However, relying solely on pixel-wise feature similarity fails to learn high-level semantic concepts and overfits to low-level visual cues. We propose a method to incorporate geometric consistency as an inductive bias to learn invariance and equivariance for photometric and geometric variations. With our novel learning objective, our framework can learn high-level semantic concepts. Our method, PiCIE (Pixel-level feature Clustering using Invariance and Equivariance), is the first method capable of segmenting both things and stuff categories without any hyperparameter tuning or task-specific pre-processing. Our method substantially outperforms existing baselines on COCO and Cityscapes, by +17.5 Acc. and +4.5 mIoU. We also show that PiCIE gives a better initialization for standard supervised training. The code is available at https://github.com/janghyuncho/PiCIE.
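The two consistency properties named in the abstract can be made concrete with a toy example. The sketch below is only an illustration under simplifying assumptions: a hand-written per-pixel labeler (`label_pixels`, a hypothetical name) stands in for the learned network and clustering step, a constant brightness shift stands in for photometric variation, and a horizontal flip stands in for geometric variation. It is not the PiCIE training objective, only a check of what "invariant" and "equivariant" mean for a per-pixel label map.

```python
# Toy illustration only: a hand-written per-pixel labeler stands in for the
# learned network and clustering. None of these names come from the PiCIE code.

def hflip(grid):
    """Geometric transform: mirror each row left to right."""
    return [row[::-1] for row in grid]

def brighten(image, delta):
    """Photometric transform: add a constant to every pixel intensity."""
    return [[p + delta for p in row] for row in image]

def label_pixels(image):
    """Assign cluster 1 to pixels brighter than the image mean, else cluster 0."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return [[1 if p > mean else 0 for p in row] for row in image]

image = [
    [10, 10, 200],
    [10, 220, 210],
]

labels = label_pixels(image)

# Invariance: a photometric change should leave the label map unchanged.
invariant = label_pixels(brighten(image, 25)) == labels

# Equivariance: a geometric change should move the labels with the pixels,
# so labeling the flipped image equals flipping the original labels.
equivariant = label_pixels(hflip(image)) == hflip(labels)
```

Both checks hold for this toy labeler because it compares each pixel to the image mean, which shifts with brightness and ignores pixel position. A learned model has no such guarantee by construction, which is why the abstract describes these properties as something to be learned.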
CVPR 2021
Results from the Paper
Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
---|---|---|---|---|---|
Unsupervised Semantic Segmentation | Cityscapes test | PiCIE | mIoU | 12.3 | # 4 |
Unsupervised Semantic Segmentation | Cityscapes test | PiCIE | Accuracy | 65.5 | # 4 |
Unsupervised Semantic Segmentation | COCO-All | PiCIE + H. | mIoU | 14.36 | # 4 |
Unsupervised Semantic Segmentation | COCO-All | PiCIE + H. | Pixel Accuracy | 49.99 | # 2 |
Unsupervised Semantic Segmentation | COCO-All | PiCIE | mIoU | 13.84 | # 5 |
Unsupervised Semantic Segmentation | COCO-All | PiCIE | Pixel Accuracy | 48.09 | # 3 |
Unsupervised Semantic Segmentation | COCO-Stuff | PiCIE + H. | Pixel Accuracy | 49.99 | # 6 |
Unsupervised Semantic Segmentation | COCO-Stuff | PiCIE + H. | mIoU | 14.36 | # 7 |
Unsupervised Semantic Segmentation | COCO-Stuff | PiCIE | Pixel Accuracy | 38.8 | # 10 |
Unsupervised Semantic Segmentation | COCO-Stuff | PiCIE | mIoU | 13.84 | # 8 |
Unsupervised Semantic Segmentation | COCO-Stuff-27 | PiCIE + H. | Accuracy | 50.0 | # 5 |
Unsupervised Semantic Segmentation | COCO-Stuff-27 | PiCIE | Accuracy | 48.1 | # 6 |
Unsupervised Semantic Segmentation | ImageNet-S-50 | PiCIE (Supervised pretrain) | mIoU (val) | 17.8 | # 4 |
Unsupervised Semantic Segmentation | ImageNet-S-50 | PiCIE (Supervised pretrain) | mIoU (test) | 17.6 | # 4 |
Semantic Segmentation | RSMSS | PiCIE | mIoU | 33.1% | # 7 |
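For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how pixel accuracy and mIoU can be computed for an unsupervised method. Because cluster IDs are arbitrary, clusters are first matched one-to-one to ground-truth classes; this sketch assumes that step and brute-forces it over all permutations, which is only practical for a handful of classes. Benchmark evaluations typically use the Hungarian algorithm instead (for example `scipy.optimize.linear_sum_assignment`). The function names and the toy data are illustrative and are not taken from the PiCIE repository.

```python
from itertools import permutations

def confusion_matrix(pred, target, num_classes):
    """counts[c][k] = number of pixels with ground-truth class c and predicted cluster k."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        counts[t][p] += 1
    return counts

def best_matching(counts):
    """Brute-force the one-to-one class-to-cluster assignment that matches the most pixels.

    Returns a tuple perm where perm[c] is the cluster assigned to class c.
    """
    n = len(counts)
    return max(
        permutations(range(n)),
        key=lambda perm: sum(counts[c][perm[c]] for c in range(n)),
    )

def evaluate(pred, target, num_classes):
    """Return (pixel accuracy, mean IoU) after matching clusters to classes."""
    counts = confusion_matrix(pred, target, num_classes)
    perm = best_matching(counts)
    total = sum(map(sum, counts))
    correct = sum(counts[c][perm[c]] for c in range(num_classes))
    ious = []
    for c in range(num_classes):
        k = perm[c]
        tp = counts[c][k]                        # pixels of class c in its matched cluster
        gt_pixels = sum(counts[c])               # all pixels of class c
        pred_pixels = sum(row[k] for row in counts)  # all pixels in cluster k
        union = gt_pixels + pred_pixels - tp
        if union > 0:                            # skip classes absent from both maps
            ious.append(tp / union)
    return correct / total, sum(ious) / len(ious)

# Flattened toy label maps: 8 pixels, 3 classes, cluster IDs deliberately permuted.
target = [0, 0, 0, 1, 1, 2, 2, 2]
pred = [2, 2, 1, 1, 1, 0, 0, 0]
pixel_acc, miou = evaluate(pred, target, 3)
```

In this toy case seven of eight pixels are matched, giving a pixel accuracy of 0.875, while the per-class IoUs of 2/3, 2/3, and 1 average to 7/9. The gap between the two numbers shows why the table reports both: accuracy is dominated by frequent classes, whereas mIoU weights every class equally.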