PiCIE: Unsupervised Semantic Segmentation using Invariance and Equivariance in Clustering

We present a new framework for semantic segmentation without annotations via clustering. Off-the-shelf clustering methods are limited to curated, single-label, object-centric images, yet real-world data are predominantly uncurated, multi-label, and scene-centric. We extend clustering from images to pixels and assign separate cluster membership to different instances within each image. However, relying solely on pixel-wise feature similarity fails to learn high-level semantic concepts and overfits to low-level visual cues. We propose a method that incorporates geometric consistency as an inductive bias to learn invariance to photometric variations and equivariance to geometric variations. With this novel learning objective, our framework can learn high-level semantic concepts. Our method, PiCIE (Pixel-level feature Clustering using Invariance and Equivariance), is the first method capable of segmenting both things and stuff categories without any hyperparameter tuning or task-specific pre-processing. It substantially outperforms existing baselines on COCO and Cityscapes, with +17.5 Acc. and +4.5 mIoU. We also show that PiCIE gives a better initialization for standard supervised training. The code is available at https://github.com/janghyuncho/PiCIE.
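The two consistency properties named in the abstract can be illustrated with a toy check. This is a minimal sketch, not PiCIE's training objective: `toy_labeler` is a hypothetical stand-in for a pixel clusterer, and the horizontal flip and brightness scaling are assumed examples of a geometric and a photometric transform.

```python
def hflip(grid):
    """Geometric transform: mirror each row left-to-right."""
    return [list(reversed(row)) for row in grid]


def brighten(image, factor):
    """Photometric transform: scale every RGB channel by `factor`."""
    return [[tuple(c * factor for c in px) for px in row] for row in image]


def toy_labeler(image):
    """Hypothetical pixel 'clusterer': label = index of the dominant channel."""
    return [[max(range(3), key=lambda i: px[i]) for px in row] for row in image]


def agreement(a, b):
    """Fraction of pixel positions where two label maps agree."""
    pairs = [(x, y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(x == y for x, y in pairs) / len(pairs)


# A tiny 2x3 RGB image (illustrative values only).
image = [
    [(0.9, 0.1, 0.1), (0.2, 0.8, 0.1), (0.1, 0.2, 0.7)],
    [(0.6, 0.3, 0.1), (0.1, 0.1, 0.5), (0.3, 0.9, 0.2)],
]

labels = toy_labeler(image)

# Invariance: a photometric change should leave the label map unchanged.
invariance = agreement(toy_labeler(brighten(image, 0.5)), labels)

# Equivariance: labeling a flipped image should equal flipping the labels.
equivariance = agreement(toy_labeler(hflip(image)), hflip(labels))
```

The toy labeler satisfies both properties by construction. A learned pixel embedding does not, which is why the abstract describes these consistencies as something to be learned through the objective.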

CVPR 2021
Task                                Dataset          Model                        Metric Name     Metric Value  Global Rank
Unsupervised Semantic Segmentation  Cityscapes test  PiCIE                        mIoU            12.3          #4
Unsupervised Semantic Segmentation  Cityscapes test  PiCIE                        Accuracy        65.5          #4
Unsupervised Semantic Segmentation  COCO-All         PiCIE+H                      mIoU            14.36         #4
Unsupervised Semantic Segmentation  COCO-All         PiCIE+H                      Pixel Accuracy  49.99         #2
Unsupervised Semantic Segmentation  COCO-All         PiCIE                        mIoU            13.84         #5
Unsupervised Semantic Segmentation  COCO-All         PiCIE                        Pixel Accuracy  48.09         #3
Unsupervised Semantic Segmentation  COCO-Stuff       PiCIE+H                      Pixel Accuracy  49.99         #6
Unsupervised Semantic Segmentation  COCO-Stuff       PiCIE+H                      mIoU            14.36         #7
Unsupervised Semantic Segmentation  COCO-Stuff       PiCIE                        Pixel Accuracy  38.8          #10
Unsupervised Semantic Segmentation  COCO-Stuff       PiCIE                        mIoU            13.84         #8
Unsupervised Semantic Segmentation  COCO-Stuff-27    PiCIE+H                      Accuracy        50.0          #5
Unsupervised Semantic Segmentation  COCO-Stuff-27    PiCIE                        Accuracy        48.1          #6
Unsupervised Semantic Segmentation  ImageNet-S-50    PiCIE (Supervised pretrain)  mIoU (val)      17.8          #4
Unsupervised Semantic Segmentation  ImageNet-S-50    PiCIE (Supervised pretrain)  mIoU (test)     17.6          #4
Semantic Segmentation               RSMSS            PiCIE                        mIoU            33.1%         #7
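For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how pixel accuracy and mIoU are commonly computed from a confusion matrix. It assumes predicted cluster indices have already been mapped to ground-truth class indices. The function names and toy labels are illustrative and not taken from the PiCIE code.

```python
def confusion_matrix(pred, target, num_classes):
    """cm[t][p] counts pixels with ground-truth class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm):
    """Correctly labeled pixels divided by all pixels."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def mean_iou(cm):
    """Average of per-class TP / (TP + FP + FN), skipping classes that never occur."""
    ious = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                  # class c pixels predicted as something else
        fp = sum(row[c] for row in cm) - tp   # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Flattened toy label maps with three classes (illustrative values only).
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(pred, target, 3)
acc = pixel_accuracy(cm)   # 4 of 6 pixels correct
miou = mean_iou(cm)        # mean of 1/3, 2/3, 1/2
```

Pixel accuracy is dominated by frequent classes, while mIoU weights every class equally, which is one reason the two columns can rank methods differently.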
