Railroad is not a Train: Saliency as Pseudo-pixel Supervision for Weakly Supervised Semantic Segmentation

Existing studies in weakly supervised semantic segmentation (WSSS) that use image-level weak supervision have several limitations: sparse object coverage, inaccurate object boundaries, and co-occurring pixels from non-target objects. To overcome these challenges, we propose a novel framework, Explicit Pseudo-pixel Supervision (EPS), which learns from pixel-level feedback by combining two weak supervisions: the image-level label provides the object identity via the localization map, and the saliency map from an off-the-shelf saliency detection model offers rich boundaries. We devise a joint training strategy to fully utilize the complementary relationship between the two sources of information. Our method can obtain accurate object boundaries and discard co-occurring pixels, thereby significantly improving the quality of pseudo-masks. Experimental results show that the proposed method outperforms existing methods by resolving key challenges of WSSS and achieves new state-of-the-art performance on both the PASCAL VOC 2012 and MS COCO 2014 datasets.
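To make the complementary roles of the two cues concrete, here is a minimal sketch in plain Python. It is not the EPS training procedure (the abstract describes a joint training strategy, which is not reproduced here); it is a simplified post-hoc fusion in which the localization maps decide which class a pixel belongs to and the saliency map decides whether the pixel belongs to an object at all. The function name `fuse_pseudo_mask`, the threshold `sal_thresh`, and the toy class ids are all assumptions made for illustration.

```python
def fuse_pseudo_mask(loc_maps, present, saliency, sal_thresh=0.5, background=0):
    """Illustrative fusion of localization maps and a saliency map.

    loc_maps:  dict mapping class id -> 2D list of localization scores
    present:   set of class ids named by the image-level label
    saliency:  2D list of saliency values in [0, 1]
    Hypothetical helper, not taken from the paper.
    """
    height, width = len(saliency), len(saliency[0])
    mask = [[background] * width for _ in range(height)]
    candidates = sorted(present)  # fixed order so ties resolve deterministically
    if not candidates:
        return mask  # no labeled class: everything stays background
    for y in range(height):
        for x in range(width):
            if saliency[y][x] < sal_thresh:
                continue  # non-salient pixel: keep it as background
            # Salient pixel: take the labeled class with the highest score.
            mask[y][x] = max(candidates, key=lambda c: loc_maps[c][y][x])
    return mask


# Toy example: class 1 stands for "train". Its localization map also fires
# on the bottom row (think "railroad"), but the saliency map does not.
train_scores = {1: [[0.9, 0.8, 0.1],
                    [0.7, 0.7, 0.1]]}
saliency_map = [[0.9, 0.9, 0.0],
                [0.1, 0.2, 0.0]]
pseudo_mask = fuse_pseudo_mask(train_scores, {1}, saliency_map)
```

In this toy case the bottom-row pixels score highly for the labeled class yet are left as background because they are not salient. That is the kind of co-occurring pixel the abstract says the method discards.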

CVPR 2021

Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Weakly-Supervised Semantic Segmentation | COCO 2014 val | EPS | mIoU | 35.7 | # 33 |
| Weakly-Supervised Semantic Segmentation | PASCAL VOC 2012 test | EPS(DeepLabV1-ResNet101) | Mean IoU | 71.8 | # 22 |
| Weakly-Supervised Semantic Segmentation | PASCAL VOC 2012 test | EPS(DeepLabV2-ResNet101) | Mean IoU | 70.8 | # 30 |
| Weakly-Supervised Semantic Segmentation | PASCAL VOC 2012 val | EPS(DeepLabV2-ResNet101) | Mean IoU | 70.9 | # 29 |
| Weakly-Supervised Semantic Segmentation | PASCAL VOC 2012 val | EPS(DeepLabV1-ResNet101) | Mean IoU | 71.0 | # 27 |
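The results above are reported as mean intersection-over-union (mIoU). As a reminder of what that metric measures, the sketch below computes it for a single pair of label maps in plain Python. It is a simplification: benchmark evaluation typically accumulates intersections and unions over the whole dataset before averaging over classes, and the function name `mean_iou` and the toy masks are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in pred or target.

    pred, target: 2D lists of integer class ids with the same shape.
    Classes absent from both maps are skipped rather than counted as 0 or 1.
    """
    ious = []
    for c in range(num_classes):
        inter = union = 0
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                in_pred, in_target = p == c, t == c
                inter += in_pred and in_target
                union += in_pred or in_target
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: the prediction misses one object pixel in the top-right corner.
predicted = [[1, 1, 0],
             [0, 0, 0]]
reference = [[1, 1, 1],
             [0, 0, 0]]
score = mean_iou(predicted, reference, num_classes=2)
```

Here the background class has IoU 3/4 and the object class has IoU 2/3, so the mean is a little under 0.71.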
