Inferring the Class Conditional Response Map for Weakly Supervised Semantic Segmentation

27 Oct 2021  ·  Weixuan Sun, Jing Zhang, Nick Barnes

Image-level weakly supervised semantic segmentation (WSSS) relies on class activation maps (CAMs) for pseudo-label generation. Because CAMs highlight only the most discriminative regions of objects, the generated pseudo labels are usually unsatisfactory as direct supervision. To solve this, most existing approaches follow a multi-training pipeline that refines CAMs into better pseudo labels: 1) re-training the classification model to generate CAMs; 2) post-processing the CAMs to obtain pseudo labels; and 3) training a semantic segmentation model with the obtained pseudo labels. However, this multi-training pipeline requires complicated adjustment and additional time. To address this, we propose a class-conditional inference strategy and an activation-aware mask refinement loss function that generate better pseudo labels without re-training the classifier. The class-conditional inference-time approach separately and iteratively reveals the classification network's hidden object activation to generate more complete response maps. Further, our activation-aware mask refinement loss function introduces a novel way to exploit saliency maps during segmentation training and to refine the foreground object masks without suppressing background objects. Our method achieves superior WSSS results without requiring re-training of the classifier.
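For orientation, the conventional CAM-to-pseudo-label step that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration of a generic CAM baseline, not the class-conditional inference strategy proposed in the paper. The function names, the nested-list data layout, and the 0.3 threshold are all assumptions made for the example.

```python
def class_activation_map(features, class_weights):
    """Weighted sum of feature channels for one class, then ReLU and max-normalization.

    features: K channels, each an H x W nested list (hypothetical layout).
    class_weights: K classifier weights for the class of interest.
    """
    height, width = len(features[0]), len(features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for channel, weight in zip(features, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * channel[y][x]
    # Keep positive evidence only, then scale so the peak response is 1.
    cam = [[max(v, 0.0) for v in row] for row in cam]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


def pseudo_label(cams, threshold=0.3):
    """Assign each pixel the strongest class above the threshold, else background (0).

    cams: dict mapping class id (>= 1) to a normalized CAM.
    threshold: illustrative value, not taken from the paper.
    """
    any_cam = next(iter(cams.values()))
    height, width = len(any_cam), len(any_cam[0])
    labels = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            best_class, best_score = 0, threshold
            for class_id, cam in cams.items():
                if cam[y][x] > best_score:
                    best_class, best_score = class_id, cam[y][x]
            labels[y][x] = best_class
    return labels
```

Because only pixels with a strong response pass the threshold, a baseline like this tends to label just the most discriminative object parts, which is the incompleteness the abstract describes.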


Results from the Paper


Task                                     Dataset               Model                            Metric Name  Metric Value  Global Rank
Weakly-Supervised Semantic Segmentation  PASCAL VOC 2012 test  Infer-CAM(DeepLabV2-R101)        Mean IoU     71.8          # 22
Weakly-Supervised Semantic Segmentation  PASCAL VOC 2012 val   Infer-CAM(DeepLabV2-ResNet101)   Mean IoU     70.8          # 30
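
The Mean IoU metric reported above averages the per-class intersection-over-union between predicted and ground-truth labels. The sketch below shows the computation on flattened label lists. The function name and the choice to skip classes that appear in neither prediction nor ground truth are assumptions for the example, and the value 255 follows the PASCAL VOC convention for void pixels. Benchmark scores are accumulated over the whole dataset rather than per image.

```python
def mean_iou(prediction, ground_truth, num_classes, ignore_label=255):
    """Mean of per-class IoU over flattened label sequences.

    Pixels whose ground truth equals ignore_label are excluded.
    Classes absent from both prediction and ground truth are skipped.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for predicted, actual in zip(prediction, ground_truth):
        if actual == ignore_label:
            continue
        if predicted == actual:
            intersection[actual] += 1
            union[actual] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[actual] += 1
            if 0 <= predicted < num_classes:
                union[predicted] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0
```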

Methods