PiPa: Pixel- and Patch-wise Self-supervised Learning for Domain Adaptive Semantic Segmentation

14 Nov 2022 · Mu Chen, Zhedong Zheng, Yi Yang, Tat-Seng Chua

Unsupervised Domain Adaptation (UDA) aims to enhance the generalization of a learned model to other domains. Domain-invariant knowledge is transferred from a model trained on a labeled source domain, e.g., video games, to unlabeled target domains, e.g., real-world scenarios, saving annotation expenses. Existing UDA methods for semantic segmentation usually focus on minimizing the inter-domain discrepancy at various levels, e.g., pixels, features, and predictions, to extract domain-invariant knowledge. However, primary intra-domain knowledge, such as context correlation inside an image, remains underexplored. To fill this gap, we propose a unified pixel- and patch-wise self-supervised learning framework, called PiPa, for domain adaptive semantic segmentation, which promotes intra-image pixel-wise correlations and patch-wise semantic consistency against different contexts. The proposed framework exploits the inherent structure of intra-domain images: it (1) explicitly encourages learning discriminative pixel-wise features with intra-class compactness and inter-class separability, and (2) motivates robust feature learning of the identical patch against different contexts or fluctuations. Extensive experiments verify the effectiveness of the proposed method, which obtains competitive accuracy on two widely used UDA benchmarks: 75.6 mIoU on GTA→Cityscapes and 68.2 mIoU on SYNTHIA→Cityscapes. Moreover, our method is compatible with other UDA approaches, further improving performance without introducing extra parameters.
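To make the two intra-domain objectives concrete, below is a minimal PyTorch sketch of (1) an InfoNCE-style pixel-wise contrastive loss and (2) a patch-wise consistency term over the overlap of two crops. The tensor shapes, the temperature, the pixel-subsampling cap, and the function names are illustrative assumptions, not the authors' reference implementation.

```python
# Sketch of the two intra-image self-supervised objectives described above.
# All shapes, hyperparameters, and helper names are assumptions for illustration.
import torch
import torch.nn.functional as F


def pixel_contrast_loss(feats, labels, temperature=0.1, max_pixels=1024):
    """Pixel-wise contrast: pull same-class pixel embeddings together
    (intra-class compactness) and push different-class embeddings apart
    (inter-class separability).

    feats:  (N, D) pixel embeddings sampled from a feature map
    labels: (N,)   class index of each sampled pixel
    """
    # Subsample pixels to keep the N x N similarity matrix affordable.
    if feats.size(0) > max_pixels:
        idx = torch.randperm(feats.size(0))[:max_pixels]
        feats, labels = feats[idx], labels[idx]

    feats = F.normalize(feats, dim=1)
    sim = feats @ feats.t() / temperature  # (N, N) cosine similarities

    # Positive pairs: same class, excluding self-pairs.
    pos_mask = (labels[:, None] == labels[None, :]).float()
    pos_mask.fill_diagonal_(0)

    # InfoNCE-style log-probability over all non-self pairs.
    logits_mask = 1.0 - torch.eye(sim.size(0), device=sim.device)
    exp_sim = torch.exp(sim) * logits_mask
    log_prob = sim - torch.log(exp_sim.sum(dim=1, keepdim=True) + 1e-12)

    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(pos_mask * log_prob).sum(dim=1) / pos_count
    return loss[pos_mask.sum(dim=1) > 0].mean()


def patch_consistency_loss(feat_map_a, feat_map_b, box_a, box_b):
    """Patch-wise consistency: the identical image region, seen in two
    overlapping crops (i.e., against different contexts), should yield
    matching features inside the shared region.

    feat_map_*: (C, H, W) feature maps of the two crops
    box_*:      (top, left, h, w) of the shared region in each map
    """
    ta, la, h, w = box_a
    tb, lb, _, _ = box_b
    region_a = feat_map_a[:, ta:ta + h, la:la + w]
    region_b = feat_map_b[:, tb:tb + h, lb:lb + w]
    return F.mse_loss(region_a, region_b)
```

Because both terms operate within a single image, they can, under this reading, be added on top of an existing UDA pipeline such as DAFormer or HRDA without introducing extra parameters, matching the compatibility claim in the abstract.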

Task                           | Dataset                   | Model           | Metric            | Value | Rank
Domain Adaptation              | GTA5 to Cityscapes        | HRDA + PiPa     | mIoU              | 75.6  | #4
Image-to-Image Translation     | GTAV-to-Cityscapes Labels | HRDA + PiPa     | mIoU              | 75.6  | #2
Image-to-Image Translation     | GTAV-to-Cityscapes Labels | DAFormer + PiPa | mIoU              | 71.7  | #4
Synthetic-to-Real Translation  | GTAV-to-Cityscapes Labels | DAFormer + PiPa | mIoU              | 71.7  | #6
Synthetic-to-Real Translation  | GTAV-to-Cityscapes Labels | HRDA + PiPa     | mIoU              | 75.6  | #3
Semantic Segmentation          | GTAV-to-Cityscapes Labels | HRDA + PiPa     | mIoU              | 75.6  | #2
Unsupervised Domain Adaptation | GTAV-to-Cityscapes Labels | HRDA + PiPa     | mIoU              | 75.6  | #2
Unsupervised Domain Adaptation | GTAV-to-Cityscapes Labels | DAFormer + PiPa | mIoU              | 71.7  | #5
Unsupervised Domain Adaptation | SYNTHIA-to-Cityscapes     | HRDA + PiPa     | mIoU (13 classes) | 74.8  | #2
Image-to-Image Translation     | SYNTHIA-to-Cityscapes     | HRDA + PiPa     | mIoU (13 classes) | 74.8  | #1
Semantic Segmentation          | SYNTHIA-to-Cityscapes     | HRDA + PiPa     | mIoU              | 68.2  | #1
Domain Adaptation              | SYNTHIA-to-Cityscapes     | HRDA + PiPa     | mIoU              | 68.2  | #3
Synthetic-to-Real Translation  | SYNTHIA-to-Cityscapes     | HRDA + PiPa     | mIoU (13 classes) | 74.8  | #2
Synthetic-to-Real Translation  | SYNTHIA-to-Cityscapes     | HRDA + PiPa     | mIoU (16 classes) | 68.2  | #2
