MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation

In unsupervised domain adaptation (UDA), a model trained on source data (e.g. synthetic) is adapted to target data (e.g. real-world) without access to target annotations. Most previous UDA methods struggle with classes that have a similar visual appearance in the target domain, since no ground truth is available to learn the subtle appearance differences. To address this problem, we propose a Masked Image Consistency (MIC) module that enhances UDA by learning spatial context relations of the target domain as additional clues for robust visual recognition. MIC enforces consistency between the predictions on masked target images, in which random patches are withheld, and pseudo-labels that an exponential moving average (EMA) teacher generates from the complete image. To minimize this consistency loss, the network has to learn to infer the predictions for the masked regions from their context. Due to its simple and universal concept, MIC can be integrated into various UDA methods across different visual recognition tasks such as image classification, semantic segmentation, and object detection. MIC significantly improves the state-of-the-art performance across these recognition tasks for synthetic-to-real, day-to-nighttime, and clear-to-adverse-weather UDA. For instance, MIC achieves an unprecedented UDA performance of 75.9 mIoU on GTA-to-Cityscapes and 92.8% accuracy on VisDA-2017, improvements of +2.1 and +3.0 percentage points over the previous state of the art, respectively. The implementation is available at https://github.com/lhoyer/MIC.
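The training signal described in the abstract (patch masking, EMA teacher, consistency loss) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation in the linked repository: all function names are hypothetical, a per-pixel linear classifier stands in for the segmentation network (so, unlike a real network, it cannot actually exploit spatial context; the sketch only shows how the loss is wired), withheld patches are set to zero, pseudo-labels are plain argmax predictions without any confidence weighting or augmentation, and the EMA rate alpha = 0.999, patch size, and mask ratio are illustrative values. The teacher update follows θ_teacher ← α·θ_teacher + (1 − α)·θ_student.

```python
import numpy as np

# Minimal sketch of a masked-image-consistency loss. All names are
# hypothetical and the "network" is a toy per-pixel linear classifier;
# this is not the official MIC implementation.


def random_patch_mask(height, width, patch_size, mask_ratio, rng):
    """Return a (height, width) mask where 1.0 keeps a pixel and 0.0 withholds it.

    The image is tiled into patch_size x patch_size patches, and each patch is
    withheld independently with probability mask_ratio.
    """
    grid_h = -(-height // patch_size)  # ceiling division
    grid_w = -(-width // patch_size)
    keep = (rng.random((grid_h, grid_w)) >= mask_ratio).astype(np.float32)
    # Expand every patch decision to patch_size x patch_size pixels, then crop.
    mask = np.kron(keep, np.ones((patch_size, patch_size), dtype=np.float32))
    return mask[:height, :width]


def softmax(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def predict(weights, image):
    """Toy stand-in for a segmentation network: (H, W, C) @ (C, K) -> (H, W, K) logits."""
    return image @ weights


def ema_update(teacher, student, alpha=0.999):
    """Teacher weights become an exponential moving average of student weights."""
    return alpha * teacher + (1.0 - alpha) * student


def masked_consistency_loss(student_w, teacher_w, target_image, mask):
    # Pseudo-labels come from the teacher applied to the COMPLETE target image.
    pseudo_labels = predict(teacher_w, target_image).argmax(axis=-1)  # (H, W)
    # The student only sees the masked image (withheld patches set to zero).
    masked_image = target_image * mask[..., None]
    probs = softmax(predict(student_w, masked_image))  # (H, W, K)
    # Pixel-wise cross-entropy between student predictions and pseudo-labels.
    picked = np.take_along_axis(probs, pseudo_labels[..., None], axis=-1)[..., 0]
    return float(-np.log(picked + 1e-12).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.standard_normal((32, 32, 3)).astype(np.float32)
    student = rng.standard_normal((3, 5)).astype(np.float32)
    teacher = student.copy()
    mask = random_patch_mask(32, 32, patch_size=8, mask_ratio=0.7, rng=rng)
    loss = masked_consistency_loss(student, teacher, image, mask)
    # In training, the student would take a gradient step on this loss (plus the
    # supervised source loss); the teacher then follows via the EMA update.
    teacher = ema_update(teacher, student)
    print(f"visible fraction={mask.mean():.2f}, consistency loss={loss:.3f}")
```

The key design point the sketch mirrors is the asymmetry: the teacher sees the full image while the student sees only the visible patches, so matching the teacher on withheld regions requires inferring them from surrounding context. Because the teacher is an EMA of the student rather than a separately trained model, its pseudo-labels change slowly and stay relatively stable during training.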

CVPR 2023
| Task | Dataset | Model | Metric | Value | Global rank |
|---|---|---|---|---|---|
| Domain Adaptation | Cityscapes to ACDC | MIC | mIoU | 70.4 | #5 |
| Unsupervised Domain Adaptation | Cityscapes to Foggy Cityscapes | MIC | mAP@0.5 | 47.6 | #8 |
| Image-to-Image Translation | Cityscapes-to-Foggy Cityscapes | MIC | mAP | 47.6 | #1 |
| Semantic Segmentation | Dark Zurich | MIC | mIoU | 60.2 | #3 |
| Domain Adaptation | GTA5 to Cityscapes | MIC | mIoU | 75.9 | #4 |
| Semantic Segmentation | GTAV-to-Cityscapes Labels | MIC | mIoU | 75.9 | #1 |
| Synthetic-to-Real Translation | GTAV-to-Cityscapes Labels | HRDA+MIC | mIoU | 75.9 | #2 |
| Unsupervised Domain Adaptation | GTAV-to-Cityscapes Labels | MIC | mIoU | 75.9 | #1 |
| Image-to-Image Translation | GTAV-to-Cityscapes Labels | MIC | mIoU | 75.9 | #1 |
| Domain Adaptation | Office-Home | MIC | Accuracy | 86.2 | #4 |
| Domain Adaptation | SYNTHIA-to-Cityscapes | MIC | mIoU | 67.3 | #5 |
| Semantic Segmentation | SYNTHIA-to-Cityscapes | MIC | Mean IoU | 67.3 | #2 |
| Unsupervised Domain Adaptation | SYNTHIA-to-Cityscapes | MIC | mIoU (13 classes) | 74.0 | #3 |
| Unsupervised Domain Adaptation | SYNTHIA-to-Cityscapes | MIC | mIoU | 67.3 | #2 |
| Image-to-Image Translation | SYNTHIA-to-Cityscapes | MIC | mIoU (13 classes) | 74.0 | #2 |
| Synthetic-to-Real Translation | SYNTHIA-to-Cityscapes | MIC | mIoU (13 classes) | 74.0 | #3 |
| Synthetic-to-Real Translation | SYNTHIA-to-Cityscapes | MIC | mIoU (16 classes) | 67.3 | #3 |
| Domain Adaptation | VisDA2017 | MIC | Accuracy | 92.8 | #1 |

