G2L: A Global to Local Alignment Method for Unsupervised Domain Adaptive Semantic Segmentation
Unsupervised domain adaptation (UDA) for semantic segmentation aims to transfer knowledge from a source dataset with dense pixel-level annotations to an unlabeled target dataset. However, the performance of UDA methods often suffers from domain shift, the discrepancy between the feature distributions of the two domains. Several approaches attempt to align the marginal distributions of the two domains at the image level. However, because of the so-called category-level domain shift, such global alignment does not guarantee good separability of the deep features extracted from different categories in the target domain. As a result, the generated pseudo-labels can be noisy and poison the learning process on the target domain. Some recent methods focus on denoising the pseudo-labels online using category-wise information. This paper introduces a novel UDA method called Global-to-Local alignment (G2L). It first leverages fine-grained adversarial training and a newly proposed chromatic Fourier transform to address the image-level domain shift from a global perspective. It then deals with the category-level domain shift from a local view. Specifically, we propose a long-tail category rating strategy and apply dynamic confidence thresholds and category-wise priority weights when generating and denoising the pseudo-labels, so as to favor rare categories. Finally, self-distillation is used to boost the final segmentation results. Experiments on the popular benchmarks GTA5 → Cityscapes and SYNTHIA → Cityscapes show that our method achieves higher accuracy than other state-of-the-art methods.
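The abstract does not give the exact rating strategy or threshold rule, so the following is only a generic sketch of pseudo-labeling with per-class confidence thresholds, not the G2L procedure itself. The function names `classwise_thresholds` and `make_pseudo_labels`, the quantile-based rule, the `base` cap, and the ignore value 255 are all assumptions made for illustration.

```python
import numpy as np

IGNORE_INDEX = 255  # assumed label value for pixels excluded from the loss


def classwise_thresholds(probs, base=0.9, quantile=0.5):
    """Per-class confidence thresholds (hypothetical rule, not the paper's).

    probs: float array of shape (C, H, W) holding softmax outputs.
    For each class, the threshold is the given quantile of the confidences
    of pixels predicted as that class, capped at `base`. Classes that are
    predicted with low confidence therefore get a lower threshold.
    """
    num_classes = probs.shape[0]
    pred = probs.argmax(axis=0)
    conf = probs.max(axis=0)
    thresholds = np.full(num_classes, base, dtype=np.float64)
    for c in range(num_classes):
        mask = pred == c
        if mask.any():
            thresholds[c] = min(base, float(np.quantile(conf[mask], quantile)))
    return thresholds


def make_pseudo_labels(probs, thresholds):
    """Keep the argmax label where confidence reaches that class's threshold."""
    pred = probs.argmax(axis=0)
    conf = probs.max(axis=0)
    labels = pred.copy()
    # thresholds[pred] looks up, per pixel, the threshold of its predicted class
    labels[conf < thresholds[pred]] = IGNORE_INDEX
    return labels
```

With a single global threshold, rare classes that the model predicts with low confidence tend to be filtered out entirely. Lowering the threshold for those classes is one simple way to favor them, which is the general motivation the abstract describes.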
Results from the Paper
Task | Dataset | Model | Metric Name | Metric Value | Global Rank | Benchmark |
---|---|---|---|---|---|---|
Domain Adaptation | GTA5 to Cityscapes | G2L | mIoU | 59.7 | # 16 | |
Synthetic-to-Real Translation | GTAV-to-Cityscapes Labels | G2L | mIoU | 59.7 | # 18 | |
Semantic Segmentation | GTAV-to-Cityscapes Labels | G2L | mIoU | 59.7 | # 8 | |
Unsupervised Domain Adaptation | GTAV-to-Cityscapes Labels | G2L | mIoU | 59.7 | # 13 | |
Image-to-Image Translation | GTAV-to-Cityscapes Labels | G2L | mIoU | 59.7 | # 13 | |
Synthetic-to-Real Translation | SYNTHIA-to-Cityscapes | G2L | MIoU (13 classes) | 64.4 | # 12 | |
Synthetic-to-Real Translation | SYNTHIA-to-Cityscapes | G2L | MIoU (16 classes) | 56.8 | # 14 | |
Unsupervised Domain Adaptation | SYNTHIA-to-Cityscapes | G2L | mIoU (13 classes) | 64.4 | # 11 | |
Unsupervised Domain Adaptation | SYNTHIA-to-Cityscapes | G2L | mIoU | 56.8 | # 7 | |
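
For readers unfamiliar with the metric in the table, the sketch below shows one common way to compute mean intersection-over-union (mIoU) from a confusion matrix. It is a generic illustration, not the evaluation script behind these numbers. The function names, the ignore value 255, and the optional `classes` argument (for averaging over a subset of classes, as in the 13-class and 16-class rows) are assumptions. The function returns a fraction, whereas the table reports percentages.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Rows index the ground-truth class, columns the predicted class.

    Assumes every prediction at a valid pixel lies in [0, num_classes).
    """
    valid = target != ignore_index
    idx = num_classes * target[valid].astype(np.int64) + pred[valid].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mean_iou(conf, classes=None):
    """Mean IoU over classes that occur in the prediction or the ground truth.

    classes: optional list of class indices to average over.
    """
    tp = np.diag(conf).astype(np.float64)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    if classes is not None:
        tp, union = tp[classes], union[classes]
    present = union > 0  # skip classes absent from both prediction and labels
    return float((tp[present] / union[present]).mean())
```

For example, with ground truth `[0, 0, 1, 1]` and prediction `[0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.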