To miss-attend is to misalign! Residual Self-Attentive Feature Alignment for Adapting Object Detectors

Advancements in adaptive object detection can lead to tremendous improvements in applications like autonomous navigation, as they alleviate the distributional shifts along the detection pipeline. Prior works adopt adversarial learning to align image features at global and local levels, yet instance-specific misalignment persists. Adaptive object detection also remains challenging due to visual diversity in background scenes and intricate combinations of objects. Motivated by structural importance, we aim to attend to prominent instance-specific regions, overcoming the feature misalignment issue. We propose a novel resIduaL seLf-attentive featUre alignMEnt (ILLUME) method for adaptive object detection. ILLUME comprises a Self-Attention Feature Map (SAFM) module that enhances structural attention to object-related regions and thereby generates domain-invariant features. Our approach significantly reduces the domain distance through improved feature alignment of the instances. Qualitative results demonstrate the ability of ILLUME to attend to the important object instances required for alignment. Experimental results on several benchmark datasets show that our method outperforms the existing state-of-the-art approaches.
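
To make the idea of residual self-attention over a feature map concrete, here is a minimal NumPy sketch. It is not the authors' SAFM implementation: the abstract does not specify the architecture, so the function name `residual_self_attention`, the projection matrices `w_q`, `w_k`, `w_v`, and the scaling factor `gamma` are all assumptions made for illustration. The sketch flattens a (C, H, W) feature map into H*W positions, lets each position attend to every other position, and adds the attended features back to the input as a residual.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def residual_self_attention(feat, w_q, w_k, w_v, gamma=1.0):
    """Hypothetical residual self-attention block (illustrative only).

    feat : (C, H, W) feature map from a detector backbone.
    w_q, w_k : (D, C) query/key projections (assumed, not from the paper).
    w_v : (C, C) value projection (assumed, not from the paper).
    gamma : weight of the attention branch; gamma=0 returns feat unchanged.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)                 # (C, N) with N = H*W positions
    q = w_q @ x                                # (D, N)
    k = w_k @ x                                # (D, N)
    v = w_v @ x                                # (C, N)
    scores = q.T @ k / np.sqrt(q.shape[0])     # (N, N) scaled dot products
    attn = softmax(scores, axis=-1)            # row i: weights of position i over all positions
    attended = v @ attn.T                      # (C, N): attended[:, i] = sum_j attn[i, j] * v[:, j]
    out = feat + gamma * attended.reshape(c, h, w)  # residual connection
    return out, attn


# Usage on random data (shapes chosen arbitrarily for the example).
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 5))
w_q = rng.standard_normal((4, 8))
w_k = rng.standard_normal((4, 8))
w_v = rng.standard_normal((8, 8))
out, attn = residual_self_attention(feat, w_q, w_k, w_v, gamma=0.5)
print(out.shape, attn.shape)
```

The residual form means the block can only add to the backbone features, never replace them: with `gamma` set to zero the output equals the input, so in this sketch the attention branch acts as a correction on top of the original features.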

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Unsupervised Domain Adaptation | BDD100k to Cityscapes | ILLUME | mAP | 29.6 | #2 |
| Unsupervised Domain Adaptation | Cityscapes to Foggy Cityscapes | ILLUME | mAP@0.5 | 43.8 | #12 |
| Unsupervised Domain Adaptation | Pascal VOC to Clipart1K | ILLUME | mAP | 41.6 | #1 |
