H2FA R-CNN: Holistic and Hierarchical Feature Alignment for Cross-Domain Weakly Supervised Object Detection

CVPR 2022 · Yunqiu Xu, Yifan Sun, Zongxin Yang, Jiaxu Miao, Yi Yang

Cross-domain weakly supervised object detection (CDWSOD) aims to adapt a detection model to a novel target domain using only easily acquired image-level annotations. How the source and target domains are aligned is critical to CDWSOD accuracy. Existing methods usually align only some of the detection components. In contrast, this paper argues that all detection components matter and proposes a Holistic and Hierarchical Feature Alignment (H^2FA) R-CNN. H^2FA R-CNN enforces two image-level alignments on the backbone features, as well as two instance-level alignments for the RPN and the detection head. This coarse-to-fine alignment hierarchy follows the detection pipeline, which processes the image-level feature and then the instance-level features from bottom to top. Importantly, we devise a novel hybrid supervision method for learning the two instance-level alignments. It enables the RPN and the detection head to receive weak supervision from the target domain and full supervision from the source domain simultaneously. By combining all these feature alignments, H^2FA R-CNN effectively narrows the gap between the source and target domains. Experimental results show that H^2FA R-CNN significantly improves cross-domain object detection accuracy and sets a new state of the art on popular benchmarks. Code and pre-trained models are available at https://github.com/XuYunqiu/H2FA_R-CNN.
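The abstract does not give the loss functions behind the hybrid supervision, so the following is only a minimal, hypothetical sketch of the general idea: labeled source images contribute a per-proposal (box-level) cross-entropy, while target images, which carry only image-level labels, contribute a multi-label loss on image scores aggregated from the proposals. The aggregation used here (summing per-proposal class probabilities and clamping to [0, 1]) is an assumed WSDDN-style choice, and all function names and the `weak_weight` parameter are illustrative, not taken from the authors' code.

```python
import math
from typing import List, Sequence

# Hypothetical sketch of hybrid (full + weak) supervision; not the H2FA R-CNN implementation.


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one proposal's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def full_supervision_loss(proposal_logits: Sequence[Sequence[float]],
                          proposal_labels: Sequence[int]) -> float:
    """Source domain: mean per-proposal cross-entropy against box-level labels.

    Class index 0 is assumed to be background.
    """
    total = 0.0
    for logits, label in zip(proposal_logits, proposal_labels):
        total -= math.log(softmax(logits)[label])
    return total / len(proposal_logits)


def weak_supervision_loss(proposal_logits: Sequence[Sequence[float]],
                          image_labels: Sequence[int],
                          eps: float = 1e-6) -> float:
    """Target domain: multi-label BCE on image-level scores.

    Image scores are the per-class sums of proposal probabilities, clamped to
    (eps, 1 - eps) (an assumed aggregation). `image_labels[k]` is 1 if
    foreground class k + 1 appears in the image, else 0.
    """
    num_classes = len(proposal_logits[0])  # includes background at index 0
    image_scores = [0.0] * num_classes
    for logits in proposal_logits:
        for c, p in enumerate(softmax(logits)):
            image_scores[c] += p
    loss = 0.0
    for c in range(1, num_classes):  # skip background
        p = min(max(image_scores[c], eps), 1.0 - eps)
        y = image_labels[c - 1]
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return loss / (num_classes - 1)


def hybrid_loss(src_logits, src_labels, tgt_logits, tgt_image_labels,
                weak_weight: float = 1.0) -> float:
    """Sum of full supervision (source) and weighted weak supervision (target)."""
    return (full_supervision_loss(src_logits, src_labels)
            + weak_weight * weak_supervision_loss(tgt_logits, tgt_image_labels))


if __name__ == "__main__":
    # One source proposal labeled as class 1; one target image that contains class 1.
    print(hybrid_loss([[0.0, 2.0]], [1], [[0.0, 2.0]], [1]))
```

The point of the sketch is that the same classifier outputs receive gradients from both domains in one step: box-level labels where they exist (source) and image-level labels where only those exist (target). How the paper applies this to the RPN and the detection head, and how it ties the supervision to feature alignment, should be checked against the released code.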

Benchmark results (Task: Weakly Supervised Object Detection; metric: mAP)

Dataset        Model                          mAP    Global rank
Clipart1k      H2FA R-CNN (clipart_all)       69.8   #1
Clipart1k      H2FA R-CNN (clipart_test)      55.3   #3
Comic2k        H2FA R-CNN (+extra)            53.0   #3
Comic2k        H2FA R-CNN                     46.4   #4
Watercolor2k   H2FA R-CNN (+extra)            62.6   #3
Watercolor2k   H2FA R-CNN                     59.9   #4

Methods

ALIGN • RPN