H2FA R-CNN: Holistic and Hierarchical Feature Alignment for Cross-Domain Weakly Supervised Object Detection
Cross-domain weakly supervised object detection (CDWSOD) aims to adapt a detection model to a novel target domain using easily acquired image-level annotations. How the source and target domains are aligned is critical to CDWSOD accuracy. Existing methods usually align only some of the detection components. In contrast, this paper considers all the detection components important and proposes a Holistic and Hierarchical Feature Alignment (H^2FA) R-CNN. H^2FA R-CNN enforces two image-level alignments on the backbone features, as well as two instance-level alignments on the RPN and the detection head. This coarse-to-fine alignment hierarchy follows the detection pipeline, i.e., it processes the image-level features and then the instance-level features from bottom to top. Importantly, we devise a novel hybrid supervision method for learning the two instance-level alignments. It enables the RPN and the detection head to receive weak supervision from the target domain and full supervision from the source domain simultaneously. Combining all these feature alignments, H^2FA R-CNN effectively mitigates the gap between the source and target domains. Experimental results show that H^2FA R-CNN significantly improves cross-domain object detection accuracy and sets a new state of the art on popular benchmarks. Code and pre-trained models are available at https://github.com/XuYunqiu/H2FA_R-CNN.
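To make the idea of hybrid supervision concrete, the following is a minimal sketch and not the authors' implementation. It assumes that full supervision on a source image is a per-proposal cross-entropy against box-level class labels, and that weak supervision on a target image max-pools per-proposal class probabilities into an image-level score that is compared with the image-level labels. The function names, the max-pooling aggregation, and the `weak_weight` parameter are all illustrative assumptions; the abstract does not specify any of them.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def full_supervision_loss(proposal_logits, proposal_labels):
    """Source domain (assumed form): mean cross-entropy over proposals,
    each of which has a known class index."""
    losses = []
    for logits, label in zip(proposal_logits, proposal_labels):
        probs = softmax(logits)
        losses.append(-math.log(probs[label]))
    return sum(losses) / len(losses)


def weak_supervision_loss(proposal_logits, image_labels, eps=1e-7):
    """Target domain (assumed form): max-pool per-proposal class
    probabilities into one image-level score per class, then apply
    binary cross-entropy against the image-level labels (0 or 1)."""
    probs = [[sigmoid(x) for x in logits] for logits in proposal_logits]
    loss = 0.0
    for c, y in enumerate(image_labels):
        p = max(row[c] for row in probs)   # image-level score for class c
        p = min(max(p, eps), 1.0 - eps)    # clamp to avoid log(0)
        loss += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return loss / len(image_labels)


def hybrid_loss(source, target, weak_weight=1.0):
    """Combine both signals in one training step. `source` is a pair
    (proposal_logits, proposal_labels); `target` is a pair
    (proposal_logits, image_labels). `weak_weight` is hypothetical."""
    full = full_supervision_loss(*source)
    weak = weak_supervision_loss(*target)
    return full + weak_weight * weak


# Two classes, two proposals per image.
source = ([[2.0, -1.0], [-1.0, 2.0]], [0, 1])   # box-level labels
target = ([[3.0, -3.0], [-2.0, -3.0]], [1, 0])  # image-level labels only
print(hybrid_loss(source, target))
```

In this sketch the same head outputs serve both domains, so one forward pass per image yields a loss regardless of which kind of annotation is available.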
Results from the Paper
Task | Dataset | Model | Metric Name | Metric Value | Global Rank | Uses Extra Training Data | Benchmark |
---|---|---|---|---|---|---|---|
Weakly Supervised Object Detection | Clipart1k | H2FA R-CNN (clipart_all) | MAP | 69.8 | # 1 | | |
Weakly Supervised Object Detection | Clipart1k | H2FA R-CNN (clipart_test) | MAP | 55.3 | # 3 | | |
Weakly Supervised Object Detection | Comic2k | H2FA R-CNN (+extra) | MAP | 53.0 | # 3 | | |
Weakly Supervised Object Detection | Comic2k | H2FA R-CNN | MAP | 46.4 | # 4 | | |
Weakly Supervised Object Detection | Watercolor2k | H2FA R-CNN (+extra) | MAP | 62.6 | # 3 | | |
Weakly Supervised Object Detection | Watercolor2k | H2FA R-CNN | MAP | 59.9 | # 4 | | |