CACFNet: Cross-Modal Attention Cascaded Fusion Network for RGB-T Urban Scene Parsing

Journal paper, 2023  ·  WuJie Zhou, Shaohua Dong, Meixin Fang, Lu Yu

Color–thermal (RGB-T) urban scene parsing has recently attracted widespread interest. However, most existing approaches to RGB-T urban scene parsing do not deeply explore the complementarity between RGB and thermal features. In this study, we propose a cross-modal attention-cascaded fusion network (CACFNet) that fully exploits this cross-modal complementarity. In our design, a cross-modal attention fusion module mines complementary information from the two modalities. Subsequently, a cascaded fusion module decodes the multi-level features in a top-down manner. Noting that each pixel is labeled with the category of the region to which it belongs, we present a region-based module that explores the relationship between pixels and regions. Moreover, in contrast to previous methods that employ only the cross-entropy loss to penalize pixel-wise predictions, we propose an additional loss that learns pixel–pixel relationships. Extensive experiments on two datasets demonstrate that the proposed CACFNet achieves state-of-the-art performance in RGB-T urban scene parsing.
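The abstract describes the fusion design only at a high level. Below is a minimal PyTorch sketch of one plausible reading of the cross-modal attention fusion idea: channel attention computed from the concatenated RGB and thermal features re-weights each modality before the two branches are summed. The class name `CrossModalAttentionFusion`, the `reduction` parameter, and the exact attention form are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a cross-modal attention fusion block (assumed design,
# not the paper's published code).
import torch
import torch.nn as nn


class CrossModalAttentionFusion(nn.Module):
    """Re-weights RGB and thermal features with channel attention derived
    from their concatenation, then sums the two attended branches."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        # One squeeze-excite style MLP producing attention for both modalities.
        self.mlp = nn.Sequential(
            nn.Conv2d(2 * channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 2 * channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        joint = torch.cat([rgb, thermal], dim=1)   # (B, 2C, H, W)
        attn = self.mlp(self.pool(joint))          # (B, 2C, 1, 1)
        a_rgb, a_th = attn.chunk(2, dim=1)         # per-modality channel weights
        return rgb * a_rgb + thermal * a_th        # fused feature (B, C, H, W)


if __name__ == "__main__":
    fuse = CrossModalAttentionFusion(channels=64)
    rgb = torch.randn(2, 64, 120, 160)
    thermal = torch.randn(2, 64, 120, 160)
    print(fuse(rgb, thermal).shape)  # torch.Size([2, 64, 120, 160])
```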
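Likewise, the pixel–pixel relation loss is only named, not specified. The sketch below shows one common way such a loss can be realized: sample random pixel pairs and push the predicted pairwise similarity (inner product of softmax distributions) toward 1 for same-class pairs and 0 otherwise. The function name `pixel_pair_loss`, the sampling scheme, and all parameters are hypothetical.

```python
# Hypothetical pixel-pair relation loss (assumed form; the paper's exact
# formulation may differ).
import torch
import torch.nn.functional as F


def pixel_pair_loss(logits: torch.Tensor, labels: torch.Tensor,
                    num_pairs: int = 4096, ignore_index: int = 255) -> torch.Tensor:
    """logits: (B, C, H, W) class scores; labels: (B, H, W) ground truth."""
    b, c, h, w = logits.shape
    probs = logits.softmax(dim=1).permute(0, 2, 3, 1).reshape(b, h * w, c)
    flat_labels = labels.reshape(b, h * w)

    # Sample random pixel pairs per image.
    idx_a = torch.randint(0, h * w, (b, num_pairs), device=logits.device)
    idx_b = torch.randint(0, h * w, (b, num_pairs), device=logits.device)
    pa = torch.gather(probs, 1, idx_a.unsqueeze(-1).expand(-1, -1, c))
    pb = torch.gather(probs, 1, idx_b.unsqueeze(-1).expand(-1, -1, c))
    la = torch.gather(flat_labels, 1, idx_a)
    lb = torch.gather(flat_labels, 1, idx_b)
    valid = ((la != ignore_index) & (lb != ignore_index)).float()

    # Predicted probability that the two pixels share a class, vs. ground truth.
    sim = (pa * pb).sum(-1).clamp(1e-6, 1 - 1e-6)
    target = (la == lb).float()
    loss = F.binary_cross_entropy(sim, target, reduction="none")
    return (loss * valid).sum() / valid.sum().clamp(min=1.0)
```

In training, a loss of this kind would be added to the standard per-pixel cross-entropy with a weighting coefficient.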


Datasets

MFNet  ·  PST900

Results from the Paper


Task                         Dataset      Model    Metric  Value  Global Rank
Thermal Image Segmentation   MFN Dataset  CACFNet  mIoU    57.8   #12
Thermal Image Segmentation   PST900       CACFNet  mIoU    86.56  #5

Methods


Cross-modal attention fusion module  ·  Cascaded fusion module  ·  Region-based module  ·  Pixel–pixel relation loss