CACFNet: Cross-Modal Attention Cascaded Fusion Network for RGB-T Urban Scene Parsing
Color–thermal (RGB-T) urban scene parsing has recently attracted widespread interest. However, most existing approaches to RGB-T urban scene parsing do not deeply explore the complementarity between RGB and thermal features. In this study, we propose a cross-modal attention-cascaded fusion network (CACFNet) that fully exploits cross-modal complementarity. In our design, a cross-modal attention fusion module mines complementary information from the two modalities. Subsequently, a cascaded fusion module decodes the multi-level features in a top-down manner. Noting that each pixel is labeled with the category of the region to which it belongs, we present a region-based module that explores the relationship between pixels and regions. Moreover, in contrast to previous methods that employ only the cross-entropy loss to penalize pixel-wise predictions, we propose an additional loss that learns pixel–pixel relationships. Extensive experiments on two datasets demonstrate that the proposed CACFNet achieves state-of-the-art performance in RGB-T urban scene parsing.
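The idea of cross-modal attention fusion can be sketched as follows; this is a minimal illustration only, and the function names, the squeeze-and-excitation-style channel attention, and all tensor shapes are assumptions for demonstration, not the paper's actual module:

```python
import numpy as np

def cross_modal_attention_fuse(rgb, thermal):
    """Illustrative cross-modal attention fusion (a sketch, not CACFNet's exact design).

    rgb, thermal: (C, H, W) feature maps from the two encoder streams.
    Each modality's channel-attention vector re-weights the *other* modality,
    so each stream is modulated by complementary cues before summation.
    """
    def channel_attention(feat):
        # Global average pooling -> (C,), then softmax-normalized weights,
        # rescaled by C so an uninformative map yields weights near 1.
        pooled = feat.mean(axis=(1, 2))
        e = np.exp(pooled - pooled.max())
        w = e / e.sum()
        return w[:, None, None] * feat.shape[0]

    rgb_att = channel_attention(rgb)
    thermal_att = channel_attention(thermal)
    # Cross application: thermal attention gates RGB features and vice versa.
    return rgb * thermal_att + thermal * rgb_att

rng = np.random.default_rng(0)
rgb = rng.standard_normal((8, 4, 4))
thermal = rng.standard_normal((8, 4, 4))
out = cross_modal_attention_fuse(rgb, thermal)
print(out.shape)  # (8, 4, 4)
```

In a real network these attention weights would be produced by learned layers (e.g. small MLPs after pooling); the point here is only the cross-wiring, where each modality's summary statistics gate the other's features.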