Cross-Modality Fusion Transformer for Multispectral Object Detection

30 Oct 2021  ·  Fang Qingyun, Han Dapeng, Wang Zhaokui ·

Multispectral image pairs provide complementary information, making object detection applications more reliable and robust in the open world. To fully exploit the different modalities, we present a simple yet effective cross-modality feature fusion approach, named Cross-Modality Fusion Transformer (CFT), in this paper. Unlike prior CNN-based works, our network is guided by the transformer scheme and learns long-range dependencies, integrating global contextual information in the feature extraction stage. More importantly, by leveraging the self-attention of the transformer, the network can naturally carry out simultaneous intra-modality and inter-modality fusion, and robustly capture the latent interactions between the RGB and Thermal domains, thereby significantly improving the performance of multispectral object detection. Extensive experiments and ablation studies on multiple datasets demonstrate that our approach is effective and achieves state-of-the-art detection performance. Our code and models are available at https://github.com/DocF/multispectral-object-detection.
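The fusion mechanism described above can be sketched in a few lines of PyTorch: RGB and thermal feature maps are flattened into token sequences, concatenated, and passed through a single self-attention layer, so every attention score relates either two tokens of the same modality (intra-modality fusion) or one token from each (inter-modality fusion). This is a minimal illustrative sketch, not the authors' implementation; the class name, layer sizes, and the single-layer structure are assumptions for illustration.

```python
import torch
import torch.nn as nn

class CrossModalityFusion(nn.Module):
    """Illustrative sketch (not the paper's exact CFT module):
    joint self-attention over concatenated RGB + thermal tokens."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor):
        # rgb, thermal: (B, C, H, W) feature maps from two backbone branches
        b, c, h, w = rgb.shape
        tokens = torch.cat(
            [rgb.flatten(2).transpose(1, 2),       # (B, H*W, C) RGB tokens
             thermal.flatten(2).transpose(1, 2)],  # (B, H*W, C) thermal tokens
            dim=1)                                  # (B, 2*H*W, C)
        # One attention pass covers RGB->RGB, T->T, RGB->T and T->RGB pairs
        fused, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + fused)          # residual + layer norm
        rgb_out, th_out = tokens.split(h * w, dim=1)
        # Reshape back to (B, C, H, W) so a detection head can consume them
        rgb_out = rgb_out.transpose(1, 2).reshape(b, c, h, w)
        th_out = th_out.transpose(1, 2).reshape(b, c, h, w)
        return rgb_out, th_out
```

In the paper this kind of block is inserted into the feature extraction stage of a two-branch backbone; here the sketch only shows why a single self-attention call suffices to fuse both within and across modalities.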

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Multispectral Object Detection | FLIR | YOLOv5 (RGB) | mAP50 | 67.8% | #8 |
| Multispectral Object Detection | FLIR | YOLOv5 (T) | mAP50 | 73.9% | #3 |
| Multispectral Object Detection | FLIR | CFT | mAP50 | 77.7% | #2 |
| Multispectral Object Detection | LLVIP | CFT | mAP50 | 97.5 | #1 |
| Pedestrian Detection | LLVIP | CFT | AP | 0.636 | #1 |
| Pedestrian Detection | LLVIP | CFT | log average miss rate | 5.40% | #2 |
