TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Video Semantic Segmentation	Multispectral Video Semantic Segmentation	MVNet(DeepLabV3)	mIoU	54.52	# 1
Video Semantic Segmentation	Multispectral Video Semantic Segmentation	MVNet(FCN)	mIoU	53.90	# 3
Video Semantic Segmentation	Multispectral Video Semantic Segmentation	MVNet(PSPNet)	mIoU	54.36	# 2

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multispectral-video-semantic-segmentation-a/video-semantic-segmentation-on-multispectral)](https://paperswithcode.com/sota/video-semantic-segmentation-on-multispectral?p=multispectral-video-semantic-segmentation-a)`

Multispectral Video Semantic Segmentation: A Benchmark Dataset and Baseline

CVPR 2023 · Wei Ji, Jingjing Li, Cheng Bian, Zongwei Zhou, Jiaying Zhao, Alan L. Yuille, Li Cheng ·

Robust and reliable semantic segmentation in complex scenes is crucial for many real-life applications such as autonomous safe driving and nighttime rescue. In most approaches, it is typical to make use of RGB images as input. They however work well only in preferred weather conditions; when facing adverse conditions such as rainy, overexposure, or low-light, they often fail to deliver satisfactory results. This has led to the recent investigation into multispectral semantic segmentation, where RGB and thermal infrared (RGBT) images are both utilized as input. This gives rise to significantly more robust segmentation of image objects in complex scenes and under adverse conditions. Nevertheless, the present focus in single RGBT image input restricts existing methods from well addressing dynamic real-world scenes. Motivated by the above observations, in this paper, we set out to address a relatively new task of semantic segmentation of multispectral video input, which we refer to as Multispectral Video Semantic Segmentation, or MVSS in short. An in-house MVSeg dataset is thus curated, consisting of 738 calibrated RGB and thermal videos, accompanied by 3,545 fine-grained pixel-level semantic annotations of 26 categories. Our dataset contains a wide range of challenging urban scenes in both daytime and nighttime. Moreover, we propose an effective MVSS baseline, dubbed MVNet, which is to our knowledge the first model to jointly learn semantic representations from multispectral and temporal contexts. Comprehensive experiments are conducted using various semantic segmentation models on the MVSeg dataset. Empirically, the engagement of multispectral video input is shown to lead to significant improvement in semantic segmentation; the effectiveness of our MVNet baseline has also been verified.

PDF Abstract

Code

Add Remove Mark official

jiwei0921/MVSS-Baseline official

Tasks

Add Remove

Segmentation

Semantic Segmentation

Video Semantic Segmentation

Datasets

Cityscapes

CamVid VSPW

KAIST Multispectral Pedestrian Detection Benchmark

Results from the Paper

Add Remove

Ranked #1 on Video Semantic Segmentation on Multispectral Video Semantic Segmentation

Get a GitHub badge

Task	Dataset	Model	Metric Name	Metric Value	Global Rank	Benchmark
Video Semantic Segmentation	Multispectral Video Semantic Segmentation	MVNet(DeepLabV3)	mIoU	54.52	# 1	Compare
Video Semantic Segmentation	Multispectral Video Semantic Segmentation	MVNet(FCN)	mIoU	53.90	# 3	Compare
Video Semantic Segmentation	Multispectral Video Semantic Segmentation	MVNet(PSPNet)	mIoU	54.36	# 2	Compare

Methods

Add Remove

fail

Edit Social Preview

Multispectral Video Semantic Segmentation: A Benchmark Dataset and Baseline

Code Edit Add Remove Mark official

Tasks Edit Add Remove

Datasets Edit

Results from the Paper Edit Add Remove

Methods Edit Add Remove

Code

Add Remove Mark official

Tasks

Add Remove

Datasets

Results from the Paper

Add Remove

Methods

Add Remove