Variational Context-Deformable ConvNets for Indoor Scene Parsing

Context information is critical for semantic image segmentation. In indoor scenes especially, the large variation of object scales makes spatial context an important factor in improving segmentation performance. In this paper, we therefore propose a novel variational context-deformable (VCD) module that learns adaptive receptive fields in a structured fashion. Unlike standard ConvNets, which share a fixed-size spatial context across all pixels, the VCD module learns a deformable spatial context under the guidance of depth information, which provides clues for identifying each pixel's real local neighborhood. Specifically, adaptive Gaussian kernels are learned with the guidance of multi-modal information. By multiplying the learned Gaussian kernel with the standard convolution filter, the VCD module aggregates a flexible spatial context for each pixel during convolution. The main contributions of this work are as follows: 1) a novel VCD module that exploits learnable Gaussian kernels to enable feature learning with structured adaptive context; 2) variational Bayesian probabilistic modeling for training the VCD module, which makes the learned context continuous and training more stable; 3) a perspective-aware guidance module that takes advantage of multi-modal information for RGB-D segmentation. We evaluate the proposed approach on three widely used datasets, and consistent performance improvements demonstrate its effectiveness.
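To make the Gaussian-modulated convolution concrete, below is a minimal PyTorch sketch of the core operation: a per-pixel Gaussian width is predicted from the depth map, the resulting Gaussian re-weights each k×k neighborhood, and the ordinary convolution weights are then applied. The `VCDConv2d` name, the single-channel `sigma_head`, and the deterministic (point-estimate) treatment of sigma are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VCDConv2d(nn.Module):
    """Convolution whose k x k neighborhood is re-weighted by a per-pixel
    Gaussian kernel predicted from a guidance (depth) map -- a simplified,
    deterministic sketch of the VCD idea."""

    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.k = k
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_ch))
        # Hypothetical guidance head: predicts one positive sigma per pixel.
        self.sigma_head = nn.Conv2d(1, 1, 3, padding=1)
        # Squared distance of each kernel tap from the kernel center.
        ys, xs = torch.meshgrid(torch.arange(k), torch.arange(k), indexing="ij")
        c = (k - 1) / 2.0
        d2 = ((ys - c) ** 2 + (xs - c) ** 2).float().view(1, 1, k * k, 1)
        self.register_buffer("d2", d2)

    def forward(self, x, depth):
        b, cin, h, w = x.shape
        # Per-pixel Gaussian width; softplus keeps it positive.
        sigma = F.softplus(self.sigma_head(depth)) + 1e-3           # (b, 1, h, w)
        gauss = torch.exp(-self.d2 / (2.0 * sigma.view(b, 1, 1, h * w) ** 2))
        # Gather k*k neighborhoods and modulate them by the Gaussian weights.
        patches = F.unfold(x, self.k, padding=self.k // 2)          # (b, cin*k*k, h*w)
        patches = patches.view(b, cin, self.k * self.k, h * w) * gauss
        # Apply the ordinary convolution weights to the modulated patches.
        w_flat = self.weight.view(-1, cin, self.k * self.k)         # (out, cin, k*k)
        out = torch.einsum("bckp,ock->bop", patches, w_flat)
        return out.view(b, -1, h, w) + self.bias.view(1, -1, 1, 1)


# Example: large sigma -> wide context; small sigma -> nearly local context.
x = torch.randn(2, 16, 32, 32)
depth = torch.randn(2, 1, 32, 32)
print(VCDConv2d(16, 32)(x, depth).shape)  # torch.Size([2, 32, 32, 32])
```

In the full method, sigma would be treated as a latent variable with a variational Bayesian posterior (sampled during training, with a corresponding regularization term in the loss) rather than as a point estimate, which is what makes the learned context continuous and training more stable, per contribution 2 above.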

Results from the Paper


| Task | Dataset | Model | Metric | Value | Global Rank |
| --- | --- | --- | --- | --- | --- |
| Scene Parsing | Cityscapes test | VCD | mIoU (no coarse) | 82.3% | #1 |
| Semantic Segmentation | GAMUS | VCD | mIoU | 59.70% | #3 |
| Semantic Segmentation | NYU Depth v2 | VCD+DeepLab (VGG16) | Mean IoU | 45.3% | #80 |
| Semantic Segmentation | NYU Depth v2 | VCD+RedNet (ResNet-50) | Mean IoU | 50.7% | #47 |
| Semantic Segmentation | NYU Depth v2 | VCD+ACNet (ResNet-50) | Mean IoU | 51.9% | #36 |
