X-volution: On the unification of convolution and self-attention

4 Jun 2021  ·  Xuanhong Chen, Hang Wang, Bingbing Ni ·

Convolution and self-attention act as two fundamental building blocks in deep neural networks: the former extracts local image features linearly, while the latter non-locally encodes high-order contextual relationships. Though essentially complementary to each other (first-order vs. high-order), state-of-the-art architectures such as CNNs and Transformers lack a principled way to apply both operations simultaneously in a single computational module, owing to their heterogeneous computing patterns and the excessive cost of global dot-products in visual tasks. In this work, we theoretically derive a global self-attention approximation scheme, which approximates self-attention via a convolution operation on transformed features. Based on this approximation scheme, we establish a multi-branch elementary module composed of both convolution and self-attention operations, capable of unifying local and non-local feature interaction. Importantly, once trained, this multi-branch module can be conditionally converted into a single standard convolution via structural re-parameterization, yielding a pure-convolution operator named X-volution that is ready to be plugged into any modern network as an atomic operation. Extensive experiments demonstrate that the proposed X-volution achieves highly competitive visual understanding improvements (+1.2% top-1 accuracy on ImageNet classification, +1.7 box AP and +1.5 mask AP on COCO detection and segmentation).
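The key enabler above is that, after training, both branches are linear operators, so they can be folded into one kernel. A minimal sketch of this structural re-parameterization idea, assuming two parallel 3x3 branches whose kernels (`K_conv`, `K_attn`, standing in for the trained convolution branch and the convolution-approximated attention branch; these names are illustrative, not the paper's actual operators):

```python
# Sketch: merging two parallel linear (convolution) branches into a single
# convolution via kernel addition, the linearity that structural
# re-parameterization relies on. Pure-Python valid-mode 2D cross-correlation.

def conv2d(img, kernel):
    """Valid-mode 2D cross-correlation on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            s = sum(img[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

def add_maps(a, b):
    """Elementwise sum of two equal-shape 2D lists."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Illustrative kernels for the two parallel branches.
K_conv = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
K_attn = [[1, 0, 1], [0, 2, 0], [1, 0, 1]]

img = [[float(i * 5 + j) for j in range(5)] for i in range(5)]

# Multi-branch form: run each branch, then sum the feature maps.
multi_branch = add_maps(conv2d(img, K_conv), conv2d(img, K_attn))

# Re-parameterized form: fold the kernels first, run one convolution.
K_merged = add_maps(K_conv, K_attn)
single_conv = conv2d(img, K_merged)

assert multi_branch == single_conv  # identical outputs, one conv at inference
```

Because convolution is linear in its kernel, summing the branch outputs is exactly equivalent to convolving once with the summed kernel, so inference pays for only a single standard convolution.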

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
| --- | --- | --- | --- | --- | --- |
| Object Detection | COCO minival | Faster R-CNN (FPN, X-volution) | box AP | 42.8 | #135 |
| | | | AP50 | 64.0 | #44 |
| | | | AP75 | 46.4 | #51 |
| | | | APS | 26.9 | #32 |
| | | | APM | 46.0 | #43 |
| | | | APL | 55.0 | #58 |
| Instance Segmentation | COCO minival | Mask R-CNN (FPN, X-volution, SA) | mask AP | 37.2 | #81 |
| | | | APL | 53.1 | #9 |
| | | | APM | 40.0 | #11 |
| | | | APS | 19.2 | #10 |
| Image Classification | ImageNet | ResNet-50 (X-volution, stage3) | Top 1 Accuracy | 76.6% | #839 |
| Image Classification | ImageNet | ResNet-34 (X-volution, stage3) | Top 1 Accuracy | 75.0% | #888 |