TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Semantic Segmentation	ADE20K	VAN-Large	Validation mIoU	48.1	# 142
Semantic Segmentation	ADE20K	VAN-Large	Params (M)	49	# 49
Semantic Segmentation	ADE20K	VAN-Tiny	Validation mIoU	38.5	# 213
Semantic Segmentation	ADE20K	VAN-Tiny	Params (M)	8	# 61
Semantic Segmentation	ADE20K	VAN-Small	Validation mIoU	42.9	# 205
Semantic Segmentation	ADE20K	VAN-Small	Params (M)	18	# 57
Semantic Segmentation	ADE20K	VAN-B6	Validation mIoU	54.7	# 49
Semantic Segmentation	ADE20K	VAN-Base (Semantic-FPN)	Validation mIoU	46.7	# 164
Semantic Segmentation	ADE20K	VAN-Large (HamNet)	Validation mIoU	50.2	# 110
Semantic Segmentation	ADE20K	VAN-Large (HamNet)	Params (M)	55	# 47
Panoptic Segmentation	COCO minival	Visual Attention Network (VAN-B6 + Mask2Former)	PQ	58.2	# 6
Panoptic Segmentation	COCO minival	Visual Attention Network (VAN-B6 + Mask2Former)	PQth	64.8	# 4
Panoptic Segmentation	COCO minival	Visual Attention Network (VAN-B6 + Mask2Former)	PQst	48.2	# 8
Panoptic Segmentation	COCO panoptic	VAN-B6*	PQ	58.2	# 1
Image Classification	ImageNet	VAN-B0	Top 1 Accuracy	75.4%	# 873
Image Classification	ImageNet	VAN-B0	Number of params	4.1M	# 381
Image Classification	ImageNet	VAN-B0	GFLOPs	0.9	# 100
Image Classification	ImageNet	VAN-B1	Top 1 Accuracy	81.1%	# 604
Image Classification	ImageNet	VAN-B1	Number of params	13.9M	# 510
Image Classification	ImageNet	VAN-B1	GFLOPs	2.5	# 161
Image Classification	ImageNet	VAN-B2	Top 1 Accuracy	82.8%	# 450
Image Classification	ImageNet	VAN-B2	Number of params	26.6M	# 613
Image Classification	ImageNet	VAN-B2	GFLOPs	5	# 231
Image Classification	ImageNet	VAN-B4 (22K, 384res)	Top 1 Accuracy	86.6%	# 130
Image Classification	ImageNet	VAN-B4 (22K, 384res)	Number of params	60M	# 764
Image Classification	ImageNet	VAN-B4 (22K, 384res)	GFLOPs	35.9	# 403
Image Classification	ImageNet	VAN-B5 (22K)	Top 1 Accuracy	86.3%	# 151
Image Classification	ImageNet	VAN-B5 (22K)	Number of params	90M	# 847
Image Classification	ImageNet	VAN-B5 (22K)	GFLOPs	17.2	# 355
Image Classification	ImageNet	VAN-B5 (22K, 384res)	Top 1 Accuracy	87%	# 110
Image Classification	ImageNet	VAN-B5 (22K, 384res)	Number of params	90M	# 847
Image Classification	ImageNet	VAN-B5 (22K, 384res)	GFLOPs	50.6	# 424
Image Classification	ImageNet	VAN-B6 (22K, 384res)	Top 1 Accuracy	87.8%	# 73
Image Classification	ImageNet	VAN-B6 (22K, 384res)	Number of params	200M	# 901
Image Classification	ImageNet	VAN-B6 (22K, 384res)	GFLOPs	114.3	# 456
Image Classification	ImageNet	VAN-B4 (22K)	Top 1 Accuracy	85.7%	# 197
Image Classification	ImageNet	VAN-B4 (22K)	Number of params	60M	# 764
Image Classification	ImageNet	VAN-B4 (22K)	GFLOPs	12.2	# 315
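
The table above is in long format: each model takes one row per metric. A minimal sketch of how these rows could be grouped into one record per model follows. It assumes the table is stored as tab-separated text; the `pivot` helper and the three-model `ROWS` sample are illustrative and are not part of the page.

```python
import csv
import io
from collections import defaultdict

# A small sample of rows copied from the table above (tab-separated).
ROWS = """\
TASK\tDATASET\tMODEL\tMETRIC NAME\tMETRIC VALUE\tGLOBAL RANK
Image Classification\tImageNet\tVAN-B0\tTop 1 Accuracy\t75.4%\t# 873
Image Classification\tImageNet\tVAN-B0\tNumber of params\t4.1M\t# 381
Image Classification\tImageNet\tVAN-B0\tGFLOPs\t0.9\t# 100
Image Classification\tImageNet\tVAN-B2\tTop 1 Accuracy\t82.8%\t# 450
Image Classification\tImageNet\tVAN-B2\tNumber of params\t26.6M\t# 613
Image Classification\tImageNet\tVAN-B2\tGFLOPs\t5\t# 231
Image Classification\tImageNet\tVAN-B6 (22K, 384res)\tTop 1 Accuracy\t87.8%\t# 73
Image Classification\tImageNet\tVAN-B6 (22K, 384res)\tNumber of params\t200M\t# 901
Image Classification\tImageNet\tVAN-B6 (22K, 384res)\tGFLOPs\t114.3\t# 456
"""


def pivot(text):
    """Group long-format rows into {(task, dataset, model): {metric: value}}."""
    records = defaultdict(dict)
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    for row in reader:
        key = (row["TASK"], row["DATASET"], row["MODEL"])
        # Drop a trailing "%" or "M" so every value parses as a float.
        value = float(row["METRIC VALUE"].rstrip("%M"))
        records[key][row["METRIC NAME"]] = value
    return dict(records)


if __name__ == "__main__":
    for (task, dataset, model), metrics in pivot(ROWS).items():
        print(model, metrics["Top 1 Accuracy"], metrics["GFLOPs"])
```

Grouping by (task, dataset, model) keeps the two VAN-B6 entries on different benchmarks apart, since the same model name appears under several tasks.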

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/visual-attention-network/panoptic-segmentation-on-coco-panoptic)](https://paperswithcode.com/sota/panoptic-segmentation-on-coco-panoptic?p=visual-attention-network)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/visual-attention-network/panoptic-segmentation-on-coco-minival)](https://paperswithcode.com/sota/panoptic-segmentation-on-coco-minival?p=visual-attention-network)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/visual-attention-network/semantic-segmentation-on-ade20k)](https://paperswithcode.com/sota/semantic-segmentation-on-ade20k?p=visual-attention-network)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/visual-attention-network/image-classification-on-imagenet)](https://paperswithcode.com/sota/image-classification-on-imagenet?p=visual-attention-network)`
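
The four badge snippets above share one URL pattern: a paper slug and a benchmark slug. The sketch below reproduces that pattern as inferred from these four lines only; it is not a documented API, and the `pwc_badge` function name and the two constants are hypothetical.

```python
# URL prefixes copied from the badge snippets above.
BADGE_ENDPOINT = "https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge"
SOTA_PAGE = "https://paperswithcode.com/sota"


def pwc_badge(paper_slug, benchmark_slug):
    """Build the Markdown badge for one paper/benchmark pair (inferred pattern)."""
    image = f"{BADGE_ENDPOINT}/{paper_slug}/{benchmark_slug}"
    link = f"{SOTA_PAGE}/{benchmark_slug}?p={paper_slug}"
    return f"[![PWC]({image})]({link})"


if __name__ == "__main__":
    print(pwc_badge("visual-attention-network", "semantic-segmentation-on-ade20k"))
```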

Visual Attention Network

20 Feb 2022 · Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu

While originally designed for natural language processing tasks, the self-attention mechanism has recently taken various computer vision areas by storm. However, the 2D nature of images brings three challenges for applying self-attention in computer vision. (1) Treating images as 1D sequences neglects their 2D structures. (2) The quadratic complexity is too expensive for high-resolution images. (3) It only captures spatial adaptability but ignores channel adaptability. In this paper, we propose a novel linear attention named large kernel attention (LKA) to enable self-adaptive and long-range correlations in self-attention while avoiding its shortcomings. Furthermore, we present a neural network based on LKA, namely Visual Attention Network (VAN). While extremely simple, VAN surpasses similar-size vision transformers (ViTs) and convolutional neural networks (CNNs) in various tasks, including image classification, object detection, semantic segmentation, panoptic segmentation, and pose estimation. For example, VAN-B6 achieves 87.8% accuracy on the ImageNet benchmark and sets a new state of the art (58.2 PQ) for panoptic segmentation. In addition, VAN-B2 surpasses Swin-T by 4% mIoU (50.1 vs. 46.1) for semantic segmentation on the ADE20K benchmark and by 2.6% AP (48.8 vs. 46.2) for object detection on the COCO dataset. It provides a novel method and a simple yet strong baseline for the community. Code is available at https://github.com/Visual-Attention-Network.
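
Challenge (2) in the abstract can be made concrete with a little arithmetic. The sketch below counts the tokens and the pairwise attention scores that one global self-attention layer would compute at the two input resolutions listed in the results (224 and 384). The 16-pixel patch size and the `attention_cost` helper are assumptions chosen for illustration; they are not taken from the paper, and the count says nothing about the cost of LKA itself.

```python
def attention_cost(height, width, patch=16):
    """Return (tokens, pairwise_scores) for one global self-attention layer.

    patch=16 is an illustrative assumption, not a value from the paper.
    """
    tokens = (height // patch) * (width // patch)
    # Every token attends to every token, so the score matrix is tokens x tokens.
    return tokens, tokens * tokens


if __name__ == "__main__":
    base_tokens, base_pairs = attention_cost(224, 224)
    for side in (224, 384):
        tokens, pairs = attention_cost(side, side)
        print(f"{side}x{side}: {tokens} tokens, {pairs} scores, "
              f"{tokens / base_tokens:.2f}x tokens, {pairs / base_pairs:.2f}x scores")
```

Under these assumptions, moving from 224 to 384 pixels per side multiplies the token count by about 2.9 but the number of attention scores by about 8.6, which is the quadratic growth the abstract refers to.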

Code

Repository	Stars
Visual-Attention-Network/VAN-Classi… (official)	793
huggingface/transformers	124,251
facebookresearch/xformers	7,486
PaddlePaddle/PaddleClas	5,245
open-mmlab/mmclassification	3,128

See all 17 implementations

Tasks

Image Classification

Instance Segmentation

Object Detection

Panoptic Segmentation

Pose Estimation

Segmentation

Semantic Segmentation

Datasets

ImageNet

MS COCO

ADE20K

PASCAL-S

DUTS

Results from the Paper

Ranked #1 on Panoptic Segmentation on COCO panoptic

Methods

Visual Attention
