TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Image Classification	ImageNet	Visformer-S	Top 1 Accuracy	82.2%	# 510
Image Classification	ImageNet	Visformer-S	Number of params	40.2M	# 684
Image Classification	ImageNet	Visformer-S	GFLOPs	4.9	# 230
Image Classification	ImageNet	Visformer-Ti	Top 1 Accuracy	78.6%	# 753
Image Classification	ImageNet	Visformer-Ti	Number of params	10.3M	# 478
Image Classification	ImageNet	Visformer-Ti	GFLOPs	1.3	# 118
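
The two variants can be compared directly from the rows above. The following is a minimal Python sketch, assuming the tab-separated layout shown here. `ROWS` and `parse_results` are illustrative names, not part of any Papers with Code tooling. It parses the rows and computes the gap in accuracy, parameter count, and compute between Visformer-S and Visformer-Ti.

```python
# Rows copied from the table above (tab-separated). ROWS is an illustrative name.
ROWS = """\
Image Classification\tImageNet\tVisformer-S\tTop 1 Accuracy\t82.2%\t# 510
Image Classification\tImageNet\tVisformer-S\tNumber of params\t40.2M\t# 684
Image Classification\tImageNet\tVisformer-S\tGFLOPs\t4.9\t# 230
Image Classification\tImageNet\tVisformer-Ti\tTop 1 Accuracy\t78.6%\t# 753
Image Classification\tImageNet\tVisformer-Ti\tNumber of params\t10.3M\t# 478
Image Classification\tImageNet\tVisformer-Ti\tGFLOPs\t1.3\t# 118
"""


def parse_results(text):
    """Return {model: {metric: float}} from tab-separated result rows."""
    results = {}
    for line in text.strip().splitlines():
        task, dataset, model, metric, value, rank = line.split("\t")
        # Strip the unit suffix ("%" or "M") so the value parses as a float.
        number = float(value.rstrip("%M"))
        results.setdefault(model, {})[metric] = number
    return results


results = parse_results(ROWS)
small, tiny = results["Visformer-S"], results["Visformer-Ti"]

acc_gap = small["Top 1 Accuracy"] - tiny["Top 1 Accuracy"]
param_ratio = small["Number of params"] / tiny["Number of params"]
flops_ratio = small["GFLOPs"] / tiny["GFLOPs"]

print(f"accuracy gap: {acc_gap:.1f} points")
print(f"parameter ratio: {param_ratio:.2f}x")
print(f"GFLOPs ratio: {flops_ratio:.2f}x")
```

On these numbers, Visformer-S gains 3.6 points of top-1 accuracy for roughly 3.9x the parameters and 3.8x the GFLOPs of Visformer-Ti.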

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/visformer-the-vision-friendly-transformer/image-classification-on-imagenet)](https://paperswithcode.com/sota/image-classification-on-imagenet?p=visformer-the-vision-friendly-transformer)`
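
The badge markdown above is built from two slugs: one for the paper and one for the benchmark. As a sketch only, the hypothetical helper below (`pwc_badge_markdown` is not an official API) rebuilds the same string. It assumes the URL pattern seen in this single badge carries over to other paper and task pairs.

```python
def pwc_badge_markdown(paper_slug, task_slug):
    """Build badge markdown for a paper/task pair (hypothetical helper)."""
    # Assumption: the URL layout is inferred from the one badge shown above.
    badge_url = (
        "https://img.shields.io/endpoint.svg?url="
        f"https://paperswithcode.com/badge/{paper_slug}/{task_slug}"
    )
    link_url = f"https://paperswithcode.com/sota/{task_slug}?p={paper_slug}"
    return f"[![PWC]({badge_url})]({link_url})"


badge = pwc_badge_markdown(
    "visformer-the-vision-friendly-transformer",
    "image-classification-on-imagenet",
)
print(badge)
```

The printed string can be pasted into a README as-is.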

Visformer: The Vision-friendly Transformer

ICCV 2021 · Zhengsu Chen, Lingxi Xie, Jianwei Niu, Xuefeng Liu, Longhui Wei, Qi Tian

The past year has witnessed rapid progress in applying the Transformer module to vision problems. While some researchers have demonstrated that Transformer-based models fit data well, a growing body of evidence shows that these models suffer from over-fitting, especially when training data is limited. This paper offers an empirical study that performs step-by-step operations to gradually transition a Transformer-based model into a convolution-based model. The results obtained during the transition process deliver useful messages for improving visual recognition. Based on these observations, we propose a new architecture named Visformer, short for 'Vision-friendly Transformer'. At the same computational complexity, Visformer outperforms both Transformer-based and convolution-based models in ImageNet classification accuracy, and the advantage becomes more significant when the model complexity is lower or the training set is smaller. The code is available at https://github.com/danczs/Visformer.

Code

danczs/Visformer (official) · 131 stars

rwightman/pytorch-image-models · 29,671 stars

SforAiDl/vformer · 161 stars

MS-Mind/MS-Code-02

Mind23-2/MindCode-120

Tasks

Image Classification

Datasets

ImageNet

Results from the Paper

Ranked #507 on Image Classification on ImageNet

Methods

1x1 Convolution • Absolute Position Encodings • Adam • Batch Normalization • Bottleneck Residual Block • BPE • Convolution • Dense Connections • Dropout • Grouped Convolution • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • ReLU • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer • Visformer
