AutoFormer: Searching Transformers for Visual Recognition

Recently, pure transformer-based models have shown great potential for vision tasks such as image classification and detection. However, designing transformer networks is challenging: it has been observed that the depth, embedding dimension, and number of heads largely affect the performance of vision transformers. Previous models configure these dimensions through manual crafting. In this work, we propose a new one-shot architecture search framework, namely AutoFormer, dedicated to vision transformer search. AutoFormer entangles the weights of different blocks in the same layers during supernet training. Benefiting from this strategy, the trained supernet allows thousands of subnets to be very well trained: the performance of these subnets with weights inherited from the supernet is comparable to that of the same subnets retrained from scratch. Moreover, the searched models, which we refer to as AutoFormers, surpass recent state-of-the-art models such as ViT and DeiT. In particular, AutoFormer-tiny/small/base achieve 74.7%/81.7%/82.4% top-1 accuracy on ImageNet with 5.7M/22.9M/53.7M parameters, respectively. Lastly, we verify the transferability of AutoFormer by reporting its performance on downstream benchmarks and in distillation experiments. Code and models are available at https://github.com/microsoft/AutoML.
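To make the weight-entanglement idea concrete, below is a minimal PyTorch-style sketch of how subnets sampled with different embedding dimensions can share one set of parameters by slicing a common weight tensor. This is an illustration of the concept under stated assumptions, not the repository's actual implementation; the class name EntangledLinear, its arguments, and the sampled dimensions are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EntangledLinear(nn.Module):
    """Linear layer whose sampled subnets share (entangle) one weight tensor.

    A subnet needing a smaller embedding dimension slices the leading
    rows/columns of the full weight, so training any sampled subnet
    updates parameters that all other subnets also inherit.
    (Hypothetical sketch, not the AutoFormer repository API.)
    """
    def __init__(self, max_in_features: int, max_out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(max_out_features, max_in_features))
        self.bias = nn.Parameter(torch.zeros(max_out_features))
        nn.init.trunc_normal_(self.weight, std=0.02)

    def forward(self, x: torch.Tensor, in_features: int, out_features: int) -> torch.Tensor:
        # Slice the shared weight to the dimensions sampled for this subnet.
        w = self.weight[:out_features, :in_features]
        b = self.bias[:out_features]
        return F.linear(x, w, b)

# Example: two subnets with different (assumed) embedding dims reuse the same parameters.
layer = EntangledLinear(max_in_features=640, max_out_features=640)
x_small = torch.randn(2, 196, 384)  # sampled embedding dim 384 (illustrative)
x_large = torch.randn(2, 196, 576)  # sampled embedding dim 576 (illustrative)
y_small = layer(x_small, in_features=384, out_features=384)
y_large = layer(x_large, in_features=576, out_features=576)
```

Because every sampled subnet reads and writes a slice of the same shared tensor, training one architecture also updates the weights other subnets will inherit, which is why weights inherited from the supernet can perform close to stand-alone training.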

Published at ICCV 2021.

Results from the Paper


Image Classification on CIFAR-10 (AutoFormer-S | 384):
  Percentage correct: 99.1 (global rank #12)
  Params: 23M (global rank #208)

Image Classification on ImageNet (AutoFormer-base):
  Top 1 Accuracy: 82.4% (global rank #491)
  Number of params: 54M (global rank #737)
  GFLOPs: 11 (global rank #305)

Image Classification on ImageNet (AutoFormer-small):
  Top 1 Accuracy: 81.7% (global rank #563)
  Number of params: 22.9M (global rank #573)
  GFLOPs: 5.1 (global rank #234)

Image Classification on ImageNet (AutoFormer-tiny):
  Top 1 Accuracy: 74.7% (global rank #898)
  Number of params: 5.7M (global rank #428)
  GFLOPs: 1.3 (global rank #118)

Fine-Grained Image Classification on Oxford 102 Flowers (AutoFormer-S | 384):
  Top 1 Accuracy: 98.8 (global rank #1)

Fine-Grained Image Classification on Oxford-IIIT Pet Dataset (AutoFormer-S | 384):
  Accuracy: 94.9% (global rank #10)

Fine-Grained Image Classification on Stanford Cars (AutoFormer-S | 384):
  Accuracy: 93.4% (global rank #55)
