TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Image Classification	ImageNet	UniNet-B4	Top 1 Accuracy	84.2%	# 310
Image Classification	ImageNet	UniNet-B4	Number of params	73.5M	# 795
Image Classification	ImageNet	UniNet-B4	GFLOPs	9.9	# 296
Image Classification	ImageNet	UniNet-B2	Top 1 Accuracy	82.7%	# 465
Image Classification	ImageNet	UniNet-B2	Number of params	22.5M	# 569
Image Classification	ImageNet	UniNet-B2	GFLOPs	2.4	# 159
Image Classification	ImageNet	UniNet-B5	Top 1 Accuracy	85.2%	# 236
Image Classification	ImageNet	UniNet-B5	Number of params	73.5M	# 795
Image Classification	ImageNet	UniNet-B5	GFLOPs	23.2	# 375
Image Classification	ImageNet	UniNet-B1	Top 1 Accuracy	80.4%	# 642
Image Classification	ImageNet	UniNet-B1	Number of params	14M	# 511
Image Classification	ImageNet	UniNet-B1	GFLOPs	0.99	# 102
Image Classification	ImageNet	UniNet-B0	Top 1 Accuracy	79.1%	# 711
Image Classification	ImageNet	UniNet-B0	Number of params	11.9M	# 494
Image Classification	ImageNet	UniNet-B0	GFLOPs	0.56	# 58
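
To make the accuracy/compute trade-off in the table easier to read, here is a minimal sketch that computes how many top-1 points each larger variant gains per extra GFLOP. The numbers are copied from the rows above; the `RESULTS` layout and the `marginal_gain` helper are illustrative assumptions, not part of the paper or of this page.

```python
# Values copied from the results table above.
# Tuple layout (an assumption made for this sketch):
# (model, top-1 accuracy in %, parameters in millions, GFLOPs)
RESULTS = [
    ("UniNet-B0", 79.1, 11.9, 0.56),
    ("UniNet-B1", 80.4, 14.0, 0.99),
    ("UniNet-B2", 82.7, 22.5, 2.4),
    ("UniNet-B4", 84.2, 73.5, 9.9),
    ("UniNet-B5", 85.2, 73.5, 23.2),
]


def marginal_gain(results):
    """Top-1 points gained per extra GFLOP between consecutive models.

    Models are ordered by GFLOPs first, so each entry compares a variant
    with the next more expensive one.
    """
    ordered = sorted(results, key=lambda row: row[3])
    gains = []
    for prev, cur in zip(ordered, ordered[1:]):
        delta_acc = cur[1] - prev[1]
        delta_flops = cur[3] - prev[3]
        gains.append((prev[0], cur[0], delta_acc / delta_flops))
    return gains


if __name__ == "__main__":
    for smaller, larger, gain in marginal_gain(RESULTS):
        print(f"{smaller} -> {larger}: {gain:.2f} top-1 points per extra GFLOP")
```

On these figures the gain per extra GFLOP shrinks at every step, which is the usual diminishing-returns pattern when scaling a model family.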

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/uninet-unified-architecture-search-with/image-classification-on-imagenet)](https://paperswithcode.com/sota/image-classification-on-imagenet?p=uninet-unified-architecture-search-with)`

UniNet: Unified Architecture Search with Convolution, Transformer, and MLP

8 Oct 2021 · Jihao Liu, Hongsheng Li, Guanglu Song, Xin Huang, Yu Liu

Recently, transformer and multi-layer perceptron (MLP) architectures have achieved impressive results on various vision tasks. A few works have investigated manually combining these operators to design visual network architectures and achieve satisfactory performance to some extent. In this paper, we propose to jointly search for the optimal combination of convolution, transformer, and MLP operators to build a series of all-operator network architectures with high performance on visual tasks. We empirically find that the widely used strided-convolution or pooling-based down-sampling modules become performance bottlenecks when the operators are combined to form a network. To better handle the global context captured by the transformer and MLP operators, we propose two novel context-aware down-sampling modules, which adapt better to the global information encoded by those operators. To this end, we jointly search all operators and down-sampling modules in a unified search space. Notably, our searched network UniNet (Unified Network) outperforms the state-of-the-art pure convolution-based architecture, EfficientNet, and the pure transformer-based architecture, Swin-Transformer, on multiple public visual benchmarks: ImageNet classification, COCO object detection, and ADE20K semantic segmentation.
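
The idea of a unified search space can be illustrated with a small sketch in which every stage of a candidate network picks one operator and one down-sampling module. This is not the authors' search algorithm: it only samples candidates uniformly at random. The stage count, the dictionary layout, and the down-sampler names are hypothetical placeholders, since the abstract does not name the two context-aware modules.

```python
import random

# Operator families named in the abstract.
OPERATORS = ("convolution", "transformer", "mlp")

# Hypothetical placeholder names: the abstract mentions strided-convolution or
# pooling down-sampling and two context-aware modules, but does not name them.
DOWNSAMPLERS = ("strided_conv", "context_aware_1", "context_aware_2")


def sample_architecture(num_stages=5, seed=None):
    """Draw one candidate: an (operator, down-sampler) choice for each stage.

    num_stages=5 is an assumption made for this sketch only.
    """
    rng = random.Random(seed)
    return [
        {
            "stage": index,
            "operator": rng.choice(OPERATORS),
            "downsample": rng.choice(DOWNSAMPLERS),
        }
        for index in range(num_stages)
    ]


def search_space_size(num_stages=5):
    """Number of distinct candidates when every stage chooses independently."""
    return (len(OPERATORS) * len(DOWNSAMPLERS)) ** num_stages


if __name__ == "__main__":
    for stage in sample_architecture(seed=0):
        print(stage)
    print("candidates:", search_space_size())
```

Even this toy space grows exponentially with the number of stages, which is why such spaces are typically explored with a guided search rather than by enumeration.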

Code

No code implementations yet.

Tasks

Image Classification

Object Detection

Semantic Segmentation

Datasets

ImageNet

MS COCO

Results from the Paper

Ranked #236 on Image Classification on ImageNet

Methods

1x1 Convolution • Average Pooling • Batch Normalization • Convolution • Dense Connections • Depthwise Convolution • Depthwise Separable Convolution • Dropout • Inverted Residual Block • Pointwise Convolution • ReLU • RMSProp • Sigmoid Activation • Squeeze-and-Excitation Block • Swish
