EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction

29 May 2022 · Han Cai, Junyan Li, Muyan Hu, Chuang Gan, Song Han

High-resolution dense prediction enables many appealing real-world applications, such as computational photography, autonomous driving, etc. However, the vast computational cost makes deploying state-of-the-art high-resolution dense prediction models on hardware devices difficult. This work presents EfficientViT, a new family of high-resolution vision models with novel multi-scale linear attention. Unlike prior high-resolution dense prediction models that rely on heavy softmax attention, hardware-inefficient large-kernel convolution, or complicated topology structures to obtain good performance, our multi-scale linear attention achieves a global receptive field and multi-scale learning (two desirable features for high-resolution dense prediction) with only lightweight and hardware-efficient operations. As such, EfficientViT delivers remarkable performance gains over previous state-of-the-art models with significant speedup on diverse hardware platforms, including mobile CPU, edge GPU, and cloud GPU. Without performance loss on Cityscapes, our EfficientViT provides up to 13.9$\times$ and 6.2$\times$ GPU latency reduction over SegFormer and SegNeXt, respectively. For super-resolution, EfficientViT delivers up to 6.4$\times$ speedup over Restormer while providing a 0.11 dB gain in PSNR. For Segment Anything, EfficientViT delivers 48.9$\times$ higher throughput on an A100 GPU while achieving slightly better zero-shot instance segmentation performance on COCO.
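
To make the "lightweight, global receptive field" claim concrete, below is a minimal sketch of the ReLU-kernel linear attention that this design builds on: replacing softmax with a ReLU feature map lets the key-value product be computed first, so cost grows linearly with the number of tokens rather than quadratically. The function name and shapes are illustrative assumptions, and the multi-scale part of the paper's module (aggregating Q/K/V with small-kernel convolutions before attention) is not shown here.

```python
import torch
import torch.nn.functional as F

def relu_linear_attention(q, k, v, eps=1e-6):
    """Sketch of ReLU linear attention (not the authors' exact code).

    q, k: (batch, heads, N, d) query/key tokens
    v:    (batch, heads, N, e) value tokens
    Returns an output of shape (batch, heads, N, e) with O(N) cost,
    because K^T V is reduced to a (d, e) matrix before multiplying by Q.
    """
    q = F.relu(q)
    k = F.relu(k)
    # Aggregate keys against values once: a (d, e) matrix per head,
    # instead of the (N, N) attention map of softmax attention.
    kv = torch.einsum("bhnd,bhne->bhde", k, v)
    # Normalizer: each query's similarity to the summed keys.
    z = 1.0 / (torch.einsum("bhnd,bhd->bhn", q, k.sum(dim=2)) + eps)
    # Apply the aggregated key-value matrix to the queries and normalize.
    return torch.einsum("bhnd,bhde,bhn->bhne", q, kv, z)

# Example: 16,384 tokens (a 128x128 feature map) stay cheap to attend over.
q = k = v = torch.randn(1, 8, 128 * 128, 32)
out = relu_linear_attention(q, k, v)  # (1, 8, 16384, 32)
```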

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Semantic Segmentation | ADE20K | EfficientViT-B3 (r512) | Validation mIoU | 49 | #132 |
| Semantic Segmentation | Cityscapes val | EfficientViT-B3 (r1184x2368) | mIoU | 83.2 | #23 |
| Image Classification | ImageNet | EfficientViT-B3 (r288) | Top 1 Accuracy | 84.2% | #313 |