TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Object Detection	COCO test-dev	EdgeNeXt	box mAP	27.9	# 229
Object Detection	COCO test-dev	EdgeNeXt	Params (M)	6.2	# 9
Image Classification	ImageNet	EdgeNeXt-S	Top 1 Accuracy	79.4%	# 695
Image Classification	ImageNet	EdgeNeXt-S	Number of params	5.6M	# 425
Image Classification	ImageNet	EdgeNeXt-S	GFLOPs	2.6	# 164
Image Classification	ImageNet	EdgeNeXt-XXS	Top 1 Accuracy	71.2%	# 940
Image Classification	ImageNet	EdgeNeXt-XXS	Number of params	1.3M	# 351
Image Classification	ImageNet	EdgeNeXt-XXS	GFLOPs	0.522	# 54
Semantic Segmentation	PASCAL VOC 2012 test	EdgeNeXt	Mean IoU	80.2%	# 29
Semantic Segmentation	PASCAL VOC 2012 test	EdgeNeXt	FLOPS	8.7G	# 1
Semantic Segmentation	PASCAL VOC 2012 test	EdgeNeXt	Params	6.5M	# 51


EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications

21 Jun 2022 · Muhammad Maaz, Abdelrahman Shaker, Hisham Cholakkal, Salman Khan, Syed Waqas Zamir, Rao Muhammad Anwer, Fahad Shahbaz Khan

In the pursuit of ever-increasing accuracy, large and complex neural networks are usually developed. Such models demand high computational resources and therefore cannot be deployed on edge devices. It is of great interest to build resource-efficient general-purpose networks due to their usefulness in several application areas. In this work, we strive to effectively combine the strengths of both CNN and Transformer models and propose a new efficient hybrid architecture, EdgeNeXt. Specifically, in EdgeNeXt we introduce a split depth-wise transpose attention (SDTA) encoder that splits input tensors into multiple channel groups and utilizes depth-wise convolution along with self-attention across channel dimensions to implicitly increase the receptive field and encode multi-scale features. Our extensive experiments on classification, detection and segmentation tasks reveal the merits of the proposed approach, outperforming state-of-the-art methods with comparatively lower compute requirements. Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K, outperforming MobileViT by an absolute 2.2% with a 28% reduction in FLOPs. Further, our EdgeNeXt model with 5.6M parameters achieves 79.4% top-1 accuracy on ImageNet-1K. The code and models are available at https://t.ly/_Vu9.
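The SDTA encoder described in the abstract can be sketched in NumPy as follows. This is a minimal, single-head illustration under stated assumptions, not the authors' implementation (the official code is in mmaaz60/EdgeNeXt). All function names are hypothetical. We assume that channels are split into equal groups, that each group's convolved output is added to the next group before that group's own 3x3 depth-wise convolution, and that the last group passes through unchanged. The "transpose" attention is modelled as cross-covariance-style attention with L2-normalised queries and keys. LayerNorm, positional encoding, multi-head splitting and the feed-forward layer are omitted.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along one axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def depthwise_conv3x3(x, kernel):
    # x: (C, H, W); kernel: (C, 3, 3). One 3x3 filter per channel; zero padding keeps H and W.
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += kernel[:, dy, dx][:, None, None] * padded[:, dy:dy + h, dx:dx + w]
    return out


def split_depthwise(x, kernels):
    # Split channels into len(kernels) + 1 equal groups (np.split raises if C is not divisible).
    groups = np.split(x, len(kernels) + 1, axis=0)
    outs, carry = [], None
    for i, kernel in enumerate(kernels):
        # Assumed hierarchical mixing: each group also receives the previous group's output,
        # so later groups see a progressively larger receptive field.
        carry = groups[i] if i == 0 else carry + groups[i]
        carry = depthwise_conv3x3(carry, kernel)
        outs.append(carry)
    outs.append(groups[-1])  # the last group is passed through unchanged
    return np.concatenate(outs, axis=0)


def channel_attention_map(x, wq, wk, temperature=1.0):
    # Attention across channels: the map is (C, C), independent of H * W.
    c = x.shape[0]
    tokens = x.reshape(c, -1).T  # (N, C) with N = H * W
    q, k = tokens @ wq, tokens @ wk
    # L2-normalise each channel over the tokens, as in cross-covariance attention.
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-6)
    k = k / (np.linalg.norm(k, axis=0, keepdims=True) + 1e-6)
    return softmax(temperature * (q.T @ k), axis=-1)


def sdta_block(x, kernels, wq, wk, wv, temperature=1.0):
    x = split_depthwise(x, kernels)
    c = x.shape[0]
    v = x.reshape(c, -1).T @ wv  # values, shape (N, C)
    attn = channel_attention_map(x, wq, wk, temperature)
    # Each output channel is a weighted mix of the value channels.
    mixed = (attn @ v.T).reshape(x.shape)
    return x + mixed  # residual connection


rng = np.random.default_rng(0)
C, H, W, splits = 8, 6, 6, 4
x = rng.standard_normal((C, H, W))
kernels = [0.1 * rng.standard_normal((C // splits, 3, 3)) for _ in range(splits - 1)]
wq, wk, wv = (0.1 * rng.standard_normal((C, C)) for _ in range(3))
y = sdta_block(x, kernels, wq, wk, wv)
print(y.shape)  # (8, 6, 6)
```

Because the attention map is C×C rather than (HW)×(HW), its cost grows only linearly with the number of pixels, which is what makes attending across channels attractive for high-resolution inputs on a small compute budget.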


Code


mmaaz60/EdgeNeXt (official) · 327 stars

rwightman/pytorch-image-models · 29,713 stars

leondgarse/keras_cv_attention_models · 554 stars

amshaker/swiftformer · 196 stars

alibaba-miil/solving_imagenet · 190 stars

(7 implementations listed in total)

Tasks


Image Classification

Object Detection

Semantic Segmentation

Datasets

ImageNet

MS COCO

ssd

ImageNet-1K

PASCAL VOC 2012 test

Results from the Paper


Ranked #29 on Semantic Segmentation on PASCAL VOC 2012 test


Methods


Absolute Position Encodings • Adam • BPE • Convolution • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • MobileViT • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
