TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Semantic Segmentation	ADE20K	ConvNeXt-S	Validation mIoU	49.6	# 122
Semantic Segmentation	ADE20K	ConvNeXt-S	Params (M)	82	# 33
Semantic Segmentation	ADE20K	ConvNeXt-S	GFLOPs (512 x 512)	1027	# 16
Semantic Segmentation	ADE20K	ConvNeXt-B++	Validation mIoU	53.1	# 77
Semantic Segmentation	ADE20K	ConvNeXt-B++	Params (M)	122	# 25
Semantic Segmentation	ADE20K	ConvNeXt-B++	GFLOPs (512 x 512)	1828	# 23
Semantic Segmentation	ADE20K	ConvNeXt-L++	Validation mIoU	53.7	# 68
Semantic Segmentation	ADE20K	ConvNeXt-L++	Params (M)	235	# 16
Semantic Segmentation	ADE20K	ConvNeXt-L++	GFLOPs (512 x 512)	2458	# 24
Semantic Segmentation	ADE20K	ConvNeXt-XL++	Validation mIoU	54	# 63
Semantic Segmentation	ADE20K	ConvNeXt-XL++	Params (M)	391	# 12
Semantic Segmentation	ADE20K	ConvNeXt-XL++	GFLOPs (512 x 512)	3335	# 25
Semantic Segmentation	ADE20K	ConvNeXt-B	Validation mIoU	49.9	# 116
Semantic Segmentation	ADE20K	ConvNeXt-B	Params (M)	122	# 25
Semantic Segmentation	ADE20K	ConvNeXt-B	GFLOPs (512 x 512)	1170	# 20
Semantic Segmentation	ADE20K	ConvNeXt-T	Validation mIoU	46.7	# 164
Semantic Segmentation	ADE20K	ConvNeXt-T	Params (M)	60	# 41
Semantic Segmentation	ADE20K	ConvNeXt-T	GFLOPs (512 x 512)	939	# 12
Object Detection	COCO-O	ConvNeXt-XL (Cascade Mask R-CNN)	Average mAP	37.5	# 7
Object Detection	COCO-O	ConvNeXt-XL (Cascade Mask R-CNN)	Effective Robustness	12.68	# 6
Image Classification	ImageNet	ConvNeXt-L (384 res)	Top 1 Accuracy	85.5%	# 212
Image Classification	ImageNet	ConvNeXt-L (384 res)	Number of params	198M	# 899
Image Classification	ImageNet	ConvNeXt-L (384 res)	GFLOPs	101	# 447
Image Classification	ImageNet	ConvNeXt-XL (ImageNet-22k)	Top 1 Accuracy	87.8%	# 75
Image Classification	ImageNet	ConvNeXt-XL (ImageNet-22k)	Number of params	350M	# 924
Image Classification	ImageNet	ConvNeXt-XL (ImageNet-22k)	GFLOPs	179	# 466
Image Classification	ImageNet	Adlik-ViT-SG+Swin_large+Convnext_xlarge(384)	Top 1 Accuracy	88.36%	# 59
Image Classification	ImageNet	Adlik-ViT-SG+Swin_large+Convnext_xlarge(384)	Number of params	1827M	# 961
Image Classification	ImageNet	ConvNeXt-T	Top 1 Accuracy	82.1%	# 525
Image Classification	ImageNet	ConvNeXt-T	Number of params	29M	# 641
Image Classification	ImageNet	ConvNeXt-T	GFLOPs	4.5	# 211
Domain Generalization	ImageNet-A	ConvNeXt-XL (Im21k, 384)	Top-1 accuracy %	69.3	# 10
Domain Generalization	ImageNet-C	ConvNeXt-XL (Im21k) (augmentation overlap with ImageNet-C)	mean Corruption Error (mCE)	38.8	# 12
Domain Generalization	ImageNet-C	ConvNeXt-XL (Im21k) (augmentation overlap with ImageNet-C)	Number of params	350M	# 40
Domain Generalization	ImageNet-R	ConvNeXt-XL (Im21k, 384)	Top-1 Error Rate	31.8	# 8
Semantic Segmentation	ImageNet-S	ConvNext-Tiny (P4, 224x224, SUP)	mIoU (val)	48.7	# 11
Semantic Segmentation	ImageNet-S	ConvNext-Tiny (P4, 224x224, SUP)	mIoU (test)	48.8	# 10
Domain Generalization	ImageNet-Sketch	ConvNeXt-XL (Im21k, 384)	Top-1 accuracy	55.0	# 4
Classification	InDL	ConvNext	Average Recall	93.47%	# 1
Domain Generalization	VizWiz-Classification	ConvNeXt-B	Accuracy - All Images	53.5	# 2
Domain Generalization	VizWiz-Classification	ConvNeXt-B	Accuracy - Corrupted Images	46.9	# 3
Domain Generalization	VizWiz-Classification	ConvNeXt-B	Accuracy - Clean Images	56	# 2
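The table above stores one metric per row, so comparing models means regrouping the rows by model first. The sketch below shows one way to do that for a few of the ADE20K rows, assuming the six-column tab-separated layout shown above. The names `ADE20K_ROWS`, `parse_value`, and `pivot_by_model` are illustrative and not part of any published tooling.

```python
import csv
import io

# Four ADE20K models copied from the table above (same six-column order).
ADE20K_ROWS = (
    "Semantic Segmentation\tADE20K\tConvNeXt-T\tValidation mIoU\t46.7\t# 164\n"
    "Semantic Segmentation\tADE20K\tConvNeXt-T\tParams (M)\t60\t# 41\n"
    "Semantic Segmentation\tADE20K\tConvNeXt-T\tGFLOPs (512 x 512)\t939\t# 12\n"
    "Semantic Segmentation\tADE20K\tConvNeXt-S\tValidation mIoU\t49.6\t# 122\n"
    "Semantic Segmentation\tADE20K\tConvNeXt-S\tParams (M)\t82\t# 33\n"
    "Semantic Segmentation\tADE20K\tConvNeXt-S\tGFLOPs (512 x 512)\t1027\t# 16\n"
    "Semantic Segmentation\tADE20K\tConvNeXt-B\tValidation mIoU\t49.9\t# 116\n"
    "Semantic Segmentation\tADE20K\tConvNeXt-B\tParams (M)\t122\t# 25\n"
    "Semantic Segmentation\tADE20K\tConvNeXt-B\tGFLOPs (512 x 512)\t1170\t# 20\n"
    "Semantic Segmentation\tADE20K\tConvNeXt-XL++\tValidation mIoU\t54\t# 63\n"
    "Semantic Segmentation\tADE20K\tConvNeXt-XL++\tParams (M)\t391\t# 12\n"
    "Semantic Segmentation\tADE20K\tConvNeXt-XL++\tGFLOPs (512 x 512)\t3335\t# 25\n"
)


def parse_value(text):
    """Turn a metric cell such as '49.6', '85.5%' or '198M' into a float."""
    return float(text.rstrip("%M"))


def pivot_by_model(tsv_text):
    """Group one-metric-per-row records into {model: {metric: value}}."""
    table = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) != 6:  # skip blank or malformed lines
            continue
        _task, _dataset, model, metric, value, _rank = row
        table.setdefault(model, {})[metric] = parse_value(value)
    return table


if __name__ == "__main__":
    results = pivot_by_model(ADE20K_ROWS)
    # List models from highest to lowest validation mIoU.
    for model, m in sorted(results.items(),
                           key=lambda kv: kv[1]["Validation mIoU"],
                           reverse=True):
        print(model, m["Validation mIoU"], m["Params (M)"], m["GFLOPs (512 x 512)"])
```

Keying on the model name alone is enough here because every row shares one dataset. A table that mixes datasets would need a `(dataset, model)` key instead.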

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/a-convnet-for-the-2020s/classification-on-indl)](https://paperswithcode.com/sota/classification-on-indl?p=a-convnet-for-the-2020s)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/a-convnet-for-the-2020s/domain-generalization-on-vizwiz)](https://paperswithcode.com/sota/domain-generalization-on-vizwiz?p=a-convnet-for-the-2020s)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/a-convnet-for-the-2020s/domain-generalization-on-imagenet-sketch)](https://paperswithcode.com/sota/domain-generalization-on-imagenet-sketch?p=a-convnet-for-the-2020s)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/a-convnet-for-the-2020s/object-detection-on-coco-o)](https://paperswithcode.com/sota/object-detection-on-coco-o?p=a-convnet-for-the-2020s)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/a-convnet-for-the-2020s/domain-generalization-on-imagenet-r)](https://paperswithcode.com/sota/domain-generalization-on-imagenet-r?p=a-convnet-for-the-2020s)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/a-convnet-for-the-2020s/domain-generalization-on-imagenet-a)](https://paperswithcode.com/sota/domain-generalization-on-imagenet-a?p=a-convnet-for-the-2020s)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/a-convnet-for-the-2020s/semantic-segmentation-on-imagenet-s)](https://paperswithcode.com/sota/semantic-segmentation-on-imagenet-s?p=a-convnet-for-the-2020s)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/a-convnet-for-the-2020s/domain-generalization-on-imagenet-c)](https://paperswithcode.com/sota/domain-generalization-on-imagenet-c?p=a-convnet-for-the-2020s)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/a-convnet-for-the-2020s/image-classification-on-imagenet)](https://paperswithcode.com/sota/image-classification-on-imagenet?p=a-convnet-for-the-2020s)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/a-convnet-for-the-2020s/semantic-segmentation-on-ade20k)](https://paperswithcode.com/sota/semantic-segmentation-on-ade20k?p=a-convnet-for-the-2020s)`
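All ten badge snippets above follow the same pattern: a paper slug (`a-convnet-for-the-2020s`) combined with a benchmark slug (for example `classification-on-indl`). The minimal sketch below rebuilds that Markdown from the two slugs. The function name `pwc_badge` is hypothetical, and the URL layout is inferred only from the snippets listed here, not from any documented API.

```python
def pwc_badge(paper_slug: str, benchmark_slug: str) -> str:
    """Build badge Markdown in the shape of the snippets listed above."""
    base = "https://paperswithcode.com"
    # Badge image served through shields.io, pointing at the badge endpoint.
    image = (
        "https://img.shields.io/endpoint.svg"
        f"?url={base}/badge/{paper_slug}/{benchmark_slug}"
    )
    # Link target: the leaderboard page, with the paper passed as ?p=.
    link = f"{base}/sota/{benchmark_slug}?p={paper_slug}"
    return f"[![PWC]({image})]({link})"


if __name__ == "__main__":
    for slug in ("classification-on-indl", "semantic-segmentation-on-ade20k"):
        print(pwc_badge("a-convnet-for-the-2020s", slug))
```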

A ConvNet for the 2020s

CVPR 2022 · Zhuang Liu, Hanzi Mao, Chao-yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie

The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model. A vanilla ViT, on the other hand, faces difficulties when applied to general computer vision tasks such as object detection and semantic segmentation. It is the hierarchical Transformers (e.g., Swin Transformers) that reintroduced several ConvNet priors, making Transformers practically viable as a generic vision backbone and demonstrating remarkable performance on a wide variety of vision tasks. However, the effectiveness of such hybrid approaches is still largely credited to the intrinsic superiority of Transformers, rather than the inherent inductive biases of convolutions. In this work, we reexamine the design spaces and test the limits of what a pure ConvNet can achieve. We gradually "modernize" a standard ResNet toward the design of a vision Transformer, and discover several key components that contribute to the performance difference along the way. The outcome of this exploration is a family of pure ConvNet models dubbed ConvNeXt. Constructed entirely from standard ConvNet modules, ConvNeXts compete favorably with Transformers in terms of accuracy and scalability, achieving 87.8% ImageNet top-1 accuracy and outperforming Swin Transformers on COCO detection and ADE20K segmentation, while maintaining the simplicity and efficiency of standard ConvNets.

Code

Repository	Stars
facebookresearch/ConvNeXt (official)	5,533
keras-team/keras	60,884
rwightman/pytorch-image-models	29,774
pytorch/vision	15,445
lucidrains/denoising-diffusion-pyto…	7,008

See all 45 implementations

Tasks

Classification

Domain Generalization

Image Classification

Object Detection

Real-Time Object Detection

Semantic Segmentation

Datasets

ImageNet

MS COCO

ADE20K

ImageNet-1K

ImageNet-C

ImageNet-R

ImageNet-A

ImageNet-Sketch

COCO-O

ImageNet-S

VizWiz-Classification

InDL

Results from the Paper

Ranked #1 on Classification on InDL

Methods

1x1 Convolution • ConvNeXt • LayerScale
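To make the three listed methods concrete, the sketch below counts the learnable parameters of a single ConvNeXt block. It assumes the block layout described in the paper: a 7x7 depthwise convolution, a LayerNorm, two pointwise (1x1) layers with a 4x channel expansion, and a per-channel LayerScale vector. None of those details appear on this page, so treat them as assumptions. The function name and dictionary keys are illustrative.

```python
def convnext_block_params(dim: int, kernel_size: int = 7, expansion: int = 4) -> dict:
    """Count learnable parameters per component of one block (assumed layout)."""
    hidden = expansion * dim
    return {
        # Depthwise conv: one k x k filter per channel, plus a bias per channel.
        "depthwise_conv": dim * kernel_size * kernel_size + dim,
        # LayerNorm: a scale and a shift per channel.
        "layer_norm": 2 * dim,
        # First 1x1 (pointwise) layer: dim -> expansion * dim, with bias.
        "pointwise_expand": dim * hidden + hidden,
        # Second 1x1 (pointwise) layer: expansion * dim -> dim, with bias.
        "pointwise_project": hidden * dim + dim,
        # LayerScale: one learnable multiplier per channel on the residual branch.
        "layer_scale": dim,
    }


if __name__ == "__main__":
    counts = convnext_block_params(dim=96)
    for name, n in counts.items():
        print(f"{name:18s} {n:>8,d}")
    print(f"{'total':18s} {sum(counts.values()):>8,d}")
```

Under these assumptions the two pointwise layers hold almost all of the weights, while the 7x7 depthwise convolution and LayerScale add comparatively few parameters.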
