TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Semantic Segmentation	ADE20K	HorNet-L (Mask2Former)	Validation mIoU	57.9	# 20
Object Detection	COCO minival	HorNet-L	box AP	59.2	# 28
Image Classification	ImageNet	HorNet-L (GF)	Top 1 Accuracy	87.7%	# 82
Image Classification	ImageNet	HorNet-L (GF)	GFLOPs	101.8	# 449

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/hornet-efficient-high-order-spatial/semantic-segmentation-on-ade20k)](https://paperswithcode.com/sota/semantic-segmentation-on-ade20k?p=hornet-efficient-high-order-spatial)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/hornet-efficient-high-order-spatial/object-detection-on-coco-minival)](https://paperswithcode.com/sota/object-detection-on-coco-minival?p=hornet-efficient-high-order-spatial)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/hornet-efficient-high-order-spatial/image-classification-on-imagenet)](https://paperswithcode.com/sota/image-classification-on-imagenet?p=hornet-efficient-high-order-spatial)`
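
The three badge strings above share one URL pattern: a paper slug plus a benchmark slug. The helper below is a small sketch that rebuilds them from those two parts. The function name `pwc_badge` and its parameter names are made up for illustration, and the pattern is inferred only from the badges listed here, so it may not hold for every benchmark.

```python
def pwc_badge(paper_slug: str, benchmark_slug: str) -> str:
    """Build badge Markdown following the pattern of the badges listed above.

    Both slugs are copied from the existing badge URLs; nothing is looked up online.
    """
    badge_url = (
        "https://img.shields.io/endpoint.svg?url="
        f"https://paperswithcode.com/badge/{paper_slug}/{benchmark_slug}"
    )
    target_url = f"https://paperswithcode.com/sota/{benchmark_slug}?p={paper_slug}"
    return f"[![PWC]({badge_url})]({target_url})"


# Slugs taken verbatim from the badges above.
PAPER = "hornet-efficient-high-order-spatial"
BENCHMARKS = [
    "semantic-segmentation-on-ade20k",
    "object-detection-on-coco-minival",
    "image-classification-on-imagenet",
]

for benchmark in BENCHMARKS:
    print(pwc_badge(PAPER, benchmark))
```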

HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions

28 Jul 2022 · Yongming Rao, Wenliang Zhao, Yansong Tang, Jie Zhou, Ser-Nam Lim, Jiwen Lu

Recent progress in vision Transformers has shown great success in various tasks, driven by the new spatial modeling mechanism based on dot-product self-attention. In this paper, we show that the key ingredients behind vision Transformers, namely input-adaptive, long-range and high-order spatial interactions, can also be efficiently implemented with a convolution-based framework. We present the Recursive Gated Convolution ($\textit{g}^\textit{n}$Conv), which performs high-order spatial interactions with gated convolutions and recursive designs. The new operation is highly flexible and customizable: it is compatible with various variants of convolution and extends the two-order interactions in self-attention to arbitrary orders without introducing significant extra computation. $\textit{g}^\textit{n}$Conv can serve as a plug-and-play module to improve various vision Transformers and convolution-based models. Based on this operation, we construct a new family of generic vision backbones named HorNet. Extensive experiments on ImageNet classification, COCO object detection and ADE20K semantic segmentation show that HorNet outperforms Swin Transformers and ConvNeXt by a significant margin with a similar overall architecture and training configuration. HorNet also shows favorable scalability to more training data and larger model sizes. Apart from its effectiveness in visual encoders, we also show that $\textit{g}^\textit{n}$Conv can be applied to task-specific decoders and consistently improves dense prediction performance with less computation. Our results demonstrate that $\textit{g}^\textit{n}$Conv can be a new basic module for visual modeling that effectively combines the merits of both vision Transformers and CNNs. Code is available at https://github.com/raoyongming/HorNet
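
To make the idea of recursive gating concrete, here is a minimal pure-Python sketch on a 1D sequence. It is not the authors' implementation: the function names, the channel widths that double at each step, the random weights, the kernel size of 3 and the omission of biases and scaling are all assumptions chosen for illustration. It shows only the structure the abstract describes: one input projection, one depthwise convolution that produces the gates, and a chain of element-wise gating steps with pointwise projections in between.

```python
import random


def linear(x, w):
    """Pointwise (1x1) projection. x is L x C_in, w is C_out x C_in, no bias."""
    return [[sum(wi * xi for wi, xi in zip(row, pos)) for row in w] for pos in x]


def depthwise_conv1d(x, kernels):
    """Per-channel 1D filter (cross-correlation form) with zero padding.

    x is L x C, kernels is C x K with K odd.
    """
    length, channels = len(x), len(x[0])
    out = [[0.0] * channels for _ in range(length)]
    for c in range(channels):
        half = len(kernels[c]) // 2
        for t in range(length):
            for j, kj in enumerate(kernels[c]):
                u = t + j - half
                if 0 <= u < length:
                    out[t][c] += kj * x[u][c]
    return out


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def init_params(channels, order, kernel_size=3, seed=0):
    """Random weights for the sketch; widths double per step (an assumption)."""
    rng = random.Random(seed)
    dims = [channels // 2 ** i for i in range(order)][::-1]  # e.g. [2, 4, 8]
    return {
        "dims": dims,
        "proj_in": rand_matrix(dims[0] + sum(dims), channels, rng),
        "dw": rand_matrix(sum(dims), kernel_size, rng),
        "pw": [rand_matrix(dims[i + 1], dims[i], rng) for i in range(order - 1)],
        "proj_out": rand_matrix(channels, channels, rng),
    }


def gn_conv(x, params):
    """Recursive gated convolution sketch: len(dims) gating steps in sequence."""
    dims = params["dims"]
    fused = linear(x, params["proj_in"])
    p = [pos[:dims[0]] for pos in fused]  # running feature
    # All gates come from a single depthwise convolution, then get sliced.
    q = depthwise_conv1d([pos[dims[0]:] for pos in fused], params["dw"])
    offset = 0
    for k, width in enumerate(dims):
        if k > 0:
            p = linear(p, params["pw"][k - 1])  # widen dims[k-1] -> dims[k]
        gate = [pos[offset:offset + width] for pos in q]
        p = [[a * b for a, b in zip(pp, gg)] for pp, gg in zip(p, gate)]
        offset += width
    return linear(p, params["proj_out"])


x = [[float(t + c) for c in range(8)] for t in range(5)]  # 5 positions, 8 channels
params = init_params(channels=8, order=3)
y = gn_conv(x, params)
print(len(y), len(y[0]))  # 5 8
```

Because this sketch has no bias terms, doubling the input multiplies the output by 2 to the power of 4 when `order=3`: each of the three gating steps multiplies in one more factor that is linear in the input. That polynomial growth is one simple way to see what "high-order" interaction means here.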

Code

Repository	Stars
raoyongming/hornet (official)	307
open-mmlab/mmclassification	3,156
towhee-io/towhee	2,987
chengtan9907/OpenSTL	572
Westlake-AI/openmixup	569

See all 7 implementations

Tasks

Image Classification

Object Detection

Semantic Segmentation

Vocal Bursts Intensity Prediction

Datasets

ImageNet

MS COCO

ADE20K

Results from the Paper

Ranked #20 on Semantic Segmentation on ADE20K

Methods

1x1 Convolution • ConvNeXt • Convolution • Gated Convolution • GLU
