TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Image Classification	CIFAR-10	ASF-former-B	Accuracy	98.8%	# 2
Image Classification	CIFAR-10	ASF-former-S	Accuracy	98.7%	# 3
Image Classification	CIFAR-10 Image Classification	ASF-former-S	Params	19.3M	# 1
Image Classification	CIFAR-10 Image Classification	ASF-former-B	Params	56.7M	# 2
Image Classification	ImageNet	ASF-former-S	Top 1 Accuracy	82.7%	# 465
Image Classification	ImageNet	ASF-former-S	Number of params	19.3M	# 534
Image Classification	ImageNet	ASF-former-B	Top 1 Accuracy	83.9%	# 347
Image Classification	ImageNet	ASF-former-B	Number of params	56.7M	# 754


Adaptive Split-Fusion Transformer

26 Apr 2022 · Zixuan Su, Hao Zhang, Jingjing Chen, Lei Pang, Chong-Wah Ngo, Yu-Gang Jiang

Neural networks for visual content understanding have recently evolved from convolutional neural networks (CNNs) to transformers. The former (CNNs) rely on small-windowed kernels to capture regional clues, demonstrating solid local expressiveness. In contrast, the latter (transformers) establish long-range global connections between localities for holistic learning. Inspired by this complementary nature, there is growing interest in designing hybrid models that make the best use of each technique. Current hybrids either use convolutions merely as simple approximations of linear projections or juxtapose a convolution branch with attention, without considering the relative importance of local and global modeling. To tackle this, we propose a new hybrid named Adaptive Split-Fusion Transformer (ASF-former), which treats the convolutional and attention branches differently using adaptive weights. Specifically, an ASF-former encoder splits the feature channels into two equal halves to fit the dual-path inputs. The outputs of the two paths are then fused with weighting scalars calculated from visual cues. We also design the convolutional path compactly for efficiency. Extensive experiments on standard benchmarks, such as ImageNet-1K, CIFAR-10, and CIFAR-100, show that our ASF-former outperforms its CNN and transformer counterparts, as well as prior hybrids, in terms of accuracy (83.9% on ImageNet-1K) under similar conditions (12.9G MACs/56.7M Params, without large-scale pre-training). The code is available at: https://github.com/szx503045266/ASF-former.
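The split-and-fuse idea described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation (see the linked repository for that). The function names, the 3-token averaging window standing in for the convolutional path, the identity projections in the attention path, the sigmoid gate over mean-pooled features, and the use of complementary weights `alpha` and `1 - alpha` are all assumptions made for illustration.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def conv_path(tokens):
    # Local path (assumed stand-in for a convolution): average each token
    # with its immediate neighbours, i.e. a window of up to 3 tokens.
    n = len(tokens)
    out = []
    for i in range(n):
        window = tokens[max(0, i - 1):min(n, i + 2)]
        out.append([sum(col) / len(window) for col in zip(*window)])
    return out

def attention_path(tokens):
    # Global path: single-head scaled dot-product self-attention with
    # identity query/key/value projections (an assumption for brevity).
    scale = math.sqrt(len(tokens[0]))
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, tokens))
                    for c in range(len(q))])
    return out

def fusion_weight(tokens, w, b):
    # Scalar in (0, 1) computed from mean-pooled features; w and b are
    # hypothetical gate parameters that would be learned in practice.
    n = len(tokens)
    pooled = [sum(col) / n for col in zip(*tokens)]
    z = sum(wi * pi for wi, pi in zip(w, pooled)) + b
    return 1.0 / (1.0 + math.exp(-z))

def split_fusion(tokens, w, b):
    # Split channels into two equal halves, run both paths, fuse adaptively.
    half = len(tokens[0]) // 2
    conv_out = conv_path([t[:half] for t in tokens])
    attn_out = attention_path([t[half:] for t in tokens])
    alpha = fusion_weight(tokens, w, b)
    fused = [[alpha * a + (1.0 - alpha) * c for a, c in zip(a_row, c_row)]
             for a_row, c_row in zip(attn_out, conv_out)]
    return fused, alpha

tokens = [[1.0, 0.0, 0.5, 0.2],
          [0.0, 1.0, 0.1, 0.9],
          [0.5, 0.5, 0.3, 0.3]]
fused, alpha = split_fusion(tokens, w=[0.1, -0.2, 0.3, 0.0], b=0.0)
```

Here `alpha` close to 1 favours the attention (global) path and `alpha` close to 0 favours the local path. In this toy version the fused output has half the input channel width; how the real encoder restores the full width is not stated in the abstract and is left out.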

Code

szx503045266/asf-former official

Tasks

Image Classification

Datasets

CIFAR-10

ImageNet

CIFAR-100

Results from the Paper

Ranked #1 on Image Classification on CIFAR-10 Image Classification

Methods

Absolute Position Encodings • Adam • BPE • Convolution • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
