IncepFormer: Efficient Inception Transformer with Pyramid Pooling for Semantic Segmentation

6 Dec 2022  ·  Lihua Fu, Haoyue Tian, Xiangping Bryce Zhai, Pan Gao, Xiaojiang Peng ·

Semantic segmentation usually benefits from global context, fine localisation information, multi-scale features, etc. To advance Transformer-based segmenters along these axes, we present a simple yet powerful semantic segmentation architecture, termed IncepFormer. IncepFormer makes two key contributions. First, it introduces a novel pyramid-structured Transformer encoder which harvests global context and fine localisation features simultaneously. These features are concatenated and fed into a convolution layer for final per-pixel prediction. Second, IncepFormer integrates an Inception-like architecture with depth-wise convolutions, and a light-weight feed-forward module in each self-attention layer, efficiently capturing rich local multi-scale object features. Extensive experiments on five benchmarks show that IncepFormer is superior to state-of-the-art methods in both accuracy and speed: 1) IncepFormer-S achieves 47.7% mIoU on ADE20K, outperforming the existing best method by 1% while costing only half the parameters and fewer FLOPs; 2) IncepFormer-B achieves 82.0% mIoU on the Cityscapes dataset with 39.6M parameters. Code is available at github.com/shendu0321/IncepFormer.
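The Inception-like idea the abstract describes can be illustrated with a minimal NumPy sketch: several parallel depth-wise convolution branches with different kernel sizes (and thus different receptive fields) applied to the same feature map, then concatenated along the channel axis. This is a hypothetical illustration of the general pattern, not the authors' implementation; the kernel sizes, random weights, and function names here are assumptions.

```python
import numpy as np

def depthwise_conv2d(x, k):
    """Depth-wise 2D convolution with 'same' padding.

    x: feature map of shape (C, H, W)
    k: one filter per channel, shape (C, kh, kw)
    """
    C, H, W = x.shape
    kh, kw = k.shape[1:]
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((0, 0), (ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for c in range(C):                      # each channel uses its own filter
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + kh, j:j + kw] * k[c])
    return out

def inception_dw_block(x, kernel_sizes=(3, 5, 7), seed=0):
    """Inception-style block: parallel depth-wise branches with different
    kernel sizes, concatenated along the channel axis.

    Kernel sizes and random weights are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    C = x.shape[0]
    branches = [
        depthwise_conv2d(x, rng.standard_normal((C, k, k)) * 0.1)
        for k in kernel_sizes
    ]
    return np.concatenate(branches, axis=0)  # shape (len(kernel_sizes)*C, H, W)

x = np.random.default_rng(1).standard_normal((4, 8, 8))
y = inception_dw_block(x)
print(y.shape)  # (12, 8, 8)
```

In a real network the concatenated branches would typically be fused back to the original channel count with a 1x1 convolution; that step is omitted here for brevity.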


Results from the Paper


Task: Image Classification  ·  Dataset: ImageNet

Model    Top 1 Accuracy    Params    GFLOPs
IPT-B    83.6%             39.3M     7.8
IPT-T    80.5%             14.0M     2.3
IPT-S    82.9%             24.3M     4.7
