Convolutional Xformers for Vision

25 Jan 2022  ·  Pranav Jeevan, Amit Sethi

Vision transformers (ViTs) have found only limited practical use in processing images, in spite of their state-of-the-art accuracy on certain benchmarks. The reasons for their limited use include their need for larger training datasets and more computational resources compared to convolutional neural networks (CNNs), owing to the quadratic complexity of their self-attention mechanism. We propose a linear attention-convolution hybrid architecture -- Convolutional X-formers for Vision (CXV) -- to overcome these limitations. We replace the quadratic attention with linear attention mechanisms, such as Performer, Nyströmformer, and Linear Transformer, to reduce GPU usage. An inductive prior for image data is provided by convolutional sub-layers, thereby eliminating the need for the class token and positional embeddings used by ViTs. We also propose a new training method that uses two different optimizers during different phases of training, and show that it improves top-1 image classification accuracy across different architectures. CXV outperforms other architectures, token mixers (e.g. ConvMixer, FNet and MLP Mixer), transformer models (e.g. ViT, CCT, CvT and hybrid Xformers), and ResNets for image classification in scenarios with limited data and GPU resources (cores, RAM, power).
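To illustrate how a linear attention mechanism avoids the quadratic cost, here is a minimal PyTorch sketch in the style of the Linear Transformer (one of the mechanisms named above). It is not the authors' code: the module, head count, and the elu(x)+1 feature map follow the original Linear Transformer formulation, and all names are illustrative. The key point is that computing (K^T V) first makes the cost linear in the number of tokens N instead of O(N^2).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearAttention(nn.Module):
    """Linear-complexity attention sketch (Linear Transformer style):
    softmax attention is replaced by a kernel feature map
    phi(x) = elu(x) + 1, so attention costs O(N) in sequence length."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.heads = heads
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.to_out = nn.Linear(dim, dim)

    def forward(self, x):  # x: (batch, seq_len, dim)
        b, n, d = x.shape
        h = self.heads
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        # Split heads: (b, h, n, d_head)
        q, k, v = (t.reshape(b, n, h, -1).transpose(1, 2) for t in (q, k, v))
        q, k = F.elu(q) + 1, F.elu(k) + 1
        # Associativity trick: (K^T V) is only (d_head x d_head),
        # so the overall cost is linear in n rather than quadratic.
        kv = torch.einsum('bhnd,bhne->bhde', k, v)
        z = 1.0 / (torch.einsum('bhnd,bhd->bhn', q, k.sum(dim=2)) + 1e-6)
        out = torch.einsum('bhnd,bhde,bhn->bhne', q, kv, z)
        out = out.transpose(1, 2).reshape(b, n, d)
        return self.to_out(out)
```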
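The convolutional sub-layers mentioned in the abstract supply locality and translation-equivariance priors, which is why no class token or positional embedding is needed. A hedged sketch of one way such a convolutional tokenizer can look (kernel sizes, strides, and dimensions here are assumptions for illustration, not the paper's exact configuration):

```python
import torch.nn as nn

class ConvEmbedding(nn.Module):
    """Sketch of a convolutional tokenizer: overlapping stride-2 convolutions
    embed the image into a token sequence, injecting a spatial inductive
    prior so that positional embeddings and a class token can be dropped."""

    def __init__(self, in_channels=3, dim=64):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(in_channels, dim, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
            nn.Conv2d(dim, dim, kernel_size=3, stride=2, padding=1),
        )

    def forward(self, x):                        # x: (b, 3, H, W)
        x = self.proj(x)                         # (b, dim, H/4, W/4)
        return x.flatten(2).transpose(1, 2)      # (b, N, dim) tokens
```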
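The proposed training method switches optimizers between phases of training. The abstract does not name the optimizer pair, so the AdamW-then-SGD choice and the epoch split below are assumptions; the sketch only shows the mechanics of re-creating the optimizer over the same parameters at the phase boundary.

```python
import torch

def train_two_phase(model, train_loader, loss_fn,
                    phase1_epochs=50, total_epochs=100, device='cuda'):
    """Two-phase training sketch: one optimizer for the first phase,
    a different one for the second. AdamW -> SGD is an illustrative
    assumption, not necessarily the pairing used in the paper."""
    model.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    for epoch in range(total_epochs):
        if epoch == phase1_epochs:
            # Phase switch: new optimizer over the same parameters.
            optimizer = torch.optim.SGD(model.parameters(),
                                        lr=1e-2, momentum=0.9)
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()
```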

| Task | Dataset | Model | Metric | Value | Global Rank |
| --- | --- | --- | --- | --- | --- |
| Image Classification | CIFAR-10 | Convolutional Performer for Vision (CPV) | Percentage correct | 94.46 | #142 |
| Image Classification | CIFAR-10 | Convolutional Performer for Vision (CPV) | PARAMS | 1.3M | #185 |
| Image Classification | CIFAR-100 | Convolutional Linear Transformer for Vision (CLTV) | Percentage correct | 60.11 | #187 |
| Image Classification | Tiny ImageNet Classification | Convolutional Nystromformer for Vision (CNV) | Validation Acc | 49.56 | #22 |
