TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Image Classification	CIFAR-10	CCT-6/3x1	Percentage correct	95.29	# 126
Image Classification	CIFAR-10	CCT-6/3x1	PARAMS	3.17M	# 190
Image Classification	CIFAR-10	CCT-7/3x1*	Percentage correct	98	# 52
Image Classification	CIFAR-10	CCT-7/3x1*	PARAMS	3.76M	# 192
Image Classification	CIFAR-100	CCT-6/3x1	Percentage correct	77.31	# 136
Image Classification	CIFAR-100	CCT-6/3x1	PARAMS	3.17M	# 183
Image Classification	CIFAR-100	CCT-7/3x1*	Percentage correct	82.72	# 97
Image Classification	Flowers-102	CCT-14/7x2	Accuracy	99.76	# 1
Image Classification	ImageNet	CCT-16/7x2	Top 1 Accuracy	80.28%	# 652
Image Classification	ImageNet	CCT-14/7x2 | 384	Top 1 Accuracy	82.71%	# 464
Image Classification	ImageNet	CCT-14/7x2	Top 1 Accuracy	81.34%	# 592
Image Classification	ImageNet	CCT-14/7x2	Number of params	22.36M	# 567
Image Classification	ImageNet	CCT-14/7x2	GFLOPs	11.06	# 306
Fine-Grained Image Classification	Oxford 102 Flowers	CCT-14/7x2	FLOPS	15G	# 3
Fine-Grained Image Classification	Oxford 102 Flowers	CCT-14/7x2	PARAMS	22.5M	# 23

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/escaping-the-big-data-paradigm-with-compact/image-classification-on-flowers-102)](https://paperswithcode.com/sota/image-classification-on-flowers-102?p=escaping-the-big-data-paradigm-with-compact)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/escaping-the-big-data-paradigm-with-compact/fine-grained-image-classification-on-oxford)](https://paperswithcode.com/sota/fine-grained-image-classification-on-oxford?p=escaping-the-big-data-paradigm-with-compact)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/escaping-the-big-data-paradigm-with-compact/image-classification-on-cifar-10)](https://paperswithcode.com/sota/image-classification-on-cifar-10?p=escaping-the-big-data-paradigm-with-compact)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/escaping-the-big-data-paradigm-with-compact/image-classification-on-cifar-100)](https://paperswithcode.com/sota/image-classification-on-cifar-100?p=escaping-the-big-data-paradigm-with-compact)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/escaping-the-big-data-paradigm-with-compact/image-classification-on-imagenet)](https://paperswithcode.com/sota/image-classification-on-imagenet?p=escaping-the-big-data-paradigm-with-compact)`

Escaping the Big Data Paradigm with Compact Transformers

12 Apr 2021 · Ali Hassani, Steven Walton, Nikhil Shah, Abulikemu Abuduweili, Jiachen Li, Humphrey Shi

With the rise of Transformers as the standard for language processing, and their advancements in computer vision, there has been a corresponding growth in parameter size and amounts of training data. Many have come to believe that because of this, transformers are not suitable for small sets of data. This trend leads to concerns such as limited availability of data in certain scientific domains and the exclusion of those with limited resources from research in the field. In this paper, we aim to present an approach for small-scale learning by introducing Compact Transformers. We show for the first time that with the right size and convolutional tokenization, transformers can avoid overfitting and outperform state-of-the-art CNNs on small datasets. Our models are flexible in terms of model size, and can have as few as 0.28M parameters while achieving competitive results. Our best model can reach 98% accuracy when training from scratch on CIFAR-10 with only 3.7M parameters, which is a significant improvement in data-efficiency over previous Transformer-based models: it is over 10x smaller than other transformers and 15% the size of ResNet50 while achieving similar performance. CCT also outperforms many modern CNN-based approaches, and even some recent NAS-based approaches. Additionally, we obtain a new SOTA result on Flowers-102 with 99.76% top-1 accuracy, and improve upon the existing baseline on ImageNet (82.71% accuracy with 29% as many parameters as ViT), as well as on NLP tasks. Our simple and compact design for transformers makes them more feasible to study for those with limited computing resources and/or dealing with small datasets, while extending existing research efforts in data-efficient transformers. Our code and pre-trained models are publicly available at https://github.com/SHI-Labs/Compact-Transformers.
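To make "convolutional tokenization" a little more concrete, here is a minimal sketch of how many tokens a stack of convolution and pooling layers would hand to the transformer encoder. Everything in it is an assumption for illustration and is not stated in the abstract: the reading of a variant name such as CCT-7/3x1 as "7 encoder layers, one 3x3 convolution", the stride and padding values, the 3x3 stride-2 max pooling after each convolution, and the function names `conv_output_size` and `token_count`. It uses only the standard output-size formula for convolution and pooling layers.

```python
def conv_output_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution or pooling layer (floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def token_count(image_size: int, n_conv_layers: int, kernel: int,
                stride: int, padding: int,
                pool_kernel: int = 3, pool_stride: int = 2,
                pool_padding: int = 1) -> int:
    """Number of tokens produced by a hypothetical convolutional tokenizer.

    Assumes each tokenizer layer is a convolution followed by max pooling,
    and that the final feature map is flattened so every spatial position
    becomes one token. These settings are illustrative assumptions.
    """
    side = image_size
    for _ in range(n_conv_layers):
        side = conv_output_size(side, kernel, stride, padding)
        side = conv_output_size(side, pool_kernel, pool_stride, pool_padding)
    return side * side


# Assumed "3x1" tokenizer on a 32x32 CIFAR image: one 3x3 conv, stride 1, padding 1.
print(token_count(32, n_conv_layers=1, kernel=3, stride=1, padding=1))   # 256

# Assumed "7x2" tokenizer on a 224x224 image: two 7x7 convs, stride 2, padding 3.
print(token_count(224, n_conv_layers=2, kernel=7, stride=2, padding=3))  # 196
```

Under these assumed settings, the sequence length is set by the tokenizer's strides and pooling rather than by a fixed patch size, which is one way to see why a small model can still operate on a modest number of tokens.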

Code

SHI-Labs/Compact-Transformers official

466

keras-team/keras-io

2,633

brohrer/sharpened-cosine-similarity

247

rishikksh20/compact-convolution-tra…

Shreyas-Bhat/CompactTransformers

Tasks

Fine-Grained Image Classification

Image Classification

Superpixel Image Classification

Datasets

CIFAR-10

ImageNet

CIFAR-100

MNIST

Fashion-MNIST

Oxford 102 Flower

JFT-300M

Results from the Paper

Ranked #1 on Image Classification on Flowers-102 (using extra training data)

Methods

Absolute Position Encodings • CCT • Convolution • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Transformer
