TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Audio Classification	AudioSet	Perceiver	Test mAP	0.449	# 29
Image Classification	ImageNet	Perceiver (FF)	Top 1 Accuracy	78%	# 788
Image Classification	ImageNet	Perceiver (FF)	Number of params	44.9M	# 706
Image Classification	ImageNet	Perceiver (FF)	GFLOPs	707.2	# 486
Image Classification	ImageNet	Perceiver	Top 1 Accuracy	76.4%	# 844
3D Point Cloud Classification	ModelNet40	Perceiver	Mean Accuracy	14.3	# 36

Perceiver: General Perception with Iterative Attention

4 Mar 2021 · Andrew Jaegle, Felix Gimeno, Andrew Brock, Andrew Zisserman, Oriol Vinyals, Joao Carreira

Biological systems perceive the world by simultaneously processing high-dimensional inputs from modalities as diverse as vision, audition, touch, and proprioception. Perception models in deep learning, on the other hand, are designed for individual modalities and often rely on domain-specific assumptions, such as the local grid structure exploited by virtually all existing vision models. These priors introduce helpful inductive biases, but they also lock models to individual modalities. In this paper we introduce the Perceiver, a model that builds upon Transformers and hence makes few architectural assumptions about the relationship between its inputs, but that also scales to hundreds of thousands of inputs, like ConvNets. The model leverages an asymmetric attention mechanism to iteratively distill inputs into a tight latent bottleneck, allowing it to scale to very large inputs. We show that this architecture is competitive with or outperforms strong, specialized models on classification tasks across various modalities: images, point clouds, audio, video, and video+audio. The Perceiver obtains performance comparable to ResNet-50 and ViT on ImageNet without 2D convolutions by directly attending to 50,000 pixels. It is also competitive in all modalities in AudioSet.
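The asymmetric attention described in the abstract can be illustrated with a small sketch. What follows is a minimal, hypothetical pure-Python illustration, not the authors' implementation: a small latent array of N vectors cross-attends to M input vectors, so that step computes N × M scores rather than the M × M of full self-attention; the latents then self-attend at N × N cost, and the two steps repeat. Layer normalization, MLP blocks, multiple heads, query projections, and trained weights are all omitted (random weights stand in), and this sketch reuses the same projection weights in every iteration. Every name (`perceiver_encoder`, `attention_cost`, and so on) is illustrative. The cost comparison assumes a 224 × 224 image (50,176 pixels) and an assumed latent count of 512.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices (used for residual connections)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def attention(queries, keys, values):
    """Single-head scaled dot-product attention: (N x D, M x D, M x D) -> N x D.

    Each of the N queries scores all M keys, so the work grows as N * M.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(d)])
    return out


def random_matrix(rows, cols, rng, scale=0.1):
    """Gaussian-initialized matrix; stands in for trained parameters."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def perceiver_encoder(inputs, num_latents=8, latent_dim=16, num_iterations=2, seed=0):
    """Hypothetical Perceiver-style encoder: returns the final N x D latent array.

    inputs is an M x C array (e.g. one row per pixel). Layer norm, MLPs,
    multiple heads and query projections are deliberately omitted.
    """
    rng = random.Random(seed)
    channels = len(inputs[0])
    latents = random_matrix(num_latents, latent_dim, rng)  # learned in practice
    w_k = random_matrix(channels, latent_dim, rng)  # input -> key projection
    w_v = random_matrix(channels, latent_dim, rng)  # input -> value projection
    keys = matmul(inputs, w_k)  # M x D, computed once and reused every iteration
    values = matmul(inputs, w_v)  # M x D
    for _ in range(num_iterations):
        # Asymmetric cross-attention: N latent queries attend to all M inputs.
        latents = add(latents, attention(latents, keys, values))
        # Latent self-attention: N x N, independent of the input size M.
        latents = add(latents, attention(latents, latents, latents))
    return latents


def attention_cost(num_queries, num_keys):
    """Number of query-key scores an attention layer computes."""
    return num_queries * num_keys


if __name__ == "__main__":
    pixels = 224 * 224  # 50,176 inputs for a 224 x 224 image
    latents_n = 512  # assumed latent count, for illustration only
    print(attention_cost(pixels, pixels))  # 2517630976 scores for full self-attention
    print(attention_cost(latents_n, pixels))  # 25690112 scores for cross-attention (98x fewer)

    toy_inputs = [[(i * 7 + j) % 5 / 5.0 for j in range(3)] for i in range(100)]
    out = perceiver_encoder(toy_inputs)
    print(len(out), len(out[0]))  # 8 16
```

Because only the cross-attention step touches all M inputs, doubling the image resolution quadruples M but leaves the N × N latent self-attention unchanged.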

Code

deepmind/deepmind-research (official) · 12,778 stars
towhee-io/towhee · 2,972 stars
lucidrains/perceiver-pytorch · 1,046 stars
krasserm/perceiver-io (Colab quickstart) · 404 stars
Rishit-dagli/Perceiver (Colab quickstart)

Ten implementations are listed in total.

Tasks

3D Point Cloud Classification

Audio Classification

Image Classification

Datasets

ImageNet

ModelNet

AudioSet

Results from the Paper

Ranked #29 on Audio Classification on AudioSet