TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Image Classification	CIFAR-100	Swin-L + ML-Decoder	Percentage correct	95.1	# 2
Multi-Label Classification	MS-COCO	ML-Decoder(TResNet-XL, resolution 640)	mAP	91.4	# 4
Multi-Label Classification	MS-COCO	ML-Decoder(TResNet-L, resolution 640)	mAP	91.1	# 7
Multi-label zero-shot learning	NUS-WIDE	ML-Decoder	mAP	31.1	# 5
Multi-Label Classification	OpenImages-v6	TResNet-M	mAP	86.8	# 2
Fine-Grained Image Classification	Stanford Cars	TResNet-L + ML-Decoder	Accuracy	96.41%	# 2
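
Several rows above report mAP (mean average precision) rather than accuracy. The following is a minimal sketch of how such a number can be computed, assuming the common non-interpolated definition: average precision is computed per class over all images, then averaged over classes. The function names and toy data are illustrative only, and the benchmark's own evaluation script may differ in detail.

```python
def average_precision(scores, labels):
    """AP for one class: mean of precision@k over the ranks k holding a positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    # A class with no positives contributes 0.0 here (an assumption of this sketch).
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_matrix, label_matrix):
    """Rows are images, columns are classes; returns mAP as a percentage."""
    num_classes = len(score_matrix[0])
    aps = [
        average_precision(
            [row[c] for row in score_matrix],
            [row[c] for row in label_matrix],
        )
        for c in range(num_classes)
    ]
    return 100.0 * sum(aps) / len(aps)


# Toy data: 4 images, 2 classes (hypothetical scores, not from the paper).
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]]
toy_labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_map = mean_average_precision(toy_scores, toy_labels)
print(f"mAP = {toy_map:.2f}")
```

The CIFAR-100 and Stanford Cars rows use accuracy instead, since each image there has exactly one label.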

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/ml-decoder-scalable-and-versatile/image-classification-on-cifar-100)](https://paperswithcode.com/sota/image-classification-on-cifar-100?p=ml-decoder-scalable-and-versatile)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/ml-decoder-scalable-and-versatile/multi-label-classification-on-openimages-v6)](https://paperswithcode.com/sota/multi-label-classification-on-openimages-v6?p=ml-decoder-scalable-and-versatile)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/ml-decoder-scalable-and-versatile/fine-grained-image-classification-on-stanford)](https://paperswithcode.com/sota/fine-grained-image-classification-on-stanford?p=ml-decoder-scalable-and-versatile)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/ml-decoder-scalable-and-versatile/multi-label-classification-on-ms-coco)](https://paperswithcode.com/sota/multi-label-classification-on-ms-coco?p=ml-decoder-scalable-and-versatile)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/ml-decoder-scalable-and-versatile/multi-label-zero-shot-learning-on-nus-wide)](https://paperswithcode.com/sota/multi-label-zero-shot-learning-on-nus-wide?p=ml-decoder-scalable-and-versatile)`

ML-Decoder: Scalable and Versatile Classification Head

25 Nov 2021 · Tal Ridnik, Gilad Sharir, Avi Ben-Cohen, Emanuel Ben-Baruch, Asaf Noy

In this paper, we introduce ML-Decoder, a new attention-based classification head. ML-Decoder predicts the existence of class labels via queries and enables better utilization of spatial data than global average pooling. By redesigning the decoder architecture and using a novel group-decoding scheme, ML-Decoder is highly efficient and scales well to thousands of classes. Compared to using a larger backbone, ML-Decoder consistently provides a better speed-accuracy trade-off. ML-Decoder is also versatile: it can be used as a drop-in replacement for various classification heads, and it generalizes to unseen classes when operated with word queries. Novel query augmentations further improve its generalization ability. Using ML-Decoder, we achieve state-of-the-art results on several classification tasks: on MS-COCO multi-label, we reach 91.4% mAP; on NUS-WIDE zero-shot, we reach 31.1% ZSL mAP; and on ImageNet single-label, we reach a new top score of 80.7% with a vanilla ResNet50 backbone, without extra data or distillation. Public code is available at: https://github.com/Alibaba-MIIL/ML_Decoder
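
To make the idea of query-based prediction concrete, here is a simplified pure-Python sketch, not the authors' implementation (see the public repository for that). It contrasts global average pooling with a single-head cross-attention step in which a query attends over spatial tokens, and then shows one way group decoding could work: K group queries, each expanded to roughly N/K class logits by its own small linear layer. All names, dimensions, and random weights are assumptions made for illustration.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_average_pool(tokens):
    """Baseline head input: every spatial token gets the same weight."""
    return [sum(column) / len(tokens) for column in zip(*tokens)]


def cross_attend(query, tokens):
    """One query attends over all spatial tokens (single head, no projections)."""
    dim = len(query)
    weights = softmax([dot(query, t) / math.sqrt(dim) for t in tokens])
    pooled = [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(dim)]
    return pooled, weights


def group_decode(tokens, group_queries, group_fc, num_classes):
    """Assumed group decoding: each group query yields several class logits."""
    logits = []
    for query, fc in zip(group_queries, group_fc):
        pooled, _ = cross_attend(query, tokens)
        logits.extend(dot(pooled, w) for w in fc)  # one weight vector per class
    return logits[:num_classes]  # drop padding when N is not divisible by K


random.seed(0)
dim, num_tokens, num_classes, num_groups = 8, 49, 10, 4
per_group = math.ceil(num_classes / num_groups)


def rand_vec():
    return [random.gauss(0, 1) for _ in range(dim)]


tokens = [rand_vec() for _ in range(num_tokens)]  # e.g. a 7x7 feature map
group_queries = [rand_vec() for _ in range(num_groups)]
group_fc = [[rand_vec() for _ in range(per_group)] for _ in range(num_groups)]

logits = group_decode(tokens, group_queries, group_fc, num_classes)
probs = [1 / (1 + math.exp(-z)) for z in logits]  # independent sigmoid per label
print(len(probs), "class probabilities from", num_groups, "queries")
```

In this sketch the number of attention computations depends on the number of group queries rather than the number of classes. An all-zero query produces uniform attention weights and therefore reduces to global average pooling, which illustrates how learned queries can weight spatial locations differently per label.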

Code

alibaba-miil/ml_decoder official

297

Tasks

Classification

Decoder

Fine-Grained Image Classification

Image Classification

Multi-Label Classification

Multi-label zero-shot learning

Zero-Shot Learning

Datasets

ImageNet

MS COCO

CIFAR-100

Stanford Cars

NUS-WIDE

PASCAL VOC

Open Images V4

OpenImages-v6

Results from the Paper

Ranked #2 on Fine-Grained Image Classification on Stanford Cars (using extra training data)

