Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer

5 Jul 2022  ·  Sunan He, Taian Guo, Tao Dai, Ruizhi Qiao, Bo Ren, Shu-Tao Xia

Real-world recognition systems often encounter the challenge of unseen labels. To identify such unseen labels, multi-label zero-shot learning (ML-ZSL) focuses on transferring knowledge via pre-trained textual label embeddings (e.g., GloVe). However, such methods exploit only single-modal knowledge from a language model and ignore the rich semantic information inherent in image-text pairs. In contrast, recently developed open-vocabulary (OV) methods succeed in exploiting the information of image-text pairs for object detection and achieve impressive performance. Inspired by the success of OV-based methods, we propose a novel open-vocabulary framework, named multi-modal knowledge transfer (MKT), for multi-label classification. Specifically, our method exploits the multi-modal knowledge of image-text pairs based on a vision and language pre-training (VLP) model. To facilitate transferring the image-text matching ability of the VLP model, knowledge distillation is employed to guarantee the consistency of image and label embeddings, along with prompt tuning to further update the label embeddings. To further enable the recognition of multiple objects, a simple but effective two-stream module is developed to capture both local and global features. Extensive experiments show that our method significantly outperforms state-of-the-art methods on public benchmark datasets. The source code is available at https://github.com/sunanhe/MKT.
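
The abstract outlines three components: a frozen VLP model (e.g., CLIP) whose image-text matching ability is transferred via knowledge distillation, prompt-tuned label embeddings, and a two-stream module that fuses global and local image features. Below is a minimal sketch of how these pieces could fit together; the class name, encoder interfaces, trainable label embeddings as a stand-in for prompt tuning, and the averaging of the two streams are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
# Illustrative sketch only: encoder interfaces, head shapes, and score fusion
# are assumptions; the reference implementation is at github.com/sunanhe/MKT.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MKTSketch(nn.Module):
    """Open-vocabulary multi-label scorer: image features are matched against
    prompt-tuned label embeddings; a frozen VLP image encoder serves as the
    distillation teacher."""

    def __init__(self, vision_encoder, vlp_image_encoder, label_embeddings, dim=512):
        super().__init__()
        self.vision_encoder = vision_encoder        # trainable backbone (e.g., a ViT)
        self.vlp_image_encoder = vlp_image_encoder  # frozen VLP (e.g., CLIP) teacher
        for p in self.vlp_image_encoder.parameters():
            p.requires_grad = False
        # Label embeddings initialized from the VLP text encoder; kept trainable
        # here as a simple stand-in for prompt tuning.
        self.label_embeddings = nn.Parameter(label_embeddings.clone())
        self.global_head = nn.Linear(dim, dim)      # global (image-level) stream
        self.local_head = nn.Linear(dim, dim)       # local (patch-level) stream

    def forward(self, images):
        # Assumed backbone interface: returns a global token (B, D)
        # and patch tokens (B, N, D).
        global_feat, patch_feats = self.vision_encoder(images)

        g = F.normalize(self.global_head(global_feat), dim=-1)   # (B, D)
        l = F.normalize(self.local_head(patch_feats), dim=-1)    # (B, N, D)
        labels = F.normalize(self.label_embeddings, dim=-1)      # (C, D)

        global_scores = g @ labels.t()                            # (B, C)
        # Local stream: best-matching patch per label, then fuse the two streams.
        local_scores = torch.einsum("bnd,cd->bnc", l, labels).max(dim=1).values
        scores = (global_scores + local_scores) / 2
        return scores, g

    def distillation_loss(self, images, student_embedding):
        # Keep the student's global embedding consistent with the frozen VLP
        # image encoder so its image-text matching ability transfers.
        with torch.no_grad():
            teacher = F.normalize(self.vlp_image_encoder(images), dim=-1)
        return (1 - F.cosine_similarity(student_embedding, teacher, dim=-1)).mean()
```

In this sketch, the distillation term aligns the student's global image embedding with the frozen VLP image encoder, which is what allows similarity scores against label embeddings of unseen classes to remain meaningful at test time.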

Task                            Dataset         Model        Metric  Value  Global Rank
Multi-label zero-shot learning  NUS-WIDE        MKT(CLIP)    mAP     42.7   #1
Multi-label zero-shot learning  NUS-WIDE        MKT(IN-1K)   mAP     37.6   #3
Multi-label zero-shot learning  Open Images V4  MKT(IN-1K)   mAP     89.2   #1
