CLIP-ReID: Exploiting Vision-Language Model for Image Re-Identification without Concrete Text Labels

25 Nov 2022 · Siyuan Li, Li Sun, Qingli Li

Pre-trained vision-language models such as CLIP have recently shown strong performance on various downstream tasks, including image classification and segmentation. In fine-grained image re-identification (ReID), however, the labels are only indices and lack concrete text descriptions, so it remains unclear how such models can be applied to these tasks. This paper first shows that simply fine-tuning a visual model initialized from CLIP's image encoder already achieves competitive performance on various ReID tasks. We then propose a two-stage strategy to learn a better visual representation. The key idea is to fully exploit CLIP's cross-modal description ability by learning a set of text tokens for each ID and feeding them to the text encoder to form ambiguous descriptions. In the first training stage, CLIP's image and text encoders stay frozen, and only the text tokens are optimized from scratch with a contrastive loss computed within each batch. In the second stage, the ID-specific text tokens and the text encoder are frozen and provide constraints for fine-tuning the image encoder. Together with the losses designed for the downstream task, this lets the image encoder accurately represent the data as vectors in the feature embedding space. The effectiveness of the strategy is validated on several person and vehicle ReID datasets. Code is available at https://github.com/Syliz517/CLIP-ReID.
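The two training stages described in the abstract can be sketched as loss functions. This is a minimal, framework-free Python sketch under stated assumptions, not the authors' implementation: features are plain lists of floats, `scale` is a fixed stand-in for CLIP's learnable logit scale, and the function names (`stage1_contrastive_loss`, `stage2_i2t_cross_entropy`) are hypothetical. Stage one is modeled as a symmetric image-to-text / text-to-image contrastive loss in which batch samples sharing an ID count as positives; stage two scores one image against the frozen text feature of every ID with an optionally label-smoothed cross-entropy. The text encoder and the learnable prompt tokens are not modeled; their outputs are passed in as fixed vectors.

```python
import math
from typing import List, Sequence

# Hypothetical sketch: features are plain Python lists; `scale` stands in
# for CLIP's learnable logit scale (an assumption, fixed here).


def _normalize(v: Sequence[float]) -> List[float]:
    """L2-normalize a vector (an all-zero vector is returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def _log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def _similarity(a: Sequence[float], b: Sequence[float], scale: float) -> float:
    """Scaled cosine similarity between two feature vectors."""
    return scale * sum(x * y for x, y in zip(_normalize(a), _normalize(b)))


def stage1_contrastive_loss(image_feats, text_feats, labels, scale=10.0):
    """Stage 1: encoders frozen; only the per-ID text tokens would be trained.

    image_feats[i]: frozen image feature of batch sample i.
    text_feats[i]:  text-encoder output for the prompt of sample i's ID.
    Samples in the batch with the same ID are treated as positives.
    """
    b = len(labels)
    sim = [[_similarity(image_feats[i], text_feats[j], scale) for j in range(b)]
           for i in range(b)]
    i2t = t2i = 0.0
    for i in range(b):
        pos = [j for j in range(b) if labels[j] == labels[i]]
        row = _log_softmax(sim[i])                         # image i vs. all texts
        col = _log_softmax([sim[j][i] for j in range(b)])  # text i vs. all images
        i2t -= sum(row[j] for j in pos) / len(pos)
        t2i -= sum(col[j] for j in pos) / len(pos)
    return (i2t + t2i) / b


def stage2_i2t_cross_entropy(image_feat, id_text_feats, label, scale=10.0, eps=0.0):
    """Stage 2: ID text features are frozen and constrain the image encoder.

    id_text_feats[k]: fixed text feature for ID k, computed once after stage 1.
    eps: optional label-smoothing factor (0.0 means plain cross-entropy).
    """
    k = len(id_text_feats)
    logp = _log_softmax([_similarity(image_feat, t, scale) for t in id_text_feats])
    target = [eps / k + (1.0 - eps if c == label else 0.0) for c in range(k)]
    return -sum(q * lp for q, lp in zip(target, logp))


if __name__ == "__main__":
    imgs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # toy image features
    labels = [0, 0, 1]                          # IDs of the three samples
    id_texts = [[1.0, 0.0], [0.0, 1.0]]         # toy per-ID text features
    print(stage1_contrastive_loss(imgs, [id_texts[y] for y in labels], labels))
    print(stage2_i2t_cross_entropy(imgs[2], id_texts, label=1))
```

In a real training loop, stage one would back-propagate only into the ID-specific text tokens and stage two only into the image encoder, alongside the downstream ReID losses mentioned above; neither gradient flow is represented in this sketch.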

Code

syliz517/clip-reid (official, 192 GitHub stars)

Tasks

Image Classification

Language Modelling

Person Re-Identification

Vehicle Re-Identification

Datasets

Market-1501

DukeMTMC-reID

MSMT17

VehicleID

VeRi-776

Results from the Paper

Ranked #1 on Person Re-Identification on MSMT17 (mAP, with re-ranking)

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Person Re-Identification	DukeMTMC-reID	CLIP-ReID (without re-ranking)	Rank-1	90.8	#26
Person Re-Identification	DukeMTMC-reID	CLIP-ReID (without re-ranking)	mAP	83.1	#26
Person Re-Identification	Market-1501	CLIP-ReID (without re-ranking)	Rank-1	95.4	#47
Person Re-Identification	Market-1501	CLIP-ReID (without re-ranking)	mAP	90.5	#36
Person Re-Identification	MSMT17	CLIP-ReID (with re-ranking)	Rank-1	91.1	#2
Person Re-Identification	MSMT17	CLIP-ReID (with re-ranking)	mAP	86.7	#1
Person Re-Identification	MSMT17	CLIP-ReID (without re-ranking)	Rank-1	89.7	#5
Person Re-Identification	MSMT17	CLIP-ReID (without re-ranking)	mAP	75.8	#8
Vehicle Re-Identification	VehicleID Small	CLIP-ReID (without re-ranking)	Rank-1	85.5	#8
Vehicle Re-Identification	VehicleID Small	CLIP-ReID (without re-ranking)	Rank-5	97.2	#7
Vehicle Re-Identification	VeRi-776	CLIP-ReID (without re-ranking)	mAP	84.5	#6
Vehicle Re-Identification	VeRi-776	CLIP-ReID (without re-ranking)	Rank-1	97.3	#4
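The table reports Rank-k accuracy (from the cumulative matching characteristic, CMC) and mean average precision (mAP). The following is a minimal sketch of how these two metrics are commonly computed from a query-by-gallery distance matrix; the function name `reid_rank_k_and_map` is hypothetical, and the sketch deliberately omits benchmark-specific rules such as discarding gallery images of the same ID taken by the same camera as the query, so its numbers are not directly comparable to the official protocols.

```python
from typing import List, Sequence, Tuple


def reid_rank_k_and_map(
    dist: Sequence[Sequence[float]],
    q_ids: Sequence[int],
    g_ids: Sequence[int],
    max_rank: int = 5,
) -> Tuple[List[float], float]:
    """Return (CMC curve up to max_rank, mAP) as fractions in [0, 1].

    dist[i][j]: distance between query i and gallery item j (smaller = closer).
    Simplification (assumption): no same-camera filtering is applied.
    """
    cmc = [0.0] * max_rank
    aps: List[float] = []
    for i, row in enumerate(dist):
        order = sorted(range(len(row)), key=lambda j: row[j])  # nearest first
        matches = [g_ids[j] == q_ids[i] for j in order]
        if not any(matches):
            continue  # query has no true match in the gallery; skip it
        first_hit = matches.index(True)
        for k in range(first_hit, max_rank):  # a hit at rank r counts for all k >= r
            cmc[k] += 1.0
        hits, precisions = 0, []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precisions.append(hits / rank)  # precision at each true match
        aps.append(sum(precisions) / len(precisions))
    if not aps:
        raise ValueError("no query has a true match in the gallery")
    valid = len(aps)
    return [c / valid for c in cmc], sum(aps) / valid
```

Rank-1 corresponds to `cmc[0]` and Rank-5 to `cmc[4]`; multiply by 100 to compare against the percentages in the table.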

Methods

CLIP
