Prototypical Contrastive Learning-based CLIP Fine-tuning for Object Re-identification

26 Oct 2023 · Jiachen Li, Xiaojin Gong

This work aims to adapt large-scale pre-trained vision-language models, such as contrastive language-image pretraining (CLIP), to enhance the performance of object re-identification (Re-ID) across various supervision settings. Although prompt learning has enabled a recent work named CLIP-ReID to achieve promising performance, the underlying mechanisms and the necessity of prompt learning remain unclear due to the absence of semantic labels in Re-ID tasks. In this work, we first analyze the role of prompt learning in CLIP-ReID and identify its limitations. Based on our investigations, we propose a simple yet effective approach to adapt CLIP for supervised object Re-ID. Our approach directly fine-tunes the image encoder of CLIP using a prototypical contrastive learning (PCL) loss, eliminating the need for prompt learning. Experimental results on both person and vehicle Re-ID datasets demonstrate the competitiveness of our method compared to CLIP-ReID. Furthermore, we extend our PCL-based CLIP fine-tuning approach to unsupervised scenarios, where we achieve state-of-the-art performance.
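The core idea above, fine-tuning the image encoder with a prototypical contrastive loss instead of learning prompts, can be illustrated with a minimal sketch. It assumes that each identity's prototype is the re-normalized mean of its L2-normalized image features and uses a generic prototype-based InfoNCE objective with a hypothetical temperature of 0.05. The helper names (`compute_prototypes`, `pcl_loss`) are illustrative, the paper's exact formulation, prototype update rule, and hyperparameters may differ, and the CLIP encoder is omitted: features are plain lists and only the loss value is computed (a real implementation would use an autodiff framework so gradients reach the image encoder).

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def compute_prototypes(features: Sequence[Sequence[float]],
                       labels: Sequence[int]) -> Dict[int, Vector]:
    """Hypothetical helper: one prototype per identity, taken as the
    re-normalized mean of that identity's normalized features."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = {}
    for f, y in zip(features, labels):
        f = l2_normalize(f)
        acc = sums.setdefault(y, [0.0] * len(f))
        for j, x in enumerate(f):
            acc[j] += x
        counts[y] = counts.get(y, 0) + 1
    return {y: l2_normalize([s / counts[y] for s in acc]) for y, acc in sums.items()}


def pcl_loss(features: Sequence[Sequence[float]], labels: Sequence[int],
             prototypes: Dict[int, Vector], temperature: float = 0.05) -> float:
    """Prototype-based InfoNCE (assumed form): pull each feature toward its
    identity's prototype and away from all other prototypes. Returns the batch mean."""
    ids = sorted(prototypes)
    index = {k: i for i, k in enumerate(ids)}
    total = 0.0
    for f, y in zip(features, labels):
        f = l2_normalize(f)
        logits = [dot(f, prototypes[k]) / temperature for k in ids]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[index[y]]  # -log softmax at the positive prototype
    return total / len(features)


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    ids = [0, 0, 1, 1]
    protos = compute_prototypes(feats, ids)
    print(pcl_loss(feats, ids, protos))  # small, since each feature sits near its own prototype
```

Because no text prompts are involved, the only trainable path in this formulation is the image encoder producing `features`; in the unsupervised setting, the identity labels would be replaced by pseudo-labels from whatever framework (e.g. CC, CAP, O2CAP in the table) supplies them.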

Benchmark results (all metrics in %; global leaderboard rank in parentheses; n/a = not reported)

Person Re-Identification, Market-1501
Model                  mAP         Rank-1      Rank-5      Rank-10
PCL-CLIP (L_pcl+L_id)  91.4 (#27)  95.9 (#28)  98.5 (#3)   n/a
PCL-CLIP (L_pcl)       91.0 (#30)  96.1 (#21)  98.8 (#2)   n/a

Person Re-Identification, MSMT17
Model                  mAP         Rank-1      Rank-5      Rank-10
PCL-CLIP (L_pcl+L_id)  76.1 (#7)   89.8 (#4)   94.7 (#1)   96.0 (#1)
PCL-CLIP (L_pcl)       73.8 (#10)  89.2 (#8)   94.7 (#1)   95.8 (#2)

Unsupervised Person Re-Identification, Market-1501
Model                  mAP         Rank-1      Rank-5      Rank-10
PCL-CLIP (CC)          86.9 (#6)   94.2 (#6)   97.8 (#3)   98.7 (#1)
PCL-CLIP (O2CAP)       88.4 (#3)   94.8 (#4)   98.0 (#1)   98.7 (#1)
PCL-CLIP (CAP)         87.4 (#5)   93.9 (#8)   97.7 (#5)   98.5 (#5)

Unsupervised Person Re-Identification, MSMT17
Model                  mAP         Rank-1      Rank-5      Rank-10
PCL-CLIP (O2CAP)       65.5 (#1)   84.9 (#1)   92.0 (#1)   94.0 (#1)
PCL-CLIP (CAP)         53.6 (#4)   79.0 (#3)   88.4 (#3)   91.1 (#3)
PCL-CLIP (CC)          56.4 (#3)   77.9 (#4)   85.2 (#4)   87.2 (#4)

Unsupervised Vehicle Re-Identification, VeRi-776
Model                  mAP         Rank-1      Rank-5      Rank-10
PCL-CLIP (CC)          34.7 (#4)   72.7 (#4)   78.6 (#4)   82.9 (#4)
PCL-CLIP (O2CAP)       45.5 (#1)   90.7 (#1)   93.9 (#1)   95.0 (#1)
PCL-CLIP (CAP)         44.2 (#2)   89.9 (#2)   93.4 (#2)   94.8 (#2)
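The results report mAP and Rank-k (CMC) accuracy, the standard retrieval metrics for Re-ID. As a rough guide to how they are computed, the sketch below scores per-query ranked gallery lists. It assumes each query's gallery is already sorted by descending similarity and reduced to booleans (True = same identity), and it ignores the same-camera and junk-image filtering that the official Market-1501 and MSMT17 protocols apply. The function names are hypothetical, and values are fractions (multiply by 100 for the percentages shown).

```python
from typing import Sequence


def rank_k_accuracy(ranked_matches: Sequence[Sequence[bool]], k: int) -> float:
    """CMC at rank k: fraction of queries with at least one true match in the top k."""
    hits = sum(1 for row in ranked_matches if any(row[:k]))
    return hits / len(ranked_matches)


def average_precision(row: Sequence[bool]) -> float:
    """AP for one query: mean of precision@i over the positions i holding a true match."""
    hits = 0
    precisions = []
    for i, is_match in enumerate(row, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(ranked_matches: Sequence[Sequence[bool]]) -> float:
    """mAP: average of per-query AP values."""
    return sum(average_precision(r) for r in ranked_matches) / len(ranked_matches)


if __name__ == "__main__":
    # Two queries, three gallery images each, already sorted by similarity.
    ranked = [[True, False, True], [False, True, False]]
    print(rank_k_accuracy(ranked, 1), rank_k_accuracy(ranked, 2), mean_average_precision(ranked))
```

Rank-k only asks whether a correct match appears early, whereas mAP rewards retrieving all true matches near the top, which is why a model can lead on Rank-5/Rank-10 while ranking lower on mAP.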

Methods