PLIP: Language-Image Pre-training for Person Representation Learning

15 May 2023 · Jialong Zuo, Changqian Yu, Nong Sang, Changxin Gao

Pre-training has emerged as an effective technique for learning powerful person representations. Most existing methods have shown that pre-training on large-scale pure-vision datasets such as ImageNet and LUPerson achieves remarkable performance. However, because these methods rely solely on visual information, they lack robust explicit indicators, which makes it difficult for them to learn discriminative person representations. Drawing inspiration from the fine-grained attribute indicators intrinsic to person descriptions, we explore introducing the language modality into person representation learning. To this end, we propose a novel language-image pre-training framework for person representation learning, termed PLIP. To explicitly build fine-grained cross-modal associations, we design three pretext tasks, i.e., semantic-fused image colorization, visual-fused attributes prediction, and vision-language matching. In addition, because no appropriate dataset exists, we present a large-scale person dataset named SYNTH-PEDES, for which we propose the Stylish Pedestrian Attributes-union Captioning method to synthesize diverse textual descriptions. We pre-train PLIP on SYNTH-PEDES and evaluate it on a range of downstream tasks, including text-based Re-ID, image-based Re-ID, and person attribute recognition. Extensive experiments demonstrate that our model not only significantly improves on existing methods across all these tasks but also shows strong ability in few-shot and domain generalization settings. The code, dataset, and weights will be released at https://github.com/Zplusdragon/PLIP
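The abstract lists vision-language matching as one of the three pretext tasks but does not state its loss. A common way to implement image-text matching is a CLIP-style symmetric contrastive (InfoNCE) loss, so the sketch here assumes that formulation. The function names and the temperature of 0.07 are illustrative assumptions, not details taken from PLIP.

```python
import math


def l2_normalize(vec):
    # Scale a feature vector to unit length (guarding against a zero vector).
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cross_entropy(logits, target):
    # Numerically stable -log softmax(logits)[target].
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def matching_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE loss (assumed formulation, not PLIP's confirmed one).

    image_feats[i] and text_feats[i] are treated as a matching pair; every
    other pairing in the batch serves as a negative.
    """
    imgs = [l2_normalize(v) for v in image_feats]
    txts = [l2_normalize(v) for v in text_feats]
    n = len(imgs)
    # sim[i][j]: cosine similarity of image i and text j, scaled by 1/temperature.
    sim = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
            for j in range(n)] for i in range(n)]
    # Image-to-text: each row should peak at its own caption.
    i2t = sum(cross_entropy(sim[i], i) for i in range(n)) / n
    # Text-to-image: each column should peak at its own image.
    t2i = sum(cross_entropy([sim[i][j] for i in range(n)], j) for j in range(n)) / n
    return (i2t + t2i) / 2


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    texts = [[0.9, 0.1], [0.1, 0.9]]
    print(matching_loss(images, texts))        # small: pairs are aligned
    print(matching_loss(images, texts[::-1]))  # large: pairs are swapped
```

Averaging both directions keeps the objective symmetric, so neither modality is favored as the query side.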

Task                         Dataset        Model             Metric  Value (%)  Global Rank
Text-based Person Retrieval  CUHK-PEDES     PLIP-RN50         R@1     69.23      #7
Text-based Person Retrieval  CUHK-PEDES     PLIP-RN50         R@5     85.84      #7
Text-based Person Retrieval  CUHK-PEDES     PLIP-RN50         R@10    91.16      #7
Text-based Person Retrieval  ICFG-PEDES     PLIP-RN50         R@1     64.25      #5
Text-based Person Retrieval  ICFG-PEDES     PLIP-RN50         R@5     80.88      #2
Text-based Person Retrieval  ICFG-PEDES     PLIP-RN50         R@10    86.32      #2
Person Re-Identification     DukeMTMC-reID  PLIP-RN50-MGN     mAP     81.7       #32
Person Re-Identification     Market-1501    PLIP-RN50-ABDNet  mAP     91.2       #29
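The text-based retrieval rows report Rank-K recall (R@K) and the Re-ID rows report mean average precision (mAP). As a rough guide to what these numbers measure, here is a minimal pure-Python sketch under simplifying assumptions: each query comes with a fully ranked list of gallery identity labels, and the same-camera filtering used in the standard Market-1501 and DukeMTMC-reID protocols is omitted. The function names are hypothetical.

```python
def recall_at_k(ranked_labels, query_labels, k):
    """R@K in percent: share of queries with a correct identity in the top k."""
    hits = sum(1 for ranked, gt in zip(ranked_labels, query_labels) if gt in ranked[:k])
    return 100.0 * hits / len(query_labels)


def average_precision(ranked, gt):
    """AP for one query: mean precision@rank over every rank holding a true match."""
    hits, precision_sum = 0, 0.0
    for rank, label in enumerate(ranked, start=1):
        if label == gt:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(ranked_labels, query_labels):
    """mAP in percent, averaged over all queries."""
    aps = [average_precision(r, gt) for r, gt in zip(ranked_labels, query_labels)]
    return 100.0 * sum(aps) / len(aps)


if __name__ == "__main__":
    # Two text queries, both describing identity "a"; gallery ranked per query.
    ranked = [["a", "b", "c"], ["b", "c", "a"]]
    queries = ["a", "a"]
    print(recall_at_k(ranked, queries, 1))  # 50.0
    print(recall_at_k(ranked, queries, 3))  # 100.0
    print(mean_average_precision(ranked, queries))
```

R@K only asks whether any correct match appears early, while AP rewards ranking every correct match high, which is why Re-ID benchmarks with several gallery images per identity tend to report mAP.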
