PLIP: Language-Image Pre-training for Person Representation Learning

15 May 2023  ·  Jialong Zuo, Changqian Yu, Nong Sang, Changxin Gao

Pre-training has emerged as an effective technique for learning powerful person representations. Most existing methods show that pre-training on large-scale pure-vision datasets such as ImageNet and LUPerson achieves remarkable performance. However, because these methods rely solely on visual information, they lack robust explicit indicators, which makes it challenging to learn discriminative person representations. Drawing inspiration from the intrinsic fine-grained attribute indicators in person descriptions, we explore introducing the language modality into person representation learning. To this end, we propose a novel language-image pre-training framework for person representation learning, termed PLIP. To explicitly build fine-grained cross-modal associations, we design three pretext tasks, i.e., semantic-fused image colorization, visual-fused attributes prediction, and vision-language matching. In addition, due to the lack of an appropriate dataset, we present a large-scale person dataset named SYNTH-PEDES, for which we propose the Stylish Pedestrian Attributes-union Captioning method to synthesize diverse textual descriptions. We pre-train PLIP on SYNTH-PEDES and evaluate our model across downstream tasks such as text-based Re-ID, image-based Re-ID, and person attribute recognition. Extensive experiments demonstrate that our model not only significantly improves existing methods on all these tasks, but also shows great ability in the few-shot and domain generalization settings. The code, dataset, and weights will be released.


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
| Text based Person Retrieval | CUHK-PEDES | PLIP-RN50 | R@1 | 69.23 | # 5 |
| Person Re-Identification | DukeMTMC-reID | PLIP-RN50-MGN | mAP | 81.7 | # 32 |
| Text based Person Retrieval | ICFG-PEDES | PLIP-RN50 | R@1 | 64.25 | # 3 |
| Person Re-Identification | Market-1501 | PLIP-RN50-ABDNet | mAP | 91.2 | # 28 |
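
The results above are reported as R@1 and mAP. As a rough illustration of what these two retrieval metrics measure, the following is a minimal sketch under stated assumptions, not the evaluation code used by the authors. The function names, the toy similarity scores, and the identity labels are all hypothetical. The sketch also omits protocol details such as the same-camera filtering commonly applied in Re-ID benchmarks, and it returns fractions rather than the percentages shown in the table.

```python
from typing import List, Sequence


def rank1_accuracy(scores: Sequence[Sequence[float]],
                   query_ids: Sequence[int],
                   gallery_ids: Sequence[int]) -> float:
    """Fraction of queries whose top-scoring gallery item has the same identity."""
    hits = 0
    for q, row in enumerate(scores):
        best = max(range(len(row)), key=lambda g: row[g])
        hits += int(gallery_ids[best] == query_ids[q])
    return hits / len(scores)


def mean_average_precision(scores: Sequence[Sequence[float]],
                           query_ids: Sequence[int],
                           gallery_ids: Sequence[int]) -> float:
    """Mean over queries of the average precision of the ranked gallery."""
    aps: List[float] = []
    for q, row in enumerate(scores):
        # Rank gallery items from most to least similar to the query.
        order = sorted(range(len(row)), key=lambda g: row[g], reverse=True)
        found = 0
        precisions: List[float] = []
        for rank, g in enumerate(order, start=1):
            if gallery_ids[g] == query_ids[q]:
                found += 1
                precisions.append(found / rank)
        if precisions:  # queries with no matching gallery item are skipped
            aps.append(sum(precisions) / len(precisions))
    return sum(aps) / len(aps)


# Hypothetical toy data: 2 queries (text or image), 3 gallery images.
scores = [[0.9, 0.2, 0.4],
          [0.1, 0.6, 0.7]]
query_ids = [1, 2]
gallery_ids = [1, 2, 1]

print(rank1_accuracy(scores, query_ids, gallery_ids))
print(mean_average_precision(scores, query_ids, gallery_ids))
```

In this toy case the first query retrieves a correct match at rank 1, while the second query finds its only match at rank 2, so R@1 penalizes the second query entirely whereas mAP gives it partial credit.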