Text-based Person Retrieval
20 papers with code • 3 benchmarks • 3 datasets
Person search by natural language aims to retrieve, from a large-scale image pool, the specific person that matches a given textual description.
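Concretely, this retrieval step reduces to ranking gallery images by their similarity to the query embedding. A minimal sketch, assuming pre-computed embeddings from hypothetical image and text encoders (the encoders themselves are not shown):

```python
import numpy as np

def retrieve(text_emb, gallery_embs, top_k=5):
    """Rank gallery person images by cosine similarity to a text query.

    text_emb:     (d,)  embedding of the query description (hypothetical encoder).
    gallery_embs: (N, d) embeddings of the N gallery person images.
    Returns the indices of the top_k most similar gallery images, best first.
    """
    t = text_emb / np.linalg.norm(text_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    sims = g @ t                      # cosine similarity per gallery image
    return np.argsort(-sims)[:top_k]  # descending order of similarity
```

In practice the gallery embeddings are computed once offline, so each text query costs only one matrix-vector product plus a sort.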
Secondly, a BERT with locality-constrained attention is proposed to obtain representations of descriptions at different scales.
In PGU, we adopt a set of shared and learnable prototypes as the queries to extract diverse and semantically aligned features for both modalities in the granularity-unified feature space, which further promotes the ReID performance.
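The shared prototype queries in PGU can be pictured as a cross-attention step applied identically to both modalities. The sketch below is a simplified, assumption-laden rendering (single head, no learned projections), not the paper's exact module:

```python
import numpy as np

def prototype_attend(prototypes, features):
    """Extract K granularity-aligned features from one modality's tokens.

    prototypes: (K, d) shared learnable queries; using the SAME prototypes
                for image and text tokens is what places both modalities in
                one granularity-unified feature space.
    features:   (L, d) token features from either the visual or the
                textual encoder (assumed inputs for this sketch).
    Returns (K, d): one attended feature per prototype.
    """
    scores = prototypes @ features.T / np.sqrt(features.shape[1])
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)   # softmax over the L tokens
    return attn @ features                    # convex combination of tokens
```

Because each output row is a convex combination of token features, the K prototype slots from the image branch and the text branch are directly comparable slot-by-slot.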
Text-based person search is a sub-task in the field of image retrieval, which aims to retrieve target person images according to a given textual description.
Third, we introduce a Compound Ranking (CR) loss that makes use of textual descriptions for other images of the same identity to provide extra supervision, thereby effectively reducing the intra-class variance in textual features.
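A hinge-based sketch of what such a compound ranking objective could look like. The weight `alpha` on the extra same-identity captions and the single hard negative are illustrative assumptions, not the paper's exact CR loss:

```python
import numpy as np

def compound_ranking_loss(img, own_txt, other_txts, neg_txt,
                          margin=0.2, alpha=0.5):
    """Ranking loss with extra supervision from same-identity captions.

    img:        (d,) image embedding; own_txt: (d,) its paired caption.
    other_txts: (M, d) captions of *other* images of the same identity,
                used as weaker positives (weight alpha is an assumption).
    neg_txt:    (d,) a caption of a different identity.
    All inputs assumed L2-normalized, so dot product = cosine similarity.
    """
    s_neg = img @ neg_txt
    loss = max(0.0, margin + s_neg - img @ own_txt)  # standard ranking term
    for t in other_txts:                             # extra same-ID captions
        loss += alpha * max(0.0, margin + s_neg - img @ t)
    return loss
```

Pulling all same-identity captions toward the image is what shrinks the intra-class variance of the textual features.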
Many previous methods for text-based person retrieval are devoted to learning a latent common-space mapping, with the purpose of extracting modality-invariant features from both the visual and textual modalities.
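One common way to train such a common space is a symmetric contrastive (InfoNCE-style) objective over matched image-description pairs. A minimal sketch under that assumption, not any specific paper's formulation:

```python
import numpy as np

def cross_modal_nce(img_embs, txt_embs, temperature=0.07):
    """Symmetric InfoNCE sketch for a modality-invariant common space.

    img_embs, txt_embs: (N, d) L2-normalized projections of N matched
    image/description pairs (row i of each pairs with row i of the other).
    Lower loss -> matched pairs are closer than mismatched ones.
    """
    logits = img_embs @ txt_embs.T / temperature   # (N, N) similarities
    labels = np.arange(len(logits))

    def ce(l):  # cross-entropy with the diagonal as the target class
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # image-to-text and text-to-image directions, averaged
    return 0.5 * (ce(logits) + ce(logits.T))
```

Minimizing this pushes each image toward its own description and away from all other descriptions in the batch, which is exactly the modality-invariance the common-space methods aim for.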
Finding target persons in full scene images with a text-description query has important practical applications in intelligent video surveillance. However, unlike real-world scenarios, where bounding boxes are not available, existing text-based person retrieval methods mainly focus on cross-modal matching between the query text descriptions and a gallery of cropped pedestrian images.