Learning Semantic-Aligned Feature Representation for Text-based Person Search

13 Dec 2021 · Shiping Li, Min Cao, Min Zhang

Text-based person search aims to retrieve images of a target pedestrian given a textual description. The key challenge of this task is to bridge the inter-modality gap and achieve feature alignment across modalities. In this paper, we propose a semantic-aligned embedding method for text-based person search, in which feature alignment across modalities is achieved by automatically learning semantic-aligned visual and textual features. First, we introduce two Transformer-based backbones to encode robust feature representations of the images and texts. Second, we design a semantic-aligned feature aggregation network that adaptively selects and aggregates features with the same semantics into part-aware features; this is realized by a multi-head attention module constrained by a cross-modality part alignment loss and a diversity loss. Experimental results on the CUHK-PEDES and Flickr30K datasets show that our method achieves state-of-the-art performance.
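The sketch below illustrates one plausible PyTorch realization of the aggregation-and-loss design the abstract describes: learnable part queries attended over token features, a diversity penalty on the resulting parts, and a cross-modal alignment term. The names (PartAggregator, diversity_loss, part_alignment_loss), the part count, and the concrete loss forms are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of a semantic-aligned feature aggregation module.
# All names and loss forms here are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartAggregator(nn.Module):
    """Aggregates token-level features into K part-aware features via
    multi-head attention over learnable part queries (one assumed way to
    realize the paper's aggregation network)."""
    def __init__(self, dim=768, num_parts=6, num_heads=8):
        super().__init__()
        self.part_queries = nn.Parameter(torch.randn(num_parts, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, tokens):  # tokens: (B, N, dim) from a Transformer backbone
        B = tokens.size(0)
        q = self.part_queries.unsqueeze(0).expand(B, -1, -1)  # (B, K, dim)
        parts, _ = self.attn(q, tokens, tokens)               # (B, K, dim)
        return parts

def diversity_loss(parts):
    """Penalizes overlap between part features: drives the off-diagonal
    entries of the normalized Gram matrix toward zero (an assumed form
    of the diversity loss)."""
    p = F.normalize(parts, dim=-1)              # (B, K, dim)
    gram = p @ p.transpose(1, 2)                # (B, K, K)
    eye = torch.eye(p.size(1), device=p.device)
    return ((gram - eye) ** 2).mean()

def part_alignment_loss(img_parts, txt_parts, tau=0.07):
    """Cross-modality part alignment as a symmetric InfoNCE over pooled
    part features: matched image/text pairs should agree (again an
    assumed instantiation, not the paper's exact objective)."""
    v = F.normalize(img_parts.mean(1), dim=-1)  # (B, dim)
    t = F.normalize(txt_parts.mean(1), dim=-1)  # (B, dim)
    logits = v @ t.t() / tau
    labels = torch.arange(v.size(0), device=v.device)
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))
```

In training, these two terms would typically be summed with a standard cross-modal matching objective; the loss weights and number of parts are hyperparameters left unspecified here.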

Task                         Dataset     Model   Metric   Value   Global Rank
Text-based Person Retrieval  CUHK-PEDES  SAF     R@1      64.13   #9
                                                 R@5      82.62   #10
                                                 R@10     88.4    #10
