TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Text based Person Retrieval	CUHK-PEDES	ViTAA	R@1	55.97	# 18
Text based Person Retrieval	CUHK-PEDES	ViTAA	R@10	83.52	# 18
Text based Person Retrieval	CUHK-PEDES	ViTAA	R@5	75.84	# 18

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/vitaa-visual-textual-attributes-alignment-in/nlp-based-person-retrival-on-cuhk-pedes)](https://paperswithcode.com/sota/nlp-based-person-retrival-on-cuhk-pedes?p=vitaa-visual-textual-attributes-alignment-in)`

ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language

ECCV 2020 · Zhe Wang, Zhiyuan Fang, Jun Wang, Yezhou Yang ·

Person search by natural language aims at retrieving a specific person in a large-scale image pool that matches the given textual descriptions. While most of the current methods treat the task as a holistic visual and textual feature matching one, we approach it from an attribute-aligning perspective that allows grounding specific attribute phrases to the corresponding visual regions. We achieve success as well as the performance boosting by a robust feature learning that the referred identity can be accurately bundled by multiple attribute visual cues. To be concrete, our Visual-Textual Attribute Alignment model (dubbed as ViTAA) learns to disentangle the feature space of a person into subspaces corresponding to attributes using a light auxiliary attribute segmentation computing branch. It then aligns these visual features with the textual attributes parsed from the sentences by using a novel contrastive learning loss. Upon that, we validate our ViTAA framework through extensive experiments on tasks of person search by natural language and by attribute-phrase queries, on which our system achieves state-of-the-art performances. Code will be publicly available upon publication.