TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Text based Person Retrieval	CUHK-PEDES	IRRA	R@1	73.38	# 5
Text based Person Retrieval	CUHK-PEDES	IRRA	R@10	93.71	# 4
Text based Person Retrieval	CUHK-PEDES	IRRA	R@5	89.93	# 4
Text based Person Retrieval	CUHK-PEDES	IRRA	mAP	66.13	# 6
Text based Person Retrieval	CUHK-PEDES	IRRA	mINP	50.24	# 2
Text based Person Retrieval	ICFG-PEDES	IRRA	mAP	38.06	# 6
Text based Person Retrieval	ICFG-PEDES	IRRA	R@1	63.46	# 6
Text based Person Retrieval	ICFG-PEDES	IRRA	R@5	80.25	# 5
Text based Person Retrieval	ICFG-PEDES	IRRA	R@10	85.82	# 3
Text based Person Retrieval	ICFG-PEDES	IRRA	mINP	7.93	# 1
Text based Person Retrieval	RSTPReid	IRRA	R@1	60.20	# 5
Text based Person Retrieval	RSTPReid	IRRA	R@5	81.30	# 1
Text based Person Retrieval	RSTPReid	IRRA	R@10	88.20	# 5
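
The metric names in the table follow common person-retrieval usage. As a minimal sketch (the function names and the toy relevance lists are hypothetical, not taken from the IRRA code base), R@K, mAP and mINP can be computed from each query's ranked gallery, encoded as a list of 0/1 relevance flags ordered from best to worst match. It assumes R@K counts a query as a hit when any correct image appears in the top K, and that mINP is the number of correct images divided by the rank of the last correct one. Under these definitions R@K can never decrease as K grows.

```python
def rank_at_k(relevance, k):
    """1.0 if at least one correct match appears in the top-k results."""
    return 1.0 if any(relevance[:k]) else 0.0


def average_precision(relevance):
    """Mean of precision values taken at each correct match position."""
    hits, total = 0, 0.0
    for position, is_match in enumerate(relevance, start=1):
        if is_match:
            hits += 1
            total += hits / position
    return total / hits if hits else 0.0


def inverse_negative_penalty(relevance):
    """Number of correct matches divided by the rank of the hardest one."""
    positions = [p for p, is_match in enumerate(relevance, start=1) if is_match]
    return len(positions) / positions[-1] if positions else 0.0


def evaluate(ranked_relevance, ks=(1, 5, 10)):
    """Average the per-query metrics and report them as percentages."""
    n = len(ranked_relevance)
    report = {
        f"R@{k}": 100.0 * sum(rank_at_k(r, k) for r in ranked_relevance) / n
        for k in ks
    }
    report["mAP"] = 100.0 * sum(average_precision(r) for r in ranked_relevance) / n
    report["mINP"] = 100.0 * sum(inverse_negative_penalty(r) for r in ranked_relevance) / n
    return report


# Two toy text queries, each ranking a gallery of four images.
toy_rankings = [
    [1, 0, 1, 0],  # correct images at ranks 1 and 3
    [0, 0, 1, 0],  # single correct image at rank 3
]
report = evaluate(toy_rankings)
print(report)
```

In this toy case only the first query has a correct image at rank 1, so R@1 is 50 while R@5 and R@10 are both 100.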

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/cross-modal-implicit-relation-reasoning-and/nlp-based-person-retrival-on-cuhk-pedes)](https://paperswithcode.com/sota/nlp-based-person-retrival-on-cuhk-pedes?p=cross-modal-implicit-relation-reasoning-and)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/cross-modal-implicit-relation-reasoning-and/text-based-person-retrieval-on-rstpreid-1)](https://paperswithcode.com/sota/text-based-person-retrieval-on-rstpreid-1?p=cross-modal-implicit-relation-reasoning-and)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/cross-modal-implicit-relation-reasoning-and/text-based-person-retrieval-on-icfg-pedes)](https://paperswithcode.com/sota/text-based-person-retrieval-on-icfg-pedes?p=cross-modal-implicit-relation-reasoning-and)`

Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval

CVPR 2023 · Ding Jiang, Mang Ye

Text-to-image person retrieval aims to identify the target person based on a given textual description query. The primary challenge is to learn the mapping of visual and textual modalities into a common latent space. Prior works have attempted to address this challenge by leveraging separately pre-trained unimodal models to extract visual and textual features. However, these approaches lack the necessary underlying alignment capabilities required to match multimodal data effectively. Besides, these works use prior information to explore explicit part alignments, which may lead to the distortion of intra-modality information. To alleviate these issues, we present IRRA: a cross-modal Implicit Relation Reasoning and Aligning framework that learns relations between local visual-textual tokens and enhances global image-text matching without requiring additional prior supervision. Specifically, we first design an Implicit Relation Reasoning module in a masked language modeling paradigm. This achieves cross-modal interaction by integrating the visual cues into the textual tokens with a cross-modal multimodal interaction encoder. Secondly, to globally align the visual and textual embeddings, Similarity Distribution Matching is proposed to minimize the KL divergence between image-text similarity distributions and the normalized label matching distributions. The proposed method achieves new state-of-the-art results on all three public datasets, with a notable margin of about 3%-9% for Rank-1 accuracy compared to prior methods.
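
The Similarity Distribution Matching objective described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `sdm_loss`, the use of cosine similarity, the temperature of 0.02, the small `eps` constant, the KL direction (similarity distribution against label distribution), and summing the image-to-text and text-to-image terms are all assumptions made for illustration. The label distribution is built by marking pairs with the same person ID as matches and normalizing each row to sum to 1.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    top = max(row)
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def sdm_loss(image_feats, text_feats, person_ids, temperature=0.02, eps=1e-8):
    """KL divergence between similarity and label distributions, both directions."""
    n = len(person_ids)
    img = [l2_normalize(v) for v in image_feats]
    txt = [l2_normalize(v) for v in text_feats]

    # Cosine similarity between image i and text j.
    sim = [[sum(a * b for a, b in zip(img[i], txt[j])) for j in range(n)]
           for i in range(n)]
    # 1.0 where image i and text j describe the same person, else 0.0.
    match = [[1.0 if person_ids[i] == person_ids[j] else 0.0 for j in range(n)]
             for i in range(n)]

    def kl_term(scores):
        total = 0.0
        for score_row, label_row in zip(scores, match):
            p = softmax([s / temperature for s in score_row])
            label_sum = sum(label_row)  # at least 1: each sample matches itself
            q = [y / label_sum for y in label_row]
            # Skip entries whose probability underflowed to exactly zero.
            total += sum(pi * math.log(pi / (qi + eps))
                         for pi, qi in zip(p, q) if pi > 0.0)
        return total / len(scores)

    # The match matrix is symmetric, so only the scores need transposing.
    sim_t = [list(column) for column in zip(*sim)]
    return kl_term(sim) + kl_term(sim_t)


# Toy batch: two identities with 2-d features.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned = sdm_loss(images, [[1.0, 0.0], [0.0, 1.0]], person_ids=[0, 1])
swapped = sdm_loss(images, [[0.0, 1.0], [1.0, 0.0]], person_ids=[0, 1])
print(aligned, swapped)
```

When each text embedding points at its own image the loss is close to zero, and it becomes large when the texts are swapped, which is the behaviour a global alignment loss of this kind is meant to have.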

Code

anosorae/irra official

171

Tasks

Image-text matching

Language Modelling

Masked Language Modeling

Person Retrieval

Relation

Retrieval

Text based Person Retrieval

Text-based Person Retrieval

Text Matching

text similarity

Datasets

CUHK-PEDES

RSTPReid

ICFG-PEDES

Results from the Paper

Ranked #5 on Text based Person Retrieval on RSTPReid (using extra training data)

Methods

ALIGN
