Deep Cross-Modal Projection Learning for Image-Text Matching
The key challenge in image-text matching is accurately measuring the similarity between visual and textual inputs. Despite the progress made by combining deep cross-modal embeddings with the bi-directional ranking loss, devising strategies for mining useful triplets and selecting appropriate margins remains difficult in real applications. In this paper, we propose a cross-modal projection matching (CMPM) loss and a cross-modal projection classification (CMPC) loss for learning discriminative image-text embeddings. The CMPM loss minimizes the KL divergence between the projection compatibility distributions and the normalized matching distributions defined over all positive and negative samples in a mini-batch. The CMPC loss categorizes the vector projection of representations from one modality onto the other with an improved norm-softmax loss, further enhancing the feature compactness of each class. Extensive analysis and experiments on multiple datasets demonstrate the superiority of the proposed approach.
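The following is a minimal PyTorch sketch of the CMPM loss as described in the abstract: the projection compatibility distribution is obtained by projecting each image embedding onto the unit-normalized text embeddings in the mini-batch and applying a softmax, and it is matched against the normalized matching distribution via KL divergence. The symmetrization over both modalities, the epsilon for numerical stability, and the use of identity labels to define positives are assumptions for illustration, not the authors' released code.

```python
import torch
import torch.nn.functional as F


def cmpm_loss(image_emb, text_emb, labels, eps=1e-8):
    """Sketch of the Cross-Modal Projection Matching loss over a mini-batch.

    image_emb: (B, D) image embeddings
    text_emb:  (B, D) text embeddings
    labels:    (B,) identity labels; pairs sharing a label count as positives (assumption)
    """
    # Normalized matching distribution q: uniform over the positives of each image.
    match = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()    # (B, B)
    q = match / match.sum(dim=1, keepdim=True)

    # Projection compatibility: scalar projection of each image embedding onto
    # every unit-normalized text embedding, converted into a distribution.
    text_norm = F.normalize(text_emb, p=2, dim=1)
    p = F.softmax(image_emb @ text_norm.t(), dim=1)                 # (B, B)

    # KL divergence between the compatibility distribution p and the matching distribution q.
    i2t = (p * (torch.log(p + eps) - torch.log(q + eps))).sum(dim=1).mean()

    # Symmetric term: project text embeddings onto unit-normalized image embeddings.
    image_norm = F.normalize(image_emb, p=2, dim=1)
    p_t = F.softmax(text_emb @ image_norm.t(), dim=1)
    t2i = (p_t * (torch.log(p_t + eps) - torch.log(q + eps))).sum(dim=1).mean()

    return i2t + t2i
```

In this sketch the loss is computed in both directions (image-to-text and text-to-image) and summed, mirroring the bi-directional nature of the matching task; whether the published method weights the two terms equally is not stated in the abstract.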
| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Cross-Modal Retrieval | Flickr30k | CMPL (ResNet) | Image-to-text R@1 | 49.6 | # 23 |
| Cross-Modal Retrieval | Flickr30k | CMPL (ResNet) | Image-to-text R@5 | 76.8 | # 22 |
| Cross-Modal Retrieval | Flickr30k | CMPL (ResNet) | Image-to-text R@10 | 86.1 | # 22 |
| Cross-Modal Retrieval | Flickr30k | CMPL (ResNet) | Text-to-image R@1 | 37.3 | # 24 |
| Cross-Modal Retrieval | Flickr30k | CMPL (ResNet) | Text-to-image R@5 | 65.7 | # 23 |
| Cross-Modal Retrieval | Flickr30k | CMPL (ResNet) | Text-to-image R@10 | 75.5 | # 23 |