Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Cross-Modal Retrieval	CUHK-PEDES	Dual Path	Text-to-image Medr	2	#1
Text based Person Retrieval	CUHK-PEDES	Dual Path	R@1	44.4	#21
Text based Person Retrieval	CUHK-PEDES	Dual Path	R@5	66.26	#21
Text based Person Retrieval	CUHK-PEDES	Dual Path	R@10	75.07	#22
Cross-Modal Retrieval	Flickr30k	Dual-Path (ResNet)	Image-to-text R@1	55.6	#20
Cross-Modal Retrieval	Flickr30k	Dual-Path (ResNet)	Image-to-text R@5	81.9	#20
Cross-Modal Retrieval	Flickr30k	Dual-Path (ResNet)	Image-to-text R@10	89.5	#19
Cross-Modal Retrieval	Flickr30k	Dual-Path (ResNet)	Text-to-image R@1	39.1	#23
Cross-Modal Retrieval	Flickr30k	Dual-Path (ResNet)	Text-to-image R@5	69.2	#22
Cross-Modal Retrieval	Flickr30k	Dual-Path (ResNet)	Text-to-image R@10	80.9	#20
Cross-Modal Retrieval	MSCOCO-1k	Dual-path CNN	Image-to-text R@1	41.2	#2
Cross-Modal Retrieval	MSCOCO-1k	Dual-path CNN	Text-to-image R@1	25.3	#1
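
The R@K and median-rank (Medr) metrics in the table above are typically derived from a query-by-gallery similarity matrix. The following is a minimal sketch of that computation, not the evaluation script used for these results. It assumes exactly one correct gallery item per query (a simplification), and the function names and toy scores are hypothetical.

```python
from statistics import median


def ranks_of_matches(similarity, gt_index):
    """Return the 1-based rank of the ground-truth item for each query row."""
    ranks = []
    for scores, gt in zip(similarity, gt_index):
        # Rank = 1 + number of gallery items scored strictly higher than the match.
        ranks.append(1 + sum(1 for s in scores if s > scores[gt]))
    return ranks


def recall_at_k(ranks, k):
    """Percentage of queries whose match is ranked within the top k."""
    return 100.0 * sum(1 for r in ranks if r <= k) / len(ranks)


# Toy similarity matrix (illustrative values only): 4 queries x 4 gallery items.
similarity = [
    [0.9, 0.1, 0.3, 0.2],
    [0.8, 0.5, 0.6, 0.1],
    [0.2, 0.4, 0.7, 0.1],
    [0.3, 0.9, 0.5, 0.4],
]
gt_index = [0, 1, 2, 3]  # query i matches gallery item i

ranks = ranks_of_matches(similarity, gt_index)
print(ranks)                  # [1, 3, 1, 3]
print(recall_at_k(ranks, 1))  # 50.0
print(recall_at_k(ranks, 5))  # 100.0
print(median(ranks))          # 2.0
```

Higher R@K is better, while a lower median rank is better, which is why the two columns are read in opposite directions.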


Dual-Path Convolutional Image-Text Embeddings with Instance Loss

15 Nov 2017 · Zhedong Zheng, Liang Zheng, Michael Garrett, Yi Yang, Mingliang Xu, Yi-Dong Shen

Matching images and sentences demands a fine understanding of both modalities. In this paper, we propose a new system to discriminatively embed the image and text to a shared visual-textual space. In this field, most existing works apply the ranking loss to pull the positive image/text pairs close and push the negative pairs apart from each other. However, directly deploying the ranking loss is hard for network learning, since it starts from the two heterogeneous features to build the inter-modal relationship. To address this problem, we propose the instance loss, which explicitly considers the intra-modal data distribution. It is based on an unsupervised assumption that each image/text group can be viewed as a class. The network can thus learn the fine granularity from every image/text group. Experiments show that the instance loss offers better weight initialization for the ranking loss, so that more discriminative embeddings can be learned. Besides, existing works usually apply off-the-shelf features, i.e., word2vec and fixed visual features. As a minor contribution, this paper constructs an end-to-end dual-path convolutional network to learn the image and text representations. End-to-end learning allows the system to directly learn from the data and fully utilize the supervision. On two generic retrieval datasets (Flickr30k and MSCOCO), experiments demonstrate that our method yields competitive accuracy compared to state-of-the-art methods. Moreover, in language-based person retrieval, we improve the state of the art by a large margin. The code has been made publicly available.
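
The two losses described in the abstract can be sketched as follows. This is a hedged illustration in plain Python, not the authors' implementation: the instance loss is written as a softmax cross-entropy in which every image/text group is its own class, and the ranking loss as a bidirectional hinge loss on cosine similarity. The shared classifier weights, the margin value, and all function names are assumptions made for the example.

```python
import math


def softmax_cross_entropy(logits, target):
    """Negative log-probability of the target class under a softmax."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def linear(weights, feature):
    """One logit per instance class: a row of weights dotted with the feature."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weights]


def instance_loss(img_feat, txt_feat, instance_id, shared_weights):
    """Classify both modalities into the same instance class.

    Assumption: a single weight matrix is shared by the image and text
    branches, so both embeddings are scored against the same class vectors.
    """
    img_logits = linear(shared_weights, img_feat)
    txt_logits = linear(shared_weights, txt_feat)
    return (softmax_cross_entropy(img_logits, instance_id)
            + softmax_cross_entropy(txt_logits, instance_id))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def ranking_loss(img, txt, neg_img, neg_txt, margin=0.5):
    """Bidirectional hinge loss; the margin here is illustrative only."""
    pos = cosine(img, txt)
    image_anchor = max(0.0, margin - pos + cosine(img, neg_txt))
    text_anchor = max(0.0, margin - pos + cosine(txt, neg_img))
    return image_anchor + text_anchor


# Toy setup: two instance classes, 2-d embeddings, identity classifier weights.
weights = [[1.0, 0.0], [0.0, 1.0]]
img, txt = [1.0, 0.0], [1.0, 0.0]

loss_correct = instance_loss(img, txt, 0, weights)
loss_wrong = instance_loss(img, txt, 1, weights)
well_separated = ranking_loss(img, txt, [0.0, 1.0], [0.0, 1.0])
```

In this toy setup `loss_correct` is smaller than `loss_wrong`, and `well_separated` is zero because the negatives already lie beyond the margin. Consistent with the abstract's remark about weight initialization, one plausible schedule is to train with the instance loss first and add the ranking loss afterwards; the exact schedule is not stated here.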

Code

layumi/Image-Text-Embedding official

280

pshroff04/Dual_Path_CNN

Tasks

Content-Based Image Retrieval

Cross-Modal Retrieval

NLP based Person Retrieval

Person Retrieval

Retrieval

Text based Person Retrieval

Datasets

MS COCO

Flickr30k

CUHK03

VIPeR

CUHK-PEDES

Results from the Paper

Ranked #1 on Cross-Modal Retrieval on CUHK-PEDES

