Similarity Reasoning and Filtration for Image-Text Matching

5 Jan 2021  ·  Haiwen Diao, Ying Zhang, Lin Ma, Huchuan Lu ·

Image-text matching plays a critical role in bridging vision and language, and great progress has been made by exploiting the global alignment between image and sentence, or local alignments between regions and words. However, how to make the most of these alignments to infer more accurate matching scores remains underexplored. In this paper, we propose a novel Similarity Graph Reasoning and Attention Filtration (SGRAF) network for image-text matching. Specifically, vector-based similarity representations are first learned to characterize the local and global alignments in a more comprehensive manner. The Similarity Graph Reasoning (SGR) module, built upon a graph convolutional network, is then introduced to infer relation-aware similarities from both the local and global alignments. The Similarity Attention Filtration (SAF) module is further developed to integrate these alignments effectively by selectively attending to the significant and representative alignments while casting aside the interference of non-meaningful ones. We demonstrate the superiority of the proposed method by achieving state-of-the-art performance on the Flickr30K and MSCOCO datasets, and show the good interpretability of the SGR and SAF modules through extensive qualitative experiments and analyses.
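The two modules described above can be sketched at a high level: SGR propagates information between similarity nodes via attention-weighted graph edges, and SAF gates each alignment before aggregating them into a scalar matching score. The following minimal numpy sketch illustrates that flow only; the function names, weight shapes, and activation choices are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sgr_step(S, W_q, W_k, W_v):
    """One illustrative similarity-graph-reasoning step.

    S: (n, d) vector-based similarity representations (one row per
    local/global alignment). Edge weights are computed from pairwise
    affinities, and each node aggregates messages from its neighbors.
    """
    A = softmax((S @ W_q) @ (S @ W_k).T, axis=-1)  # (n, n) relation graph
    return np.tanh(A @ (S @ W_v))                  # updated similarity nodes

def saf_score(S, w_att, w_out):
    """Illustrative similarity attention filtration.

    Each alignment receives a sigmoid gate; significant alignments are
    kept, weak ones are suppressed, and the gated aggregate is mapped
    to a scalar image-text matching score.
    """
    gate = 1.0 / (1.0 + np.exp(-(S @ w_att)))        # (n,) attention gates
    fused = (gate[:, None] * S).sum(axis=0) / gate.sum()
    return float(fused @ w_out)                      # scalar matching score
```

In practice both branches are trained end-to-end (e.g. with a triplet ranking loss) and their scores are combined at inference; this sketch only shows how graph reasoning and attention filtration operate on the shared similarity representations.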

Results

Cross-Modal Retrieval / COCO 2014 / SGRAF
  Metric              Value   Global Rank
  Image-to-text R@1   57.8    #24
  Image-to-text R@5   84.9    #23
  Image-to-text R@10  91.6    #22
  Text-to-image R@1   41.9    #27
  Text-to-image R@5   70.7    #26
  Text-to-image R@10  81.3    #25

Cross-Modal Retrieval / Flickr30k / SGRAF
  Metric              Value   Global Rank
  Image-to-text R@1   77.8    #15
  Image-to-text R@5   94.1    #15
  Image-to-text R@10  97.4    #14
  Text-to-image R@1   58.5    #16
  Text-to-image R@5   83.0    #15
  Text-to-image R@10  88.8    #16

Image Retrieval / Flickr30K 1K test / SGRAF
  Metric  Value   Global Rank
  R@1     58.5    #3
  R@5     83.0    #5
  R@10    88.8    #5
