Parameter-Efficient Transfer Learning for Remote Sensing Image-Text Retrieval

24 Aug 2023 · Yuan Yuan, Yang Zhan, Zhitong Xiong

Vision-and-language pre-training (VLP) models have surged in popularity recently, and fine-tuning them on specific datasets yields significant performance gains on a wide range of tasks. However, full fine-tuning of VLP models not only consumes substantial computational resources but also carries a considerable environmental cost. Moreover, because remote sensing (RS) data is continually updated, full fine-tuning may be impractical for real-world applications. To address this issue, we investigate parameter-efficient transfer learning (PETL) methods to effectively and efficiently transfer visual-language knowledge from the natural domain to the RS domain for the image-text retrieval task. To this end, we make the following contributions. 1) We construct a novel and sophisticated PETL framework for the RS image-text retrieval (RSITR) task, which comprises the pre-trained CLIP model, a multimodal remote sensing adapter (MRS-Adapter), and a hybrid multi-modal contrastive (HMMC) learning objective. 2) To cope with the high intra-modal similarity of RS data, we design a simple yet effective HMMC loss. 3) We provide comprehensive empirical studies of PETL-based RS image-text retrieval, demonstrating that the proposed method is promising and holds great potential for practical applications. 4) We benchmark a broad set of state-of-the-art PETL methods on the RSITR task. Our model contains only 0.16M trainable parameters, a 98.9% reduction relative to full fine-tuning that yields substantial savings in training cost. Its retrieval performance exceeds traditional methods by 7-13% and is comparable to or better than that of full fine-tuning. This work offers new ideas and useful insights for RS vision-language tasks.
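To make the adapter-based PETL recipe concrete, here is a minimal PyTorch sketch of a bottleneck adapter trained on top of a frozen CLIP backbone. The class name MRSAdapter, the bottleneck width, and the name-based freezing convention are illustrative assumptions; this is not a reproduction of the paper's actual MRS-Adapter design.

    # Minimal sketch: bottleneck adapter over a frozen backbone (hypothetical
    # names; not the paper's exact MRS-Adapter architecture).
    import torch
    import torch.nn as nn

    class MRSAdapter(nn.Module):
        """Bottleneck adapter: down-project, nonlinearity, up-project, residual."""
        def __init__(self, dim: int, bottleneck: int = 64):
            super().__init__()
            self.down = nn.Linear(dim, bottleneck)
            self.act = nn.GELU()
            self.up = nn.Linear(bottleneck, dim)
            nn.init.zeros_(self.up.weight)  # start as a near-identity mapping
            nn.init.zeros_(self.up.bias)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return x + self.up(self.act(self.down(x)))

    def freeze_backbone(model: nn.Module) -> None:
        """Freeze all pre-trained weights; only adapters stay trainable.
        Assumes adapter modules are registered under names containing
        'adapter', so the trainable count stays tiny (roughly
        2 * dim * bottleneck weights plus biases per adapter)."""
        for name, p in model.named_parameters():
            p.requires_grad = "adapter" in name.lower()

Zero-initializing the up-projection makes each adapter an identity function at the start of training, so optimization begins from the pre-trained CLIP behavior and departs from it only gradually, a common trick in adapter tuning.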

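The abstract motivates the HMMC loss by the high intra-modal similarity of RS data but does not spell out its formula. The sketch below is a hedged reading, assuming the hybrid objective adds intra-modal contrastive terms (on augmented views of each modality) to the standard CLIP-style cross-modal InfoNCE; info_nce, hmmc_loss, the augmented inputs img_aug/txt_aug, and the weight lam are all assumptions for illustration, not the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.07):
        """Symmetric InfoNCE where a[i] and b[i] form the positive pair."""
        a = F.normalize(a, dim=-1)
        b = F.normalize(b, dim=-1)
        logits = a @ b.t() / temperature
        targets = torch.arange(a.size(0), device=a.device)
        return 0.5 * (F.cross_entropy(logits, targets)
                      + F.cross_entropy(logits.t(), targets))

    def hmmc_loss(img, txt, img_aug, txt_aug, lam: float = 0.5):
        """Hybrid loss: cross-modal alignment plus intra-modal terms that
        push apart the many near-duplicate samples found in RS data."""
        inter = info_nce(img, txt)                               # image <-> text
        intra = info_nce(img, img_aug) + info_nce(txt, txt_aug)  # within modality
        return inter + lam * intra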

Datasets

RSICD · RSITMD
Results from the Paper


Task                   Dataset  Model                   Metric             Value   Global Rank
Cross-Modal Retrieval  RSICD    PE-RSITR (MRS-Adapter)  Mean Recall        31.12%  #3
Cross-Modal Retrieval  RSICD    PE-RSITR (MRS-Adapter)  Image-to-Text R@1  14.13%  #3
Cross-Modal Retrieval  RSICD    PE-RSITR (MRS-Adapter)  Text-to-Image R@1  11.63%  #3
Cross-Modal Retrieval  RSITMD   PE-RSITR (MRS-Adapter)  Mean Recall        44.47%  #3
Cross-Modal Retrieval  RSITMD   PE-RSITR (MRS-Adapter)  Image-to-Text R@1  23.67%  #3
Cross-Modal Retrieval  RSITMD   PE-RSITR (MRS-Adapter)  Text-to-Image R@1  20.10%  #3

Methods

CLIP · MRS-Adapter · HMMC loss