TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Change detection for remote sensing images	CDD Dataset (season-varying)	MAE+MTP(ViT-B+RVSA)	F1-Score	0.9787	# 4
Change Detection	CDD Dataset (season-varying)	MAE+MTP(ViT-B+RVSA)	F1-Score	97.87	# 5
Change Detection	CDD Dataset (season-varying)	MAE+MTP(ViT-L+RVSA)	F1-Score	97.98	# 3
Change detection for remote sensing images	CDD Dataset (season-varying)	IMP+MTP(InternImage-XL)	F1-Score	0.9833	# 2
Change detection for remote sensing images	CDD Dataset (season-varying)	MAE+MTP(ViT-L+RVSA)	F1-Score	0.9798	# 3
Change Detection	CDD Dataset (season-varying)	IMP+MTP(InternImage-XL)	F1-Score	98.33	# 2
Object Detection In Aerial Images	DIOR	IMP+MTP(InternImage-XL)	AP50	78.0	# 3
Object Detection In Aerial Images	DIOR	MAE+MTP(ViT-L+RVSA)	AP50	81.1	# 1
Object Detection In Aerial Images	DIOR	MAE+MTP(ViT-B+RVSA)	AP50	79.4	# 2
Object Detection In Aerial Images	DIOR-R	MAE+MTP(ViT-L+RVSA)	mAP	74.54	# 1
Object Detection In Aerial Images	DIOR-R	IMP+MTP(InternImage-XL)	mAP	72.17	# 3
Object Detection In Aerial Images	DIOR-R	MAE+MTP(ViT-B+RVSA)	mAP	71.29	# 4
Object Detection In Aerial Images	DOTA	IMP+MTP(InternImage-XL)	mAP	80.77%	# 11
Object Detection In Aerial Images	DOTA	MAE+MTP(ViT-B+RVSA)	mAP	80.67%	# 13
Object Detection In Aerial Images	DOTA	MAE+MTP(ViT-L+RVSA)	mAP	81.66%	# 4
Oriented Object Detection	DOTA 1.0	MAE+MTP(ViT-L+RVSA)	mAP	81.66	# 4
Oriented Object Detection	DOTA 2.0	MAE+MTP(ViT-L+RVSA)	mAP	58.41	# 3
Oriented Object Detection	DOTA 2.0	IMP+MTP(InternImage-XL)	mAP	55.13	# 6
Oriented Object Detection	DOTA 2.0	MAE+MTP(ViT-B+RVSA)	mAP	56.08	# 5
Image Classification	EuroSAT	MAE+MTP(ViT-L+RVSA)	Accuracy (%)	98.78	# 6
Image Classification	EuroSAT	MAE+MTP(ViT-B+RVSA)	Accuracy (%)	98.76	# 8
Image Classification	EuroSAT	IMP+MTP(InternImage-XL)	Accuracy (%)	99.24	# 1
Object Detection In Aerial Images	FAIR1M-2.0	MAE+MTP(ViT-B+RVSA)	mAP	51.92	# 2
Object Detection In Aerial Images	FAIR1M-2.0	IMP+MTP(InternImage-XL)	mAP	50.93	# 3
Object Detection In Aerial Images	FAIR1M-2.0	MAE+MTP(ViT-L+RVSA)	mAP	53.00	# 1
Building change detection for remote sensing images	LEVIR-CD	MAE+MTP(ViT-B+RVSA)	F1	92.22	# 4
Building change detection for remote sensing images	LEVIR-CD	IMP+MTP(InternImage-XL)	F1	92.54	# 2
Building change detection for remote sensing images	LEVIR-CD	MAE+MTP(ViT-L+RVSA)	F1	92.67	# 1
Change Detection	LEVIR-CD	IMP+MTP(InternImage-XL)	F1	92.54	# 2
Change Detection	LEVIR-CD	MAE+MTP(ViT-L+RVSA)	F1	92.67	# 1
Change Detection	LEVIR-CD	MAE+MTP(ViT-B+RVSA)	F1	92.22	# 4
Semantic Segmentation	LoveDA	MAE+MTP(ViT-B+RVSA)	Category mIoU	52.39	# 10
Semantic Segmentation	LoveDA	MAE+MTP(ViT-L+RVSA)	Category mIoU	54.17	# 2
Semantic Segmentation	LoveDA	IMP+MTP(InternImage-XL)	Category mIoU	54.17	# 2
Aerial Scene Classification	NWPU (20% as trainset)	MAE+MTP(ViT-B+RVSA)	Accuracy	95.57	# 5
Aerial Scene Classification	NWPU (20% as trainset)	MAE+MTP(ViT-L+RVSA)	Accuracy	95.88	# 2
Aerial Scene Classification	NWPU (20% as trainset)	IMP+MTP(InternImage-XL)	Accuracy	96.27	# 1
Change Detection	OSCD - 3ch	MAE+MTP(ViT-B+RVSA)	F1	53.36	# 3
Change Detection	OSCD - 3ch	MAE+MTP(ViT-L+RVSA)	F1	55.92	# 1
Change Detection	OSCD - 3ch	IMP+MTP(InternImage-XL)	F1	55.61	# 2
Semantic Segmentation	SpaceNet 1	MAE+MTP(ViT-B+RVSA)	Mean IoU	79.63	# 2
Semantic Segmentation	SpaceNet 1	MAE+MTP(ViT-L+RVSA)	Mean IoU	79.54	# 3
Semantic Segmentation	SpaceNet 1	IMP+MTP(InternImage-XL)	Mean IoU	79.16	# 4
Semantic Segmentation	SpaceNet 1	MAE+MTP(ViT-L)	Mean IoU	79.69	# 1
Change Detection	WHU Building Dataset	MAE+MTP(ViT-B+RVSA)	F1-score	0.9432	# 5
Change Detection	WHU Building Dataset	MAE+MTP(ViT-L+RVSA)	F1-score	0.9475	# 3
Change Detection	WHU Building Dataset	IMP+MTP(InternImage-XL)	F1-score	0.9559	# 1
Object Detection In Aerial Images	xView	IMP+MTP(InternImage-XL)	AP50	18.2	# 2
Object Detection In Aerial Images	xView	MAE+MTP(ViT-B+RVSA)	AP50	16.4	# 3
Object Detection In Aerial Images	xView	MAE+MTP(ViT-L+RVSA)	AP50	19.4	# 1
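
The table above mixes metric notations: some F1 rows are fractions (0.9787), others are percentages (97.87), and the DOTA mAP rows carry a trailing `%`. A minimal sketch of parsing the tab-separated rows onto one percentage scale follows. The function names and the rule "a value of at most 1.0 is a fraction" are assumptions made for illustration, not something the leaderboard defines.

```python
import csv
import io

# Three rows copied from the table above, as tab-separated text.
SAMPLE = (
    "TASK\tDATASET\tMODEL\tMETRIC NAME\tMETRIC VALUE\tGLOBAL RANK\n"
    "Change detection for remote sensing images\tCDD Dataset (season-varying)\t"
    "MAE+MTP(ViT-B+RVSA)\tF1-Score\t0.9787\t# 4\n"
    "Change Detection\tCDD Dataset (season-varying)\t"
    "MAE+MTP(ViT-B+RVSA)\tF1-Score\t97.87\t# 5\n"
    "Object Detection In Aerial Images\tDOTA\t"
    "MAE+MTP(ViT-L+RVSA)\tmAP\t81.66%\t# 4\n"
)


def to_percent(raw: str) -> float:
    """Return the metric on a 0-100 scale.

    Assumption (hypothetical rule): a value <= 1.0 is a fraction and is
    multiplied by 100; anything larger is already a percentage.
    """
    value = float(raw.strip().rstrip("%"))
    return round(value * 100, 2) if value <= 1.0 else value


def parse_results(tsv_text: str) -> list:
    """Parse the tab-separated results table into a list of dicts."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for row in reader:
        rows.append({
            "task": row["TASK"],
            "dataset": row["DATASET"],
            "model": row["MODEL"],
            "metric": row["METRIC NAME"],
            "value_pct": to_percent(row["METRIC VALUE"]),
            # Ranks are written as "# 4"; strip the marker before converting.
            "rank": int(row["GLOBAL RANK"].lstrip("# ")),
        })
    return rows


rows = parse_results(SAMPLE)
```

After normalization the two CDD rows for MAE+MTP(ViT-B+RVSA) carry the same value, which makes it easier to see that they report one result under two task names.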

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mtp-advancing-remote-sensing-foundation-model/object-detection-in-aerial-images-on-dior)](https://paperswithcode.com/sota/object-detection-in-aerial-images-on-dior?p=mtp-advancing-remote-sensing-foundation-model)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mtp-advancing-remote-sensing-foundation-model/object-detection-in-aerial-images-on-dior-r)](https://paperswithcode.com/sota/object-detection-in-aerial-images-on-dior-r?p=mtp-advancing-remote-sensing-foundation-model)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mtp-advancing-remote-sensing-foundation-model/image-classification-on-eurosat)](https://paperswithcode.com/sota/image-classification-on-eurosat?p=mtp-advancing-remote-sensing-foundation-model)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mtp-advancing-remote-sensing-foundation-model/object-detection-in-aerial-images-on-fair1m-2)](https://paperswithcode.com/sota/object-detection-in-aerial-images-on-fair1m-2?p=mtp-advancing-remote-sensing-foundation-model)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mtp-advancing-remote-sensing-foundation-model/building-change-detection-for-remote-sensing)](https://paperswithcode.com/sota/building-change-detection-for-remote-sensing?p=mtp-advancing-remote-sensing-foundation-model)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mtp-advancing-remote-sensing-foundation-model/change-detection-on-levir-cd)](https://paperswithcode.com/sota/change-detection-on-levir-cd?p=mtp-advancing-remote-sensing-foundation-model)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mtp-advancing-remote-sensing-foundation-model/aerial-scene-classification-on-nwpu-20-as)](https://paperswithcode.com/sota/aerial-scene-classification-on-nwpu-20-as?p=mtp-advancing-remote-sensing-foundation-model)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mtp-advancing-remote-sensing-foundation-model/change-detection-on-oscd-3ch)](https://paperswithcode.com/sota/change-detection-on-oscd-3ch?p=mtp-advancing-remote-sensing-foundation-model)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mtp-advancing-remote-sensing-foundation-model/semantic-segmentation-on-spacenet-1)](https://paperswithcode.com/sota/semantic-segmentation-on-spacenet-1?p=mtp-advancing-remote-sensing-foundation-model)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mtp-advancing-remote-sensing-foundation-model/change-detection-on-whu-building-dataset)](https://paperswithcode.com/sota/change-detection-on-whu-building-dataset?p=mtp-advancing-remote-sensing-foundation-model)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mtp-advancing-remote-sensing-foundation-model/object-detection-in-aerial-images-on-xview)](https://paperswithcode.com/sota/object-detection-in-aerial-images-on-xview?p=mtp-advancing-remote-sensing-foundation-model)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mtp-advancing-remote-sensing-foundation-model/change-detection-for-remote-sensing-images-on)](https://paperswithcode.com/sota/change-detection-for-remote-sensing-images-on?p=mtp-advancing-remote-sensing-foundation-model)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mtp-advancing-remote-sensing-foundation-model/change-detection-on-cdd-dataset-season-1)](https://paperswithcode.com/sota/change-detection-on-cdd-dataset-season-1?p=mtp-advancing-remote-sensing-foundation-model)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mtp-advancing-remote-sensing-foundation-model/semantic-segmentation-on-loveda)](https://paperswithcode.com/sota/semantic-segmentation-on-loveda?p=mtp-advancing-remote-sensing-foundation-model)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mtp-advancing-remote-sensing-foundation-model/oriented-object-detection-on-dota-2-0)](https://paperswithcode.com/sota/oriented-object-detection-on-dota-2-0?p=mtp-advancing-remote-sensing-foundation-model)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mtp-advancing-remote-sensing-foundation-model/object-detection-in-aerial-images-on-dota-1)](https://paperswithcode.com/sota/object-detection-in-aerial-images-on-dota-1?p=mtp-advancing-remote-sensing-foundation-model)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mtp-advancing-remote-sensing-foundation-model/oriented-object-detection-on-dota-1-0)](https://paperswithcode.com/sota/oriented-object-detection-on-dota-1-0?p=mtp-advancing-remote-sensing-foundation-model)`
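
The badge lines above all follow one URL pattern, differing only in the benchmark slug. As a small sketch, the helper below rebuilds a badge line from the two slugs. `pwc_badge` is a hypothetical name, and the pattern is inferred from the lines above rather than taken from any documented API.

```python
def pwc_badge(paper_slug: str, benchmark_slug: str) -> str:
    """Build the badge markdown in the same shape as the lines listed above."""
    badge_url = (
        "https://img.shields.io/endpoint.svg?url="
        f"https://paperswithcode.com/badge/{paper_slug}/{benchmark_slug}"
    )
    sota_url = f"https://paperswithcode.com/sota/{benchmark_slug}?p={paper_slug}"
    return f"[![PWC]({badge_url})]({sota_url})"


PAPER_SLUG = "mtp-advancing-remote-sensing-foundation-model"
loveda_badge = pwc_badge(PAPER_SLUG, "semantic-segmentation-on-loveda")
```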

MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining

20 Mar 2024 · Di Wang, Jing Zhang, Minqiang Xu, Lin Liu, Dongsheng Wang, Erzhong Gao, Chengxi Han, HaoNan Guo, Bo Du, DaCheng Tao, Liangpei Zhang

Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks. Pretraining is an active research topic, encompassing supervised and self-supervised learning methods to initialize model weights effectively. However, transferring the pretrained models to downstream tasks may encounter task discrepancy due to their formulation of pretraining as image classification or object discrimination tasks. In this study, we explore the Multi-Task Pretraining (MTP) paradigm for RS foundation models to address this issue. Using a shared encoder and task-specific decoder architecture, we conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection. MTP supports both convolutional neural networks and vision transformer foundation models with over 300 million parameters. The pretrained models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection. Extensive experiments across 14 datasets demonstrate the superiority of our models over existing ones of similar size and their competitive performance compared to larger state-of-the-art models, thus validating the effectiveness of MTP.
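
The shared-encoder, task-specific-decoder layout described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the encoder and decoders are plain-Python stand-ins for real networks, every name is hypothetical, and combining the per-task losses by an equally weighted sum is an assumption, since the abstract does not say how the losses are combined.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Vector = List[float]


@dataclass
class SharedEncoder:
    """Stand-in for the shared backbone; here just a fixed scaling."""
    scale: float = 0.5
    calls: int = 0  # counts forward passes, to show the encoder runs once

    def __call__(self, image: Vector) -> Vector:
        self.calls += 1
        return [self.scale * x for x in image]


def make_decoder(bias: float) -> Callable[[Vector], Vector]:
    """Stand-in for a task-specific decoder head."""
    def decode(features: Vector) -> Vector:
        return [f + bias for f in features]
    return decode


def mse(pred: Vector, target: Vector) -> float:
    """Mean squared error, used as a placeholder loss for every task."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(
    encoder: SharedEncoder,
    decoders: Dict[str, Callable[[Vector], Vector]],
    image: Vector,
    targets: Dict[str, Vector],
    weights: Optional[Dict[str, float]] = None,
) -> Tuple[float, Dict[str, float]]:
    """Encode once, decode per task, and combine the per-task losses."""
    features = encoder(image)  # one forward pass shared by all tasks
    losses = {task: mse(dec(features), targets[task])
              for task, dec in decoders.items()}
    if weights is None:
        weights = {task: 1.0 for task in decoders}  # assumed equal weighting
    total = sum(weights[task] * loss for task, loss in losses.items())
    return total, losses


encoder = SharedEncoder()
decoders = {
    "semantic_segmentation": make_decoder(0.0),
    "instance_segmentation": make_decoder(1.0),
    "rotated_detection": make_decoder(2.0),
}
targets = {task: [1.0, 2.0] for task in decoders}
total, per_task = multitask_loss(encoder, decoders, [2.0, 4.0], targets)
```

In a real setup each decoder would have its own loss function and the gradient of the combined loss would update the shared encoder; the sketch only shows the data flow.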

Code

vitae-transformer/mtp (official)

Tasks

Aerial Scene Classification

Building change detection for remote sensing images

Change Detection

Change detection for remote sensing images

Decoder

Image Classification

Instance Segmentation

Object

object-detection

Object Detection

Object Detection In Aerial Images

Oriented Object Detection

Scene Classification

Segmentation

Self-Supervised Learning

Semantic Segmentation

Datasets

ImageNet

EuroSAT

DOTA

RESISC45

LEVIR-CD

xView

LoveDA

CDD Dataset (season-varying)

Million-AID

WHU Building Dataset

DOTA 2.0

Satlas

OSCD

SpaceNet 1

SAMRS

Results from the Paper

Ranked #1 on Semantic Segmentation on SpaceNet 1 (using extra training data)

Methods

Dense Connections • Layer Normalization • Linear Layer • Multi-Head Attention • Residual Connection • Scaled Dot-Product Attention • Softmax • Vision Transformer
