TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Aerial Scene Classification	AID (20% as trainset)	RSP-Swin-T	Accuracy	96.83	# 4
Aerial Scene Classification	AID (20% as trainset)	IMP-ViTAEv2-S	Accuracy	96.61	# 6
Aerial Scene Classification	AID (20% as trainset)	RSP-ResNet-50	Accuracy	96.81	# 5
Aerial Scene Classification	AID (20% as trainset)	RSP-ViTAEv2-S	Accuracy	96.91	# 3
Aerial Scene Classification	AID (50% as trainset)	RSP-Swin-T	Accuracy	98.30	# 3
Aerial Scene Classification	AID (50% as trainset)	IMP-ViTAEv2-S	Accuracy	98.08	# 5
Aerial Scene Classification	AID (50% as trainset)	RSP-ViTAEv2-S	Accuracy	98.22	# 4
Aerial Scene Classification	AID (50% as trainset)	RSP-ResNet-50	Accuracy	97.89	# 6
Change detection for remote sensing images	CDD Dataset (season-varying)	IMP-ViTAEv2-S-BIT	F1-Score	0.9702	# 9
Change detection for remote sensing images	CDD Dataset (season-varying)	RSP-Swin-T-BIT	F1-Score	0.9521	# 17
Change detection for remote sensing images	CDD Dataset (season-varying)	RSP-ResNet-50-BIT	F1-Score	0.96	# 14
Change detection for remote sensing images	CDD Dataset (season-varying)	RSP-ViTAEv2-S-BIT	F1-Score	0.9681	# 11
Object Detection In Aerial Images	DOTA	RSP-ResNet-50-FPN-ORCN	mAP	76.50%	# 36
Object Detection In Aerial Images	DOTA	RSP-Swin-T-FPN-ORCN	mAP	76.12%	# 38
Object Detection In Aerial Images	DOTA	IMP-ViTAEv2-S-FPN-ORCN	mAP	77.38%	# 28
Object Detection In Aerial Images	DOTA	RSP-ViTAEv2-S-FPN-ORCN	mAP	77.72%	# 25
Object Detection In Aerial Images	HRSC2016	RSP-ViTAEv2-S-FPN-ORCN	mAP-07	90.4	# 4
Object Detection In Aerial Images	HRSC2016	RSP-Swin-T-FPN-ORCN	mAP-07	90.0	# 7
Object Detection In Aerial Images	HRSC2016	RSP-ResNet-50-FPN-ORCN	mAP-07	90.3	# 6
Object Detection In Aerial Images	HRSC2016	IMP-ViTAEv2-S-FPN-ORCN	mAP-07	90.4	# 4
Semantic Segmentation	iSAID	RSP-Swin-T-UperNet	mIoU	64.1	# 16
Semantic Segmentation	iSAID	IMP-ViTAEv2-S-UperNet	mIoU	65.3	# 12
Semantic Segmentation	iSAID	RSP-ResNet-50-UperNet	mIoU	61.6	# 19
Semantic Segmentation	iSAID	RSP-ViTAEv2-S-UperNet	mIoU	64.3	# 15
Semantic Segmentation	ISPRS Potsdam	RSP-ViTAEv2-S-UperNet	Overall Accuracy	91.21	# 12
Semantic Segmentation	ISPRS Potsdam	RSP-ResNet-50-UperNet	Overall Accuracy	90.61	# 16
Semantic Segmentation	ISPRS Potsdam	RSP-Swin-T-UperNet	Overall Accuracy	90.78	# 14
Semantic Segmentation	ISPRS Potsdam	IMP-ViTAEv2-S-UperNet	Overall Accuracy	91.6	# 7
Building change detection for remote sensing images	LEVIR-CD	RSP-ViTAEv2-S-BIT	F1	90.93	# 19
Building change detection for remote sensing images	LEVIR-CD	SeCo-ResNet-50	F1	90.14	# 26
Building change detection for remote sensing images	LEVIR-CD	RSP-Swin-T	F1	90.10	# 27
Building change detection for remote sensing images	LEVIR-CD	RSP-ResNet-50	F1	90.10	# 27
Building change detection for remote sensing images	LEVIR-CD	IMP-ViTAEv2-S-BIT	F1	91.26	# 14
Aerial Scene Classification	NWPU (10% as trainset)	RSP-ResNet-50	Accuracy	93.93	# 2
Aerial Scene Classification	NWPU (10% as trainset)	IMP-ViTAEv2-S	Accuracy	93.9	# 4
Aerial Scene Classification	NWPU (10% as trainset)	RSP-ViTAEv2-S	Accuracy	94.41	# 1
Aerial Scene Classification	NWPU (10% as trainset)	RSP-Swin-T	Accuracy	93.02	# 7
Aerial Scene Classification	NWPU (20% as trainset)	RSP-ResNet-50	Accuracy	95.02	# 8
Aerial Scene Classification	NWPU (20% as trainset)	RSP-ViTAEv2-S	Accuracy	95.60	# 4
Aerial Scene Classification	NWPU (20% as trainset)	IMP-ViTAEv2-S	Accuracy	95.29	# 7
Aerial Scene Classification	NWPU (20% as trainset)	RSP-Swin-T	Accuracy	94.51	# 10
Aerial Scene Classification	UCM (80% as trainset)	RSP-ResNet-50	Accuracy	99.52	# 5
Aerial Scene Classification	UCM (80% as trainset)	RSP-Swin-T	Accuracy	99.52	# 5
Aerial Scene Classification	UCM (80% as trainset)	RSP-ViTAEv2-S	Accuracy	99.90	# 1
Aerial Scene Classification	UCM (80% as trainset)	IMP-ViTAEv2-S	Accuracy	99.71	# 4

An Empirical Study of Remote Sensing Pretraining

6 Apr 2022 · Di Wang, Jing Zhang, Bo Du, Gui-Song Xia, DaCheng Tao

Deep learning has largely reshaped remote sensing (RS) research for aerial image understanding and has achieved great success. Nevertheless, most existing deep models are initialized with ImageNet pretrained weights. Natural images inevitably present a large domain gap relative to aerial images, which probably limits the finetuning performance on downstream aerial scene tasks. This issue motivates us to conduct an empirical study of remote sensing pretraining (RSP) on aerial images. To this end, we train different networks from scratch on MillionAID, the largest RS scene recognition dataset to date, to obtain a series of RS pretrained backbones, including both convolutional neural networks (CNNs) and vision transformers such as Swin and ViTAE, which have shown promising performance on computer vision tasks. We then investigate the impact of RSP on representative downstream tasks, including scene recognition, semantic segmentation, object detection, and change detection, using these CNN and vision transformer backbones. The empirical study shows that RSP can help deliver distinctive performance in scene recognition tasks and in perceiving RS-related semantics such as "Bridge" and "Airplane". We also find that, although RSP mitigates the data discrepancies of traditional ImageNet pretraining on RS images, it may still suffer from task discrepancies, where downstream tasks require different representations from scene recognition tasks. These findings call for further research efforts on both large-scale pretraining datasets and effective pretraining methods. The code and pretrained models will be released at https://github.com/ViTAE-Transformer/ViTAE-Transformer-Remote-Sensing.

Code

vitae-transformer/vitae-transformer… official

414

vitae-transformer/rsp official

119

Tasks

Aerial Scene Classification

Building change detection for remote sensing images

Change Detection

Change detection for remote sensing images

Object Detection

Object Detection In Aerial Images

Scene Recognition

Semantic Segmentation

Datasets

ImageNet

DOTA

LEVIR-CD

iSAID

AID

CDD Dataset (season-varying)

Million-AID

ISPRS Potsdam

UC Merced Land Use Dataset

HRSC2016

Results from the Paper

Ranked #1 on Aerial Scene Classification on UCM (80% as trainset)
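
The abstract's claim that RSP helps scene recognition but may suffer from task discrepancies elsewhere can be checked against the reported numbers. The following is a minimal sketch, not the authors' evaluation code: it copies the ViTAEv2-S scores from the results table on this page and computes the RSP score minus the IMP score for each benchmark. The names `ROWS` and `rsp_minus_imp` and the task-group labels are introduced here for illustration only.

```python
# Scores copied from the results table on this page for the ViTAEv2-S backbone.
# Each row: (task group, dataset, metric, IMP score, RSP score).
# The task-group labels are illustrative. DOTA mAP is listed in percent;
# the CDD F1-Score is on a 0-1 scale, as in the table.
ROWS = [
    ("scene classification", "AID (20%)", "Accuracy", 96.61, 96.91),
    ("scene classification", "AID (50%)", "Accuracy", 98.08, 98.22),
    ("scene classification", "NWPU (10%)", "Accuracy", 93.9, 94.41),
    ("scene classification", "NWPU (20%)", "Accuracy", 95.29, 95.60),
    ("scene classification", "UCM (80%)", "Accuracy", 99.71, 99.90),
    ("change detection", "CDD", "F1-Score", 0.9702, 0.9681),
    ("change detection", "LEVIR-CD", "F1", 91.26, 90.93),
    ("object detection", "DOTA", "mAP", 77.38, 77.72),
    ("object detection", "HRSC2016", "mAP-07", 90.4, 90.4),
    ("semantic segmentation", "iSAID", "mIoU", 65.3, 64.3),
    ("semantic segmentation", "ISPRS Potsdam", "Overall Accuracy", 91.6, 91.21),
]


def rsp_minus_imp(rows):
    """Return {dataset: RSP score minus IMP score}, rounded to 4 decimals."""
    return {dataset: round(rsp - imp, 4) for _, dataset, _, imp, rsp in rows}


if __name__ == "__main__":
    deltas = rsp_minus_imp(ROWS)
    for group, dataset, metric, _, _ in ROWS:
        # A positive delta means the RSP backbone scored higher than the IMP one.
        print(f"{group:22s} {dataset:14s} {metric:17s} {deltas[dataset]:+.4f}")
```

On these rows the difference is positive for every scene classification split, while for segmentation, detection, and change detection it is positive, zero, or negative depending on the dataset. The comparison covers only the single reported score per model and says nothing about variance across runs.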

Methods

Dense Connections • Layer Normalization • Linear Layer • Multi-Head Attention • Residual Connection • Scaled Dot-Product Attention • Softmax • Vision Transformer
