DSAT-Net: Dual Spatial Attention Transformer for Building Extraction from Aerial Images

Both local and global context are essential for building extraction from remote sensing (RS) images. Convolutional Neural Networks (CNNs) extract local spatial details well but lack the ability to model long-range dependencies. In recent years, Vision Transformers (ViTs) have shown great potential for modeling global context, but they usually incur a huge computational cost, and spatial details cannot be fully retained during feature extraction. To combine the advantages of CNNs and ViTs, we propose DSAT-Net, which integrates both in a single model. In DSAT-Net, we design an efficient Dual Spatial Attention Transformer (DSAFormer) to address the shortcomings of the standard ViT. Its two attention paths complement each other: the global attention path (GAP) heavily downsamples the feature maps before computing global self-attention, reducing the computational cost, while the local attention path (LAP) uses efficient stripe convolutions to generate local attention, which alleviates the information loss caused by the downsampling in the GAP and supplements spatial details. In addition, we design a feature refinement module, the Channel Mixing Feature Refine Module (CM-FRM), to fuse low-level and high-level features. Our model achieves competitive results on three public building extraction datasets. Code will be available at: https://github.com/stdcoutzrh/BuildingExtraction.
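The abstract describes the dual-path structure only in prose. Below is a minimal PyTorch sketch of that idea, not the authors' implementation: the downsampling ratio `sr_ratio`, stripe kernel size `stripe_k`, the sigmoid-gated fusion, and all module and parameter names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DualSpatialAttention(nn.Module):
    """Sketch of a dual attention block: a downsampled global
    self-attention path (GAP) plus a stripe-convolution local path (LAP)."""
    def __init__(self, dim, num_heads=4, sr_ratio=8, stripe_k=7):
        super().__init__()
        # Global path: shrink the spatial grid before multi-head
        # self-attention to cut the quadratic token cost.
        self.pool = nn.AvgPool2d(sr_ratio, sr_ratio)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.up = nn.Upsample(scale_factor=sr_ratio, mode="bilinear",
                              align_corners=False)
        # Local path: depth-wise 1xk and kx1 stripe convolutions
        # approximate a kxk window cheaply and recover the spatial
        # detail lost by the pooling above.
        self.stripe_h = nn.Conv2d(dim, dim, (1, stripe_k),
                                  padding=(0, stripe_k // 2), groups=dim)
        self.stripe_v = nn.Conv2d(dim, dim, (stripe_k, 1),
                                  padding=(stripe_k // 2, 0), groups=dim)
        self.proj = nn.Conv2d(dim, dim, 1)

    def forward(self, x):                      # x: (B, C, H, W)
        b, c, h, w = x.shape
        # --- global attention path: pooled self-attention ---
        g = self.pool(x)                       # (B, C, H/r, W/r)
        gh, gw = g.shape[2:]
        g = g.flatten(2).transpose(1, 2)       # (B, N', C) token sequence
        g, _ = self.attn(g, g, g)
        g = g.transpose(1, 2).reshape(b, c, gh, gw)
        g = self.up(g)                         # back to (B, C, H, W)
        # --- local attention path: stripe-convolution attention map ---
        l = self.stripe_v(self.stripe_h(x))
        # Fusion (assumed): local map gates the input, global context added.
        return self.proj(x * torch.sigmoid(l) + g)

x = torch.randn(2, 64, 128, 128)
print(DualSpatialAttention(64)(x).shape)      # torch.Size([2, 64, 128, 128])
```

The depth-wise stripe pair costs O(k) per pixel instead of O(k^2) for a full kxk kernel, which is why stripe convolutions are a cheap way to inject local attention alongside the pooled global branch.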

Task                                           Dataset                         Model     Metric  Value  Global Rank
Semantic Segmentation                          INRIA Aerial Image Labeling     DSAT-Net  IoU     82.68  #5
Extracting Buildings In Remote Sensing Images  Massachusetts building dataset  DSAT-Net  IoU     76.54  #2
