Segmentation Transformer, or SETR, is a Transformer-based segmentation model. The transformer-alone encoder treats an input image as a sequence of image patches represented by learned patch embeddings, and transforms the sequence with global self-attention modeling for discriminative feature representation learning. Concretely, we first decompose an image into a grid of fixed-size patches, forming a sequence of patches. With a linear embedding layer applied to the flattened pixel vectors of each patch, we obtain a sequence of feature embedding vectors as the input to a transformer. Given the learned features from the encoder transformer, a decoder is then used to recover the original image resolution. Crucially, there is no downsampling in spatial resolution, but global context modeling at every layer of the encoder transformer.
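The sketch below illustrates this pipeline in PyTorch: patch decomposition, linear patch embedding, a transformer-only encoder that keeps spatial resolution fixed, and a simple decoder that projects to class logits and bilinearly upsamples back to the input size (roughly in the spirit of the naive decoder variant). The class name, default hyperparameters, and decoder are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SETRSketch(nn.Module):
    """Minimal SETR-style model: hypothetical sketch, not the official code."""
    def __init__(self, img_size=512, patch_size=16, in_chans=3,
                 embed_dim=768, depth=12, num_heads=12, num_classes=19):
        super().__init__()
        self.patch_size = patch_size
        num_patches = (img_size // patch_size) ** 2
        # Linear embedding applied to the flattened pixel vector of each patch.
        self.patch_embed = nn.Linear(patch_size * patch_size * in_chans, embed_dim)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Naive decoder: 1x1 conv to class logits, then bilinear upsampling.
        self.head = nn.Conv2d(embed_dim, num_classes, kernel_size=1)

    def forward(self, x):
        B, C, H, W = x.shape
        p = self.patch_size
        # Decompose the image into a grid of fixed-size patches -> sequence.
        patches = x.unfold(2, p, p).unfold(3, p, p)            # B, C, H/p, W/p, p, p
        patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * p * p)
        tokens = self.patch_embed(patches) + self.pos_embed    # B, N, D
        # Global self-attention at every layer; no spatial downsampling.
        tokens = self.encoder(tokens)
        # Reshape the sequence back into a 2D feature map at patch resolution.
        feat = tokens.transpose(1, 2).reshape(B, -1, H // p, W // p)
        logits = self.head(feat)
        # Recover the original image resolution.
        return F.interpolate(logits, size=(H, W), mode="bilinear",
                             align_corners=False)

model = SETRSketch()
out = model(torch.randn(1, 3, 512, 512))
print(out.shape)  # torch.Size([1, 19, 512, 512])
```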
Source: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
Task | Papers | Share |
---|---|---|
Semantic Segmentation | 2 | 33.33% |
Object Tracking | 1 | 16.67% |
Zero Shot Segmentation | 1 | 16.67% |
Decoder | 1 | 16.67% |
Medical Image Segmentation | 1 | 16.67% |