Semantic Segmentation Models

Segmentation Transformer, or SETR, is a Transformer-based semantic segmentation model. Its pure-transformer encoder treats an input image as a sequence of image patches, each represented by a learned patch embedding, and transforms that sequence with global self-attention to learn discriminative feature representations. Concretely, the image is first decomposed into a grid of fixed-size patches, which forms a sequence of patches. A linear embedding layer applied to the flattened pixel vector of each patch then yields a sequence of feature embedding vectors, which serves as the input to the transformer. Given the features learned by the encoder transformer, a decoder recovers the original image resolution. Crucially, the encoder transformer performs no downsampling of spatial resolution; instead, it models global context at every layer.
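The patch-to-sequence step described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (patchify, embed_patches, naive_decode), the 64x64 image, the patch size of 16, the embedding width of 32, and the 5 classes are all hypothetical, the weights are random rather than learned, and the transformer layers themselves are omitted. The decoder shown is only a stand-in (a per-token linear classifier followed by nearest-neighbour upsampling); the description above does not specify the decoder design.

```python
import numpy as np


def patchify(image, patch_size):
    """Split an (H, W, C) image into a sequence of flattened patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    with patches ordered row by row over the grid.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p, C) -> (gh, gw, p, p, C) -> (gh * gw, p * p * C)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def embed_patches(patches, weight, bias):
    """Linear embedding of each flattened patch (one token per patch)."""
    return patches @ weight + bias


def naive_decode(tokens, grid_shape, class_weight, patch_size):
    """Stand-in decoder: classify each token, then upsample to full resolution.

    Nearest-neighbour upsampling is used only to keep the sketch short.
    """
    gh, gw = grid_shape
    logits = (tokens @ class_weight).reshape(gh, gw, -1)
    logits = np.repeat(logits, patch_size, axis=0)
    logits = np.repeat(logits, patch_size, axis=1)
    return logits.argmax(axis=-1)  # (H, W) map of class indices


# Hypothetical sizes chosen for illustration only.
rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))
patch_size, embed_dim, num_classes = 16, 32, 5

patches = patchify(image, patch_size)
weight = rng.normal(scale=0.02, size=(patches.shape[1], embed_dim))
bias = np.zeros(embed_dim)
tokens = embed_patches(patches, weight, bias)

# In the real model the tokens would pass through the transformer encoder
# here; the sequence length stays the same at every layer.
class_weight = rng.normal(scale=0.02, size=(embed_dim, num_classes))
mask = naive_decode(tokens, (4, 4), class_weight, patch_size)

print(patches.shape, tokens.shape, mask.shape)
```

With these sizes the image yields a 4x4 grid of 16 patches, each flattened to 768 values and embedded to 32 dimensions, and the decoded mask has the same 64x64 spatial size as the input.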

Source: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Tasks


Task                          Papers   Share
Semantic Segmentation         2        40.00%
Object Tracking               1        20.00%
Zero Shot Segmentation        1        20.00%
Medical Image Segmentation    1        20.00%
