Image Model Blocks

Spatial Transformer

Introduced by Jaderberg et al. in Spatial Transformer Networks

A Spatial Transformer is an image model block that explicitly allows the spatial manipulation of data within a convolutional neural network. It gives CNNs the ability to actively spatially transform feature maps, conditional on the feature map itself, without any extra training supervision or modification to the optimisation process. Unlike pooling layers, where the receptive fields are fixed and local, the spatial transformer module is a dynamic mechanism that can actively spatially transform an image (or a feature map) by producing an appropriate transformation for each input sample. The transformation is then performed on the entire feature map (non-locally) and can include scaling, cropping, rotations, as well as non-rigid deformations.

The architecture is shown in the Figure to the right. The input feature map $U$ is passed to a localisation network which regresses the transformation parameters $\theta$. The regular spatial grid $G$ over $V$ is transformed to the sampling grid $T_{\theta}\left(G\right)$, which is applied to $U$, producing the warped output feature map $V$. The combination of the localisation network and sampling mechanism defines a spatial transformer.

Source: Spatial Transformer Networks


Paper Code Results Date Stars


Component Type
🤖 No Components Found You can add them if they exist; e.g. Mask R-CNN uses RoIAlign