Spatial Transformer Networks

Introduced by Jaderberg et al. in Spatial Transformer Networks

A spatial transformer network (STN) uses an explicit procedure to learn invariance to translation, scaling, rotation, and other more general warps, making the network pay attention to the most relevant regions. STN was the first attention mechanism to explicitly predict important regions and provide a deep neural network with transformation invariance.

Taking a 2D image as an example, a 2D affine transformation can be formulated as follows, where $A$ denotes a $2 \times 3$ learnable affine matrix:

\begin{align} A = f_\text{loc}(U) \end{align} \begin{align} x_i^s = A x_i^t \end{align}

Here, $U$ is the input feature map, and $f_\text{loc}$ can be any differentiable function, such as a lightweight fully connected network or a convolutional neural network. Following the notation of the source paper, $x_{i}^{t}$ denotes the target coordinates of the regular grid in the output feature map, written in homogeneous form so that the $2 \times 3$ matrix $A$ can be applied, and $x_{i}^{s}$ denotes the corresponding source coordinates in the input feature map, which define where to sample. With this correspondence, the network can sample the relevant input regions. To keep the whole process differentiable so that it can be trained end to end, bilinear sampling is used to sample the input features.
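
The grid generation and bilinear sampling steps described above can be sketched in a few lines of NumPy. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function names `affine_grid` and `bilinear_sample` are hypothetical, coordinates are assumed to be normalized to $[-1, 1]$ with the corner pixels at the extremes, samples falling outside the input are assumed to be zero, and the matrix $A$ is supplied by hand rather than predicted by $f_\text{loc}$.

```python
import numpy as np

def affine_grid(A, out_h, out_w):
    """Map each output (target) pixel to source coordinates: x^s = A x^t.

    Assumption: coordinates are normalized to [-1, 1].
    A is a 2x3 affine matrix; the result has shape (out_h, out_w, 2)
    and stores (x, y) source coordinates.
    """
    ys, xs = np.meshgrid(np.linspace(-1.0, 1.0, out_h),
                         np.linspace(-1.0, 1.0, out_w), indexing="ij")
    # Homogeneous target coordinates (x, y, 1) for every output pixel.
    target = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # (H, W, 3)
    return target @ A.T                                      # (H, W, 2)

def bilinear_sample(U, grid):
    """Bilinearly sample a single-channel map U (H, W) at the grid points."""
    h, w = U.shape
    # Convert normalized coordinates to (fractional) pixel indices.
    x = (grid[..., 0] + 1.0) * (w - 1) / 2.0
    y = (grid[..., 1] + 1.0) * (h - 1) / 2.0
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1, y1 = x0 + 1, y0 + 1
    wx, wy = x - x0, y - y0  # fractional offsets inside each pixel cell

    def at(yy, xx):
        # Assumption: zero padding for samples outside the input map.
        valid = (xx >= 0) & (xx < w) & (yy >= 0) & (yy < h)
        vals = U[np.clip(yy, 0, h - 1), np.clip(xx, 0, w - 1)]
        return np.where(valid, vals, 0.0)

    # Weighted sum of the four neighbouring pixels.
    return ((1 - wy) * (1 - wx) * at(y0, x0) + (1 - wy) * wx * at(y0, x1)
            + wy * (1 - wx) * at(y1, x0) + wy * wx * at(y1, x1))

# Usage: an identity matrix reproduces the input; scaling by 0.5 zooms in
# on the central region of the input.
U = np.arange(16, dtype=float).reshape(4, 4)
identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
zoom = np.array([[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]])
same = bilinear_sample(U, affine_grid(identity, 4, 4))
zoomed = bilinear_sample(U, affine_grid(zoom, 4, 4))
```

Because the output is a weighted sum of input pixels with weights that depend continuously on $A$, gradients can flow back both to the input feature map and to the parameters that produce $A$, which is the property the paragraph above relies on.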

STNs focus on discriminative regions automatically and learn invariance to some geometric transformations.

Source: Spatial Transformer Networks

Tasks

Task	Papers	Share
General Classification	4	12.12%
Image Reconstruction	2	6.06%
Reinforcement Learning (RL)	2	6.06%
Classification	2	6.06%
Image Classification	2	6.06%
Person Re-Identification	2	6.06%
Disentanglement	1	3.03%
Pose Transfer	1	3.03%
Self-Supervised Learning	1	3.03%

Categories

Attention Mechanisms