Attention-augmented Convolution

Introduced by Bello et al. in Attention Augmented Convolutional Networks

Attention-augmented Convolution is a type of convolution with a two-dimensional relative self-attention mechanism that can replace convolutions as a stand-alone computational primitive for image classification. It employs scaled dot-product attention and multi-head attention, as in Transformers.
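
To make the attention mechanism concrete, here is a minimal sketch of scaled dot-product attention in PyTorch. The two-dimensional relative position logits from the paper are omitted for brevity, and the tensor layout `(batch, heads, positions, depth)` is an illustrative assumption:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, heads, positions, depth_per_head)
    depth = q.size(-1)
    # Scale the dot products by sqrt(depth) so softmax stays well-conditioned
    logits = torch.matmul(q, k.transpose(-2, -1)) / depth ** 0.5
    weights = F.softmax(logits, dim=-1)
    return torch.matmul(weights, v)
```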

It works by concatenating convolutional and attentional feature maps. To see this, consider an original convolution operator with kernel size $k$, $F_{in}$ input filters and $F_{out}$ output filters. The corresponding attention-augmented convolution can be written as:

$$\text{AAConv}\left(X\right) = \text{Concat}\left[\text{Conv}(X), \text{MHA}(X)\right] $$

$X$ originates from an input tensor of shape $\left(H, W, F_{in}\right)$, which is flattened to $X \in \mathbb{R}^{HW \times F_{in}}$ before being passed into the multi-head attention module; the unflattened tensor is processed by the convolution (see above).
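
A hedged sketch of how the two branches could be combined, using PyTorch's `nn.Conv2d` and `nn.MultiheadAttention` as a stand-in for the paper's attention head logic. The class name `AAConv`, the 1×1 projection `to_attn`, and the parameters `dv` (attention channels) and `heads` are illustrative choices, and the relative position embeddings are again omitted:

```python
import torch
import torch.nn as nn

class AAConv(nn.Module):
    """Sketch of an attention-augmented convolution: the output channels
    concatenate a standard convolution (f_out - dv channels) with
    multi-head self-attention over flattened spatial positions (dv
    channels). Relative position logits are omitted for brevity."""

    def __init__(self, f_in, f_out, k, dv, heads):
        super().__init__()
        assert dv % heads == 0 and dv < f_out
        # Convolutional branch produces f_out - dv channels (odd k keeps H, W)
        self.conv = nn.Conv2d(f_in, f_out - dv, kernel_size=k, padding=k // 2)
        # 1x1 projection into the attention dimension, then multi-head attention
        self.to_attn = nn.Conv2d(f_in, dv, kernel_size=1)
        self.mha = nn.MultiheadAttention(dv, heads, batch_first=True)

    def forward(self, x):                     # x: (B, f_in, H, W)
        B, _, H, W = x.shape
        conv_out = self.conv(x)               # (B, f_out - dv, H, W)
        a = self.to_attn(x)                   # (B, dv, H, W)
        a = a.flatten(2).transpose(1, 2)      # (B, H*W, dv): the flattened X
        attn_out, _ = self.mha(a, a, a)       # self-attention over positions
        attn_out = attn_out.transpose(1, 2).reshape(B, -1, H, W)
        return torch.cat([conv_out, attn_out], dim=1)  # (B, f_out, H, W)
```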

Like the convolution, the attention-augmented convolution 1) is equivariant to translation and 2) can readily operate on inputs of different spatial dimensions.
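
Because the sketch above contains no fixed positional parameters, it runs unchanged on inputs of different spatial sizes, illustrating point 2 (the specific shapes below are arbitrary):

```python
aaconv = AAConv(f_in=32, f_out=64, k=3, dv=16, heads=4)
for hw in [(16, 16), (24, 20)]:       # two different spatial sizes
    x = torch.randn(2, 32, *hw)
    print(aaconv(x).shape)            # (2, 64, H, W) in both cases
```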

Source: Attention Augmented Convolutional Networks
