Attention-augmented Convolution is a convolution combined with a two-dimensional relative self-attention mechanism; it can replace convolutions as a stand-alone computational primitive for image classification. It employs scaled dot-product attention and multi-head attention, as in Transformers.
It works by concatenating convolutional and attentional feature maps. To see this, consider an original convolution operator with kernel size $k$, $F_{in}$ input filters and $F_{out}$ output filters. The corresponding attention-augmented convolution can be written as
$$\text{AAConv}\left(X\right) = \text{Concat}\left[\text{Conv}(X), \text{MHA}(X)\right] $$
Here $X$ originates from an input tensor of shape $\left(H, W, F_{in}\right)$, which is flattened to $X \in \mathbb{R}^{HW \times F_{in}}$ before being passed into the multi-head attention module, while the convolution operates on the original spatial layout (see above).
Like a convolution, the attention-augmented convolution 1) is equivariant to translation and 2) can readily operate on inputs of different spatial dimensions.
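Below is a minimal PyTorch sketch of this operator, assuming the attention branch contributes $d_v$ of the $F_{out}$ output filters and the convolution the remaining $F_{out} - d_v$. The relative position encodings are omitted for brevity, and the class and argument names (`AAConv2d`, `dk`, `dv`, `heads`) are illustrative, not from the paper's code.

```python
import torch
import torch.nn as nn


class AAConv2d(nn.Module):
    """Sketch of an attention-augmented convolution.

    Concatenates a standard convolution (F_out - dv output filters) with
    multi-head self-attention over the flattened spatial grid (dv output
    filters). Relative position encodings are omitted for brevity.
    """

    def __init__(self, f_in, f_out, k, dk, dv, heads):
        super().__init__()
        assert dk % heads == 0 and dv % heads == 0
        self.heads, self.dk, self.dv = heads, dk, dv
        # Convolutional branch produces the remaining F_out - dv filters.
        self.conv = nn.Conv2d(f_in, f_out - dv, k, padding=k // 2)
        # A 1x1 convolution computes queries, keys and values in one pass.
        self.qkv = nn.Conv2d(f_in, 2 * dk + dv, 1)
        # Output projection for the attention branch.
        self.proj = nn.Conv2d(dv, dv, 1)

    def forward(self, x):
        b, _, h, w = x.shape
        conv_out = self.conv(x)

        # Flatten the (H, W) grid to HW tokens, split across heads.
        q, k, v = self.qkv(x).split([self.dk, self.dk, self.dv], dim=1)
        q = q.reshape(b, self.heads, self.dk // self.heads, h * w)
        k = k.reshape(b, self.heads, self.dk // self.heads, h * w)
        v = v.reshape(b, self.heads, self.dv // self.heads, h * w)

        # Scaled dot-product attention over all spatial positions.
        scale = (self.dk // self.heads) ** -0.5
        attn = torch.softmax(q.transpose(-2, -1) @ k * scale, dim=-1)
        attn_out = (v @ attn.transpose(-2, -1)).reshape(b, self.dv, h, w)
        attn_out = self.proj(attn_out)

        # AAConv(X) = Concat[Conv(X), MHA(X)] along the channel axis.
        return torch.cat([conv_out, attn_out], dim=1)


# Usage: output has F_out channels at the input's spatial resolution.
layer = AAConv2d(f_in=32, f_out=64, k=3, dk=16, dv=16, heads=4)
y = layer(torch.randn(2, 32, 28, 28))  # -> (2, 64, 28, 28)
```

Because the attention branch attends over however many positions the input provides, the layer runs on any spatial size, consistent with property 2) above.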
Source: Attention Augmented Convolutional Networks
| Task | Papers | Share |
|---|---|---|
| 3D Point Cloud Classification | 1 | 16.67% |
| Point Cloud Classification | 1 | 16.67% |
| Music Modeling | 1 | 16.67% |
| General Classification | 1 | 16.67% |
| Image Classification | 1 | 16.67% |
| Object Detection | 1 | 16.67% |
| Component | Type |
|---|---|
| Convolution | Convolutions |
| Multi-Head Attention | Attention Modules |
| Scaled Dot-Product Attention | Attention Mechanisms |