Attention-augmented Convolution

Introduced by Bello et al. in Attention Augmented Convolutional Networks

Attention-augmented Convolution is a type of convolution with a two-dimensional relative self-attention mechanism that can replace convolutions as a stand-alone computational primitive for image classification. It employs scaled dot-product attention and multi-head attention, as in Transformers.
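
To make the attention mechanism concrete, here is a minimal sketch of scaled dot-product attention in PyTorch. The two-dimensional relative position logits from the paper are omitted for brevity, and the tensor layout `(batch, heads, positions, depth)` is an illustrative assumption:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, heads, positions, depth_per_head)
    depth = q.size(-1)
    # Scale the dot products by sqrt(depth) so softmax stays well-conditioned
    logits = torch.matmul(q, k.transpose(-2, -1)) / depth ** 0.5
    weights = F.softmax(logits, dim=-1)
    return torch.matmul(weights, v)
```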

It works by concatenating convolutional and attentional feature maps. To see this, consider an original convolution operator with kernel size $k$, $F_{in}$ input filters and $F_{out}$ output filters. The corresponding attention-augmented convolution can be written as:

$$\text{AAConv}\left(X\right) = \text{Concat}\left[\text{Conv}(X), \text{MHA}(X)\right] $$

$X$ originates from an input tensor of shape $\left(H, W, F_{in}\right)$, which is flattened to $X \in \mathbb{R}^{HW \times F_{in}}$ before being passed into the multi-head attention module; the unflattened tensor is processed by the convolution (see above).
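
A hedged sketch of how the two branches could be combined, using PyTorch's `nn.Conv2d` and `nn.MultiheadAttention` as a stand-in for the paper's attention head logic. The class name `AAConv`, the 1×1 projection `to_attn`, and the parameters `dv` (attention channels) and `heads` are illustrative choices, and the relative position embeddings are again omitted:

```python
import torch
import torch.nn as nn

class AAConv(nn.Module):
    """Sketch of an attention-augmented convolution: the output channels
    concatenate a standard convolution (f_out - dv channels) with
    multi-head self-attention over flattened spatial positions (dv
    channels). Relative position logits are omitted for brevity."""

    def __init__(self, f_in, f_out, k, dv, heads):
        super().__init__()
        assert dv % heads == 0 and dv < f_out
        # Convolutional branch produces f_out - dv channels (odd k keeps H, W)
        self.conv = nn.Conv2d(f_in, f_out - dv, kernel_size=k, padding=k // 2)
        # 1x1 projection into the attention dimension, then multi-head attention
        self.to_attn = nn.Conv2d(f_in, dv, kernel_size=1)
        self.mha = nn.MultiheadAttention(dv, heads, batch_first=True)

    def forward(self, x):                     # x: (B, f_in, H, W)
        B, _, H, W = x.shape
        conv_out = self.conv(x)               # (B, f_out - dv, H, W)
        a = self.to_attn(x)                   # (B, dv, H, W)
        a = a.flatten(2).transpose(1, 2)      # (B, H*W, dv): the flattened X
        attn_out, _ = self.mha(a, a, a)       # self-attention over positions
        attn_out = attn_out.transpose(1, 2).reshape(B, -1, H, W)
        return torch.cat([conv_out, attn_out], dim=1)  # (B, f_out, H, W)
```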

Like the convolution, the attention-augmented convolution 1) is equivariant to translation and 2) can readily operate on inputs of different spatial dimensions.
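
Because the sketch above contains no fixed positional parameters, it runs unchanged on inputs of different spatial sizes, illustrating point 2 (the specific shapes below are arbitrary):

```python
aaconv = AAConv(f_in=32, f_out=64, k=3, dv=16, heads=4)
for hw in [(16, 16), (24, 20)]:       # two different spatial sizes
    x = torch.randn(2, 32, *hw)
    print(aaconv(x).shape)            # (2, 64, H, W) in both cases
```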

Source: Attention Augmented Convolutional Networks
