Coordinate attention

Introduced by Hou et al. in Coordinate Attention for Efficient Mobile Network Design

Hou et al. proposed coordinate attention, an attention mechanism that embeds positional information into channel attention, so that the network can attend to large important regions at little computational cost.

The coordinate attention mechanism has two consecutive steps: coordinate information embedding and coordinate attention generation. In the first step, pooling kernels with two different spatial extents encode each channel along the horizontal and vertical directions. In the second step, a shared $1\times 1$ convolutional transformation is applied to the concatenated outputs of the two pooling layers. The resulting tensor is then split into two separate tensors, and each is mapped to an attention vector with the same number of channels as the input $X$, one along the vertical coordinate and one along the horizontal coordinate. This can be written as

\begin{align}
z^h &= \text{GAP}^h(X) \\
z^w &= \text{GAP}^w(X) \\
f &= \delta(\text{BN}(\text{Conv}_1^{1\times 1}([z^h;z^w]))) \\
f^h, f^w &= \text{Split}(f) \\
s^h &= \sigma(\text{Conv}_h^{1\times 1}(f^h)) \\
s^w &= \sigma(\text{Conv}_w^{1\times 1}(f^w)) \\
Y &= X s^h s^w
\end{align}

where $\text{GAP}^h$ and $\text{GAP}^w$ denote pooling functions for the vertical and horizontal coordinates, $[\cdot;\cdot]$ denotes concatenation, $\text{BN}$ is batch normalization, $\delta$ is a non-linear activation, and $\sigma$ is the sigmoid function. The attention weights $s^h \in \mathbb{R}^{C\times H\times 1}$ and $s^w \in \mathbb{R}^{C\times 1\times W}$ are broadcast over $X$ in the final element-wise product.
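
The two steps can be traced with a minimal NumPy sketch of the forward pass for a single feature map. It is an illustration under stated assumptions, not the authors' implementation: batch normalization is omitted, ReLU stands in for the non-linearity $\delta$, the reduction ratio `r` and the randomly initialised weights `w1`, `w_h`, `w_w` are hypothetical, and $s^h$ is taken to hold one weight per channel and row while $s^w$ holds one per channel and column.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1x1(x, weight):
    """1x1 convolution on a (C_in, L) array: the same linear map at every position."""
    return weight @ x  # (C_out, C_in) @ (C_in, L) -> (C_out, L)

def coordinate_attention(x, w1, w_h, w_w):
    """Forward pass for one feature map x of shape (C, H, W)."""
    c, h, w = x.shape

    # Step 1: coordinate information embedding (1D average pooling per axis).
    z_h = x.mean(axis=2)  # (C, H): pooled over width, indexed by row
    z_w = x.mean(axis=1)  # (C, W): pooled over height, indexed by column

    # Step 2: shared 1x1 transform on the concatenation along the spatial axis.
    f = conv1x1(np.concatenate([z_h, z_w], axis=1), w1)  # (C_mid, H + W)
    f = np.maximum(f, 0.0)  # assumption: ReLU as delta, BN left out

    # Split back into the two directions and produce per-direction gates.
    f_h, f_w = f[:, :h], f[:, h:]
    s_h = sigmoid(conv1x1(f_h, w_h))  # (C, H)
    s_w = sigmoid(conv1x1(f_w, w_w))  # (C, W)

    # Reweight the input; the gates broadcast over width and height.
    y = x * s_h[:, :, None] * s_w[:, None, :]
    return y, s_h, s_w

# Hypothetical sizes and weights, chosen only for illustration.
rng = np.random.default_rng(0)
C, H, W, r = 8, 5, 7, 4
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w_h = rng.standard_normal((C, C // r)) * 0.1
w_w = rng.standard_normal((C, C // r)) * 0.1

y, s_h, s_w = coordinate_attention(x, w1, w_h, w_w)
print(y.shape, s_h.shape, s_w.shape)
```

Because each gate depends on only one spatial index, the combined weight at position $(i, j)$ is the product of a row weight and a column weight, which is how the block keeps positional information that a single global pooling step would discard.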

Using coordinate attention, the network can accurately obtain the position of a targeted object. This approach has a larger receptive field than BAM and CBAM. Like an SE block, it also models cross-channel relationships, effectively enhancing the expressive power of the learned features. Due to its lightweight design and flexibility, it can be easily used in classical building blocks of mobile networks.

Source: Coordinate Attention for Efficient Mobile Network Design

Tasks

Task	Papers	Share
Semantic Segmentation	4	12.50%
Object Detection	3	9.38%
Image Segmentation	2	6.25%
Dimensionality Reduction	1	3.13%
Image Classification	1	3.13%
Image Generation	1	3.13%
Image Restoration	1	3.13%
Optical Flow Estimation	1	3.13%
Defect Detection	1	3.13%

Categories

Attention Mechanisms