Dual Attention Network

Introduced by Fu et al. in Dual Attention Network for Scene Segmentation

In the field of scene segmentation, encoder-decoder structures cannot exploit the global relationships between objects, whereas RNN-based structures rely heavily on the output of long-term memorization. To address these problems, Fu et al. proposed a novel framework, the dual attention network (DANet), for natural-scene image segmentation. Unlike CBAM and BAM, it adopts a self-attention mechanism instead of simply stacking convolutions to compute the spatial attention map, which enables the network to capture global information directly.

DANet uses a position attention module and a channel attention module in parallel to capture feature dependencies in the spatial and channel domains. Given the input feature map $X$, the position attention module first applies convolution layers to obtain new feature maps. It then selectively aggregates the features at each position using a weighted sum of the features at all positions, where the weights are determined by the feature similarity between the corresponding pairs of positions. The channel attention module has a similar form, except that it computes the channel affinity directly from $X$ without projection layers, which preserves the relationships between channel maps. Finally, the outputs of the two branches are fused to obtain the final feature representation. For simplicity, we reshape the feature map $X$ to $C\times (H \times W)$, whereupon the overall process can be written as
\begin{align}
Q,\; K,\; V &= W_qX,\; W_kX,\; W_vX \\
Y^\text{pos} &= X + V\,\text{Softmax}(Q^TK) \\
Y^\text{chn} &= X + \text{Softmax}(XX^T)\,X \\
Y &= Y^\text{pos} + Y^\text{chn}
\end{align}
where $W_q$, $W_k$, $W_v \in \mathbb{R}^{C\times C}$ generate the new feature maps.
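The reshaped equations above can be sketched directly in NumPy. This is a minimal illustration, not the paper's implementation: the convolution layers are reduced to the $C\times C$ matrices $W_q$, $W_k$, $W_v$, and the softmax normalization axes follow the weighted-sum reading of the formulas, which is an assumption here.

```python
import numpy as np

def softmax(z, axis):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def dual_attention(X, Wq, Wk, Wv):
    """DANet-style dual attention on a reshaped feature map X of shape (C, H*W).

    Wq, Wk, Wv are (C, C) projection matrices standing in for the paper's
    convolution layers; the softmax axes are an assumption of this sketch.
    """
    Q, K, V = Wq @ X, Wk @ X, Wv @ X
    # Position attention: (HW, HW) affinity, normalized over contributing
    # positions so each output position is a convex combination of V's columns.
    A_pos = softmax(Q.T @ K, axis=0)
    Y_pos = X + V @ A_pos
    # Channel attention: (C, C) affinity computed directly from X (no
    # projections), normalized over contributing channels.
    A_chn = softmax(X @ X.T, axis=-1)
    Y_chn = X + A_chn @ X
    # Fuse the two branches.
    return Y_pos + Y_chn
```

Note that both branches are residual: each adds its attention-weighted aggregate back onto $X$ before the sum, matching the equations above.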

The position attention module enables DANet to capture long-range contextual information and to adaptively integrate similar features at any scale from a global viewpoint, while the channel attention module is responsible for enhancing useful channels and suppressing noise. Explicitly taking spatial and channel relationships into account improves the feature representation for scene segmentation. However, it is computationally costly, especially for large input feature maps.
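To make the cost remark concrete: the position branch materializes an $(HW)\times(HW)$ affinity matrix, so memory grows quartically with spatial resolution. The feature-map size below is an illustrative assumption, not a number from the paper.

```python
# Back-of-envelope cost of the position-attention branch: it materializes
# an (H*W) x (H*W) affinity matrix. Sizes here are illustrative assumptions.

def affinity_entries(h: int, w: int) -> int:
    """Number of entries in the position-attention affinity matrix."""
    return (h * w) ** 2

# e.g. a 96x96 feature map (1/8 resolution of a 768x768 input):
entries = affinity_entries(96, 96)
print(entries)              # 84934656 entries
print(entries * 4 / 2**20)  # 324.0 MiB at float32, per image
```

Doubling the input resolution multiplies this by 16, which is why the paper's observation about large feature maps matters in practice.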

Source: Dual Attention Network for Scene Segmentation


