Attention Mechanisms

Global-and-Local attention

Introduced by Linsley et al. in Learning what and where to attend

Most attention mechanisms learn where to focus using only weak supervisory signals from class labels, which inspired Linsley et al. to investigate how explicit human supervision can affect the performance and interpretability of attention models. As a proof of concept, Linsley et al. proposed the global-and-local attention (GALA) module, which extends an SE block with a spatial attention mechanism.

Given the input feature map $X$, GALA computes an attention mask that combines global and local attention to tell the network both where and on what to focus. As in SE blocks, the global pathway aggregates global information by global average pooling and then produces a channel-wise attention weight vector with a multilayer perceptron. The local pathway applies two consecutive $1\times 1$ convolutions to the input to produce a positional weight map. The outputs of the two pathways are then combined through both addition and multiplication. Formally, GALA can be represented as: \begin{align} s_g &= W_{2} \delta (W_{1}\text{GAP}(X)) \end{align}

\begin{align} s_l &= \text{Conv}_2^{1\times 1} (\delta(\text{Conv}_1^{1\times 1}(X))) \end{align}

\begin{align} s_g^* &= \text{Expand}(s_g) \end{align}

\begin{align} s_l^* &= \text{Expand}(s_l) \end{align}

\begin{align} s &= \tanh\left(a \cdot (s_g^* + s_l^*) + m \cdot (s_g^* s_l^*)\right) \end{align}

\begin{align} Y &= sX \end{align}

where $a,m \in \mathbb{R}^{C}$ are learnable parameters representing channel-wise weight vectors.
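The following is a minimal PyTorch sketch of a GALA module following the equations above. The reduction ratio `reduction`, the use of ReLU for $\delta$, and the initialization of `a` and `m` are assumptions not fixed by the description here; the authors' reference implementation may differ.

```python
# Minimal sketch of a GALA module, assuming an SE-style reduction ratio
# and ReLU nonlinearities; not the authors' reference implementation.
import torch
import torch.nn as nn


class GALA(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Global pathway: GAP followed by a two-layer MLP (as in an SE block),
        # producing the channel-wise weight vector s_g.
        self.global_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),  # W_1
            nn.ReLU(inplace=True),                                      # delta
            nn.Conv2d(channels // reduction, channels, kernel_size=1),  # W_2
        )
        # Local pathway: two consecutive 1x1 convolutions producing the
        # single-channel positional weight map s_l.
        self.local_convs = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),  # Conv_1
            nn.ReLU(inplace=True),                                      # delta
            nn.Conv2d(channels // reduction, 1, kernel_size=1),         # Conv_2
        )
        # Learnable channel-wise vectors a and m for the additive and
        # multiplicative combination (initialization is an assumption).
        self.a = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.m = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s_g = self.global_mlp(x)   # (N, C, 1, 1); broadcasting plays the role of Expand
        s_l = self.local_convs(x)  # (N, 1, H, W); broadcasting plays the role of Expand
        s = torch.tanh(self.a * (s_g + s_l) + self.m * (s_g * s_l))
        return s * x               # Y = s * X, elementwise
```

Broadcasting the $(N, C, 1, 1)$ global vector against the $(N, 1, H, W)$ local map implements the $\text{Expand}$ operations implicitly, producing a full $(N, C, H, W)$ attention mask.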

When supervised with human-provided feature importance maps, GALA yields more accurate and more interpretable representations, and the module can be combined with any CNN backbone.
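Continuing the sketch above, a hypothetical example of inserting GALA after a convolutional stage of a backbone:

```python
# Hypothetical usage: GALA placed after a convolutional stage.
# Assumes the GALA class defined in the sketch above is in scope.
import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    GALA(channels=64),
)
y = block(torch.randn(2, 3, 32, 32))  # y has shape (2, 64, 32, 32)
```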

Source: Learning what and where to attend

Tasks


| Task | Papers | Share |
| --- | --- | --- |
| General Knowledge | 1 | 25.00% |
| Reinforcement Learning (RL) | 1 | 25.00% |
| Image Categorization | 1 | 25.00% |
| Object Recognition | 1 | 25.00% |
