Attention Mechanisms

Global-and-Local attention

Introduced by Linsley et al. in Learning what and where to attend

Most attention mechanisms learn where to focus using only weak supervisory signals from class labels, which inspired Linsley et al. to investigate how explicit human supervision can affect the performance and interpretability of attention models. As a proof of concept, Linsley et al. proposed the global-and-local attention (GALA) module, which extends an SE block with a spatial attention mechanism.

Given the input feature map $X$, GALA computes an attention mask that combines global and local attention to tell the network both where and on what to focus. As in SE blocks, the global pathway aggregates global information by global average pooling and then produces a channel-wise attention weight vector with a multilayer perceptron. The local pathway applies two consecutive $1\times 1$ convolutions to the input to produce a positional weight map. The outputs of the two pathways are then combined through both addition and multiplication. Formally, GALA can be represented as: \begin{align} s_g &= W_{2} \delta (W_{1}\text{GAP}(X)) \end{align}

\begin{align} s_l &= \text{Conv}_2^{1\times 1} (\delta(\text{Conv}_1^{1\times 1}(X))) \end{align}

\begin{align} s_g^* &= \text{Expand}(s_g) \end{align}

\begin{align} s_l^* &= \text{Expand}(s_l) \end{align}

\begin{align} s &= \tanh\left(a \cdot (s_g^* + s_l^*) + m \cdot (s_g^* s_l^*)\right) \end{align}

\begin{align} Y &= sX \end{align}

where $a,m \in \mathbb{R}^{C}$ are learnable parameters representing channel-wise weight vectors.
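The following is a minimal PyTorch sketch of a GALA module following the equations above. The reduction ratio `reduction`, the use of ReLU for $\delta$, and the initialization of `a` and `m` are assumptions not fixed by the description here; the authors' reference implementation may differ.

```python
# Minimal sketch of a GALA module, assuming an SE-style reduction ratio
# and ReLU nonlinearities; not the authors' reference implementation.
import torch
import torch.nn as nn


class GALA(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Global pathway: GAP followed by a two-layer MLP (as in an SE block),
        # producing the channel-wise weight vector s_g.
        self.global_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),  # W_1
            nn.ReLU(inplace=True),                                      # delta
            nn.Conv2d(channels // reduction, channels, kernel_size=1),  # W_2
        )
        # Local pathway: two consecutive 1x1 convolutions producing the
        # single-channel positional weight map s_l.
        self.local_convs = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),  # Conv_1
            nn.ReLU(inplace=True),                                      # delta
            nn.Conv2d(channels // reduction, 1, kernel_size=1),         # Conv_2
        )
        # Learnable channel-wise vectors a and m for the additive and
        # multiplicative combination (initialization is an assumption).
        self.a = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.m = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s_g = self.global_mlp(x)   # (N, C, 1, 1); broadcasting plays the role of Expand
        s_l = self.local_convs(x)  # (N, 1, H, W); broadcasting plays the role of Expand
        s = torch.tanh(self.a * (s_g + s_l) + self.m * (s_g * s_l))
        return s * x               # Y = s * X, elementwise
```

Broadcasting the $(N, C, 1, 1)$ global vector against the $(N, 1, H, W)$ local map implements the $\text{Expand}$ operations implicitly, producing a full $(N, C, H, W)$ attention mask.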

When supervised with human-provided feature importance maps, GALA yields more accurate and more interpretable representations, and the module can be combined with any CNN backbone.
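Continuing the sketch above, a hypothetical example of inserting GALA after a convolutional stage of a backbone:

```python
# Hypothetical usage: GALA placed after a convolutional stage.
# Assumes the GALA class defined in the sketch above is in scope.
import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    GALA(channels=64),
)
y = block(torch.randn(2, 3, 32, 32))  # y has shape (2, 64, 32, 32)
```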

Source: Learning what and where to attend

Tasks


| Task | Papers | Share |
| --- | --- | --- |
| General Knowledge | 1 | 25.00% |
| Reinforcement Learning (RL) | 1 | 25.00% |
| Image Categorization | 1 | 25.00% |
| Object Recognition | 1 | 25.00% |
