The Re-Attention Module is an attention layer used in the DeepViT architecture that mixes the attention map with a learnable matrix before multiplying with the values. The motivation is to regenerate the attention maps so as to increase their diversity across layers, at negligible computation and memory cost. The authors observe that standard self-attention fails to learn effective representations in the deeper layers of ViT: attention maps become increasingly similar and less diverse with depth (attention collapse), which prevents the model from achieving the expected performance gains. Re-Attention is implemented as:
$$ \operatorname{Re}-\operatorname{Attention}(Q, K, V)=\operatorname{Norm}\left(\Theta^{\top}\left(\operatorname{Softmax}\left(\frac{Q K^{\top}}{\sqrt{d}}\right)\right)\right) V $$
where transformation matrix $\Theta$ is multiplied to the self-attention map $\textbf{A}$ along the head dimension.
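As an illustration, here is a minimal NumPy sketch of this operation. The shapes, the function names, and the choice of per-map standardization as the `Norm` step are assumptions for clarity; the paper applies a normalization (e.g. BatchNorm) after mixing the heads, and $\Theta$ is learned end-to-end rather than fixed as here.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def re_attention(Q, K, V, theta, eps=1e-5):
    """Sketch of Re-Attention (assumed shapes).

    Q, K, V: (heads, seq_len, d_head)
    theta:   (heads, heads) learnable head-mixing matrix
    """
    d = Q.shape[-1]
    # Standard per-head attention maps: Softmax(QK^T / sqrt(d)).
    A = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(d))  # (h, n, n)
    # Mix attention maps across the head dimension: Theta^T A.
    A_mix = np.einsum('gh,hij->gij', theta.T, A)
    # Norm step: per-map standardization stands in for the paper's
    # normalization layer (an assumption of this sketch).
    A_norm = (A_mix - A_mix.mean(axis=(-2, -1), keepdims=True)) / (
        A_mix.std(axis=(-2, -1), keepdims=True) + eps)
    # Apply the re-generated attention maps to the values.
    return A_norm @ V

# Example: 4 heads, sequence length 8, head dimension 16.
rng = np.random.default_rng(0)
h, n, d = 4, 8, 16
Q, K, V = (rng.standard_normal((h, n, d)) for _ in range(3))
theta = rng.standard_normal((h, h))
out = re_attention(Q, K, V, theta)
```

The key difference from vanilla self-attention is the `einsum` over the head axis: each output head's attention map is a learned linear combination of all heads' maps, which is what restores diversity in deep layers.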
Source: DeepViT: Towards Deeper Vision Transformer
| Task | Papers | Share |
|---|---|---|
| Visual Commonsense Reasoning | 1 | 20.00% |
| Visual Reasoning | 1 | 20.00% |
| Point Cloud Segmentation | 1 | 20.00% |
| Semantic Segmentation | 1 | 20.00% |
| Image Classification | 1 | 20.00% |