Attention Modules

Spatial-Reduction Attention (SRA) is a multi-head attention module used in the Pyramid Vision Transformer architecture. It reduces the spatial scale of the key $K$ and value $V$ before the attention operation, which lowers the computational and memory overhead. The SRA in Stage $i$ is formulated as follows:

$$ \text{SRA}(Q, K, V)=\text{Concat}\left(\text{head}_{0}, \ldots, \text{head}_{N_{i}}\right) W^{O} $$

$$ \text{head}_{j}=\text{Attention}\left(Q W_{j}^{Q}, \text{SR}(K) W_{j}^{K}, \text{SR}(V) W_{j}^{V}\right) $$

where $\text{Concat}(\cdot)$ is the concatenation operation. $W_{j}^{Q} \in \mathbb{R}^{C_{i} \times d_{\text{head}}}$, $W_{j}^{K} \in \mathbb{R}^{C_{i} \times d_{\text{head}}}$, $W_{j}^{V} \in \mathbb{R}^{C_{i} \times d_{\text{head}}}$, and $W^{O} \in \mathbb{R}^{C_{i} \times C_{i}}$ are linear projection parameters. $N_{i}$ is the number of heads of the attention layer in Stage $i$. Therefore, the dimension of each head, $d_{\text{head}}$, is equal to $\frac{C_{i}}{N_{i}}$. $\text{SR}(\cdot)$ is the operation for reducing the spatial dimension of the input sequence ($K$ or $V$), which is written as:

$$ \text{SR}(\mathbf{x})=\text{Norm}\left(\operatorname{Reshape}\left(\mathbf{x}, R_{i}\right) W^{S}\right) $$

Here, $\mathbf{x} \in \mathbb{R}^{\left(H_{i} W_{i}\right) \times C_{i}}$ represents an input sequence, and $R_{i}$ denotes the reduction ratio of the attention layers in Stage $i$. $\text{Reshape}\left(\mathbf{x}, R_{i}\right)$ is an operation that reshapes the input sequence $\mathbf{x}$ to a sequence of size $\frac{H_{i} W_{i}}{R_{i}^{2}} \times\left(R_{i}^{2} C_{i}\right)$. $W^{S} \in \mathbb{R}^{\left(R_{i}^{2} C_{i}\right) \times C_{i}}$ is a linear projection that reduces the dimension of the input sequence to $C_{i}$. $\text{Norm}(\cdot)$ refers to layer normalization.
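The formulas above can be sketched in a few lines of NumPy to make the shapes concrete. This is a minimal illustration under several assumptions not stated in the text: Reshape is taken to group each non-overlapping $R_i \times R_i$ patch of tokens (tokens ordered row by row), Attention is taken to be standard scaled dot-product attention, layer normalization is applied without learned scale or shift, and the function names, parameter dictionary, and sizes are hypothetical.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalize each token over its channels (no learned scale/shift: an assumption).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def spatial_reduction(x, H, W, R, W_S):
    """SR(x) = Norm(Reshape(x, R) W^S): (H*W, C) -> (H*W / R^2, C)."""
    C = x.shape[1]
    # Assumption: Reshape gathers each non-overlapping R x R patch into one row.
    patches = x.reshape(H // R, R, W // R, R, C).transpose(0, 2, 1, 3, 4)
    patches = patches.reshape((H // R) * (W // R), R * R * C)
    return layer_norm(patches @ W_S)


def sra(Q, K, V, H, W, R, params, num_heads):
    """Concat(head_j) W^O, with K and V spatially reduced by SR."""
    d_head = Q.shape[1] // num_heads
    K_r = spatial_reduction(K, H, W, R, params["W_S"])
    V_r = spatial_reduction(V, H, W, R, params["W_S"])
    heads = []
    for j in range(num_heads):
        q = Q @ params["W_Q"][j]    # (H*W, d_head)
        k = K_r @ params["W_K"][j]  # (H*W / R^2, d_head)
        v = V_r @ params["W_V"][j]  # (H*W / R^2, d_head)
        # Assumption: scaled dot-product attention; scores are (H*W, H*W / R^2).
        scores = softmax(q @ k.T / np.sqrt(d_head))
        heads.append(scores @ v)
    return np.concatenate(heads, axis=1) @ params["W_O"]


# Hypothetical small stage: 8 x 8 tokens, 16 channels, reduction ratio 2, 4 heads.
rng = np.random.default_rng(0)
H, W, C, R, N = 8, 8, 16, 2, 4
d = C // N
params = {
    "W_S": 0.1 * rng.normal(size=(R * R * C, C)),
    "W_Q": 0.1 * rng.normal(size=(N, C, d)),
    "W_K": 0.1 * rng.normal(size=(N, C, d)),
    "W_V": 0.1 * rng.normal(size=(N, C, d)),
    "W_O": 0.1 * rng.normal(size=(C, C)),
}
x = rng.normal(size=(H * W, C))
reduced = spatial_reduction(x, H, W, R, params["W_S"])
out = sra(x, x, x, H, W, R, params, N)
print(reduced.shape)  # (16, 16)
print(out.shape)      # (64, 16)
```

In this sketch the key and value sequences shrink from $H_i W_i = 64$ tokens to $H_i W_i / R_i^2 = 16$, so each head's score matrix has $64 \times 16$ entries instead of $64 \times 64$, a factor of $R_i^2$ fewer. The output keeps the query's sequence length and channel width.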

Source: Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
