Attention Modules

Channel-wise Cross Attention is a module for semantic segmentation used in the UCTransNet architecture. It fuses features of inconsistent semantics between the Channel Transformer and the U-Net decoder: it guides the channel- and information-wise filtration of the Transformer features and eliminates their ambiguity with respect to the decoder features.

Mathematically, we take the $i$-th level Transformer output $\mathbf{O}_{i} \in \mathbb{R}^{C\times H\times W}$ and the $i$-th level decoder feature map $\mathbf{D}_{i} \in \mathbb{R}^{C\times H\times W}$ as the inputs of Channel-wise Cross Attention. Spatial squeeze is performed by a global average pooling (GAP) layer, producing the vector $\mathcal{G}\left(\mathbf{X}\right) \in \mathbb{R}^{C\times 1\times 1}$ whose $k$-th channel is $\mathcal{G}_{k}\left(\mathbf{X}\right) = \frac{1}{H\times W}\sum^{H}_{i=1}\sum^{W}_{j=1}\mathbf{X}^{k}\left(i, j\right)$. We use this operation to embed the global spatial information and then generate the attention mask:
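The spatial squeeze step can be sketched in a few lines of NumPy; this is only an illustration of the GAP formula above (the function name `global_avg_pool` is ours, not from the paper):

```python
import numpy as np

def global_avg_pool(x):
    """Spatial squeeze: average a (C, H, W) map over H and W -> (C, 1, 1)."""
    C, H, W = x.shape
    return x.mean(axis=(1, 2)).reshape(C, 1, 1)

# Toy feature map with C=2 channels of size 3x4.
x = np.arange(24, dtype=float).reshape(2, 3, 4)
g = global_avg_pool(x)  # g[k] is the mean of channel k over all H*W positions
```

Each entry of `g` is one channel's global spatial average, matching $\mathcal{G}_{k}(\mathbf{X})$ above.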

$$ \mathbf{M}_{i} = \mathbf{L}_{1} \cdot \mathcal{G}\left(\mathbf{O_{i}}\right) + \mathbf{L}_{2} \cdot \mathcal{G}\left(\mathbf{D}_{i}\right) $$

where $\mathbf{L}_{1} \in \mathbb{R}^{C\times C}$ and $\mathbf{L}_{2} \in \mathbb{R}^{C\times C}$ are the weights of two Linear layers, and $\delta\left(\cdot\right)$ denotes the ReLU operator. This operation encodes the channel-wise dependencies. Following ECA-Net, which empirically showed that avoiding dimensionality reduction is important for learning channel attention, the authors use a single Linear layer and a sigmoid function to build the channel attention map. The resulting vector is used to recalibrate, or excite, $\mathbf{O}_{i}$ to $\mathbf{\bar{O}}_{i} = \sigma\left(\mathbf{M}_{i}\right) \cdot \mathbf{O}_{i}$, where the activation $\sigma\left(\mathbf{M}_{i}\right)$ indicates the importance of each channel. Finally, the masked $\mathbf{\bar{O}}_{i}$ is concatenated with the up-sampled features of the $i$-th level decoder.
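The full mask-and-excite step can be written out directly from the two equations above. A minimal NumPy sketch follows, assuming random weight matrices in place of learned Linear layers and transcribing only the stated formulas (the real UCTransNet implementation is a trained PyTorch module; the function and variable names here are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gap(x):
    """Global average pooling of a (C, H, W) map to a length-C vector."""
    return x.mean(axis=(1, 2))

def channel_cross_attention(O_i, D_i, L1, L2):
    """Recalibrate Transformer features O_i using decoder features D_i.

    M_i   = L1 @ GAP(O_i) + L2 @ GAP(D_i)      (attention mask)
    O_bar = sigmoid(M_i) * O_i                 (channel-wise excitation)
    """
    m = L1 @ gap(O_i) + L2 @ gap(D_i)          # (C,) channel mask
    return sigmoid(m)[:, None, None] * O_i     # broadcast over H, W

C, H, W = 4, 8, 8
O = rng.standard_normal((C, H, W))   # i-th level Transformer output
D = rng.standard_normal((C, H, W))   # i-th level decoder feature map
L1 = rng.standard_normal((C, C))     # stand-ins for learned Linear weights
L2 = rng.standard_normal((C, C))
O_bar = channel_cross_attention(O, D, L1, L2)
```

Because the sigmoid lies in $(0, 1)$, each channel of `O_bar` is a scaled copy of the corresponding channel of `O`: the mask only attenuates channels, never flips or amplifies them beyond their original magnitude.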

Source: UCTransNet: Rethinking the Skip Connections in U-Net from a Channel-wise Perspective with Transformer

Tasks


Task Papers Share
Image Segmentation 3 25.00%
Medical Image Segmentation 3 25.00%
Semantic Segmentation 3 25.00%
Pseudo Label 1 8.33%
text annotation 1 8.33%
UNET Segmentation 1 8.33%