Attention Mechanisms

Strip Pooling Network

Introduced by Hou et al. in Strip Pooling: Rethinking Spatial Pooling for Scene Parsing

Spatial pooling usually operates on a small region which limits its capability to capture long-range dependencies and focus on distant regions. To overcome this, Hou et al. proposed strip pooling, a novel pooling method capable of encoding long-range context in either horizontal or vertical spatial domains.

Strip pooling has two branches for horizontal and vertical strip pooling. The horizontal strip pooling part first pools the input feature $F \in \mathcal{R}^{C \times H \times W}$ in the horizontal direction: \begin{align} y^1 = \text{GAP}^w (X) \end{align} Then a 1D convolution with kernel size 3 is applied in $y$ to capture the relationship between different rows and channels. This is repeated $W$ times to make the output $y_v$ consistent with the input shape: \begin{align} y_h = \text{Expand}(\text{Conv1D}(y^1)) \end{align} Vertical strip pooling is performed in a similar way. Finally, the outputs of the two branches are fused using element-wise summation to produce the attention map: \begin{align} s &= \sigma(Conv^{1\times 1}(y_{v} + y_{h})) \end{align} \begin{align} Y &= s X \end{align}

The strip pooling module (SPM) is further developed in the mixed pooling module (MPM). Both consider spatial and channel relationships to overcome the locality of convolutional neural networks. SPNet achieves state-of-the-art results for several complex semantic segmentation benchmarks.

Source: Strip Pooling: Rethinking Spatial Pooling for Scene Parsing

Papers


Paper Code Results Date Stars

Tasks


Task Papers Share
Semantic Segmentation 2 28.57%
3D Object Detection 1 14.29%
Autonomous Driving 1 14.29%
Object Detection 1 14.29%
Retinal Vessel Segmentation 1 14.29%
Scene Parsing 1 14.29%

Components


Component Type
🤖 No Components Found You can add them if they exist; e.g. Mask R-CNN uses RoIAlign

Categories