# Spatial Attention Module (ThunderNet)

Introduced by Qin et al. in ThunderNet: Towards Real-time Generic Object Detection

Spatial Attention Module (SAM) is a feature extraction module for object detection used in ThunderNet.

The ThunderNet SAM explicitly re-weights the feature map over the spatial dimensions before RoI warping. The key idea of SAM is to use the knowledge from RPN to refine the feature distribution of the feature map. RPN is trained to recognize foreground regions under the supervision of ground truths, so its intermediate features can be used to distinguish foreground features from background features. SAM accepts two inputs: the intermediate feature map from RPN, $\mathcal{F}^{RPN}$, and the thin feature map from the Context Enhancement Module (CEM), $\mathcal{F}^{CEM}$. The output of SAM, $\mathcal{F}^{SAM}$, is defined as:

$$\mathcal{F}^{SAM} = \mathcal{F}^{CEM} * \text{sigmoid}\left(\theta\left(\mathcal{F}^{RPN}\right)\right)$$

Here $\theta\left(\cdot\right)$ is a dimension transformation that matches the number of channels of the two feature maps. The sigmoid function constrains the values to $\left[0, 1\right]$. Finally, $\mathcal{F}^{CEM}$ is re-weighted by the resulting attention map to obtain a better feature distribution. For computational efficiency, $\theta\left(\cdot\right)$ is simply a 1×1 convolution, so the computational cost of SAM is negligible. The Figure to the right shows the structure of SAM.
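The definition above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `spatial_attention`, the parameter names `weight` and `bias`, and the channel and spatial sizes are assumptions chosen for the example. It relies on the fact that a 1×1 convolution is a per-pixel linear map over channels, so it can be written as an `einsum`.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def spatial_attention(f_cem, f_rpn, weight, bias):
    """Re-weight the CEM feature map with an attention map derived from RPN.

    f_cem:  (C_cem, H, W) thin feature map from CEM
    f_rpn:  (C_rpn, H, W) intermediate feature map from RPN
    weight: (C_cem, C_rpn) weights of the 1x1 convolution theta (hypothetical name)
    bias:   (C_cem,) bias of theta (hypothetical name)
    """
    # theta: a 1x1 convolution mixes channels independently at each (h, w).
    theta = np.einsum("oc,chw->ohw", weight, f_rpn) + bias[:, None, None]
    # Sigmoid squashes the result into an attention map with values in (0, 1).
    attention = sigmoid(theta)
    # Element-wise re-weighting of the CEM features.
    return f_cem * attention


# Illustrative sizes only; they are not taken from the paper.
rng = np.random.default_rng(0)
c_cem, c_rpn, height, width = 8, 16, 5, 5
f_cem = rng.standard_normal((c_cem, height, width))
f_rpn = rng.standard_normal((c_rpn, height, width))
weight = 0.1 * rng.standard_normal((c_cem, c_rpn))
bias = np.zeros(c_cem)

f_sam = spatial_attention(f_cem, f_rpn, weight, bias)
print(f_sam.shape)
```

Because the attention values lie strictly between 0 and 1, every output entry has a magnitude no larger than the corresponding entry of $\mathcal{F}^{CEM}$: positions the RPN features mark as background are pushed toward zero, while foreground positions are kept closer to their original values.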

SAM has two functions. The first is to refine the feature distribution by strengthening foreground features and suppressing background features. The second is to stabilize the training of RPN, as SAM enables extra gradient flow from the R-CNN subnet to RPN. As a result, RPN receives additional supervision from the R-CNN subnet, which helps its training.
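The gradient path behind the second function can be made explicit with the chain rule. The following is a sketch and is not taken from the paper. It assumes $\theta$ is a 1×1 convolution with weight matrix $W$ (a symbol introduced here), and writes $\odot$ for the element-wise product in the SAM definition, $\sigma$ for the sigmoid, $Z = \theta\left(\mathcal{F}^{RPN}\right)$, $A = \sigma\left(Z\right)$, and $\mathcal{L}$ for the loss of the R-CNN subnet:

$$\frac{\partial \mathcal{L}}{\partial A} = \frac{\partial \mathcal{L}}{\partial \mathcal{F}^{SAM}} \odot \mathcal{F}^{CEM}, \qquad \frac{\partial \mathcal{L}}{\partial Z} = \frac{\partial \mathcal{L}}{\partial A} \odot A \odot \left(1 - A\right), \qquad \frac{\partial \mathcal{L}}{\partial \mathcal{F}^{RPN}_{:,h,w}} = W^{\top} \frac{\partial \mathcal{L}}{\partial Z_{:,h,w}}$$

Under these assumptions, the R-CNN loss reaches the intermediate RPN features through the attention branch, scaled by $\mathcal{F}^{CEM}$ and by the sigmoid slope $A \odot \left(1 - A\right)$, which is at most $1/4$.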
