Feature Extractors

Context Enhancement Module

Introduced by Qin et al. in ThunderNet: Towards Real-time Generic Object Detection

Context Enhancement Module (CEM) is a feature extraction module used in object detection (specifically, ThunderNet) which aims to enlarge the receptive field. The key idea of CEM is to aggregate multi-scale local context information and global context information to generate more discriminative features. In CEM, the feature maps from three scales are merged: $C_{4}$, $C_{5}$ and $C_{glb}$. $C_{glb}$ is the global context feature vector obtained by applying global average pooling on $C_{5}$. A 1 × 1 convolution is then applied to each feature map to squeeze the number of channels to $\alpha \times p \times p = 245$.

Afterwards, $C_{5}$ is upsampled by 2× and $C_{glb}$ is broadcast so that the spatial dimensions of the three feature maps are equal. Finally, the three generated feature maps are aggregated. By leveraging both local and global context, CEM effectively enlarges the receptive field and refines the representation ability of the thin feature map. Compared with prior FPN structures, CEM involves only two 1×1 convolutions and a fully connected (fc) layer.
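The aggregation described above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the channel sizes (128 for $C_{4}$, 256 for $C_{5}$), the nearest-neighbour upsampling, and the function names (`cem`, `conv1x1`, `upsample2x`) are our assumptions for clarity; only the 245-channel squeeze, the 2× upsample, the global average pooling, and the broadcast-and-sum follow the description in the paper.

```python
import numpy as np

def conv1x1(x, w):
    # A 1x1 convolution is a per-pixel linear map over channels.
    # x: (C_in, H, W), w: (C_out, C_in) -> (C_out, H, W)
    return np.einsum('oc,chw->ohw', w, x)

def upsample2x(x):
    # Nearest-neighbour 2x spatial upsampling (illustrative choice;
    # the paper does not fix the interpolation method here).
    return x.repeat(2, axis=1).repeat(2, axis=2)

def cem(c4, c5, w4, w5, w_glb):
    # Global context vector: global average pooling on C5.
    c_glb = c5.mean(axis=(1, 2))            # (C5_channels,)
    # Squeeze each branch to 245 channels with a 1x1 conv / fc layer.
    f4 = conv1x1(c4, w4)                    # (245, H, W)
    f5 = upsample2x(conv1x1(c5, w5))        # (245, H, W) after 2x upsample
    f_glb = (w_glb @ c_glb)[:, None, None]  # fc layer, broadcast over space
    # Aggregate local and global context by element-wise sum.
    return f4 + f5 + f_glb

# Example with assumed shapes: C4 at 20x20, C5 at 10x10.
rng = np.random.default_rng(0)
c4 = rng.standard_normal((128, 20, 20))
c5 = rng.standard_normal((256, 10, 10))
out = cem(c4, c5,
          rng.standard_normal((245, 128)),
          rng.standard_normal((245, 256)),
          rng.standard_normal((245, 256)))
print(out.shape)  # (245, 20, 20)
```

Note that the output inherits the spatial resolution of $C_{4}$, while every branch contributes exactly 245 channels, so the three maps can be summed directly.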

Source: ThunderNet: Towards Real-time Generic Object Detection

| Task                  | Papers | Share  |
|-----------------------|--------|--------|
| Object Detection      | 4      | 66.67% |
| Mixed Reality         | 1      | 16.67% |
| Semantic Segmentation | 1      | 16.67% |