SFAM, or Scale-wise Feature Aggregation Module, is a feature aggregation block from the M2Det architecture. It aggregates the multi-level multi-scale features generated by the Thinned U-shape Modules (TUMs) into a multi-level feature pyramid.
The first stage of SFAM concatenates features of equal scale along the channel dimension. The aggregated feature pyramid can be presented as $\mathbf{X} =[\mathbf{X}_1,\mathbf{X}_2,\dots,\mathbf{X}_i]$, where $\mathbf{X}_i = \text{Concat}(\mathbf{x}_i^1,\mathbf{x}_i^2,\dots,\mathbf{x}_i^L) \in \mathbb{R}^{W_{i}\times H_{i}\times C}$ refers to the features of the $i$-th largest scale. Thus, each scale in the aggregated pyramid contains features from multiple levels of depth.
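This first stage can be sketched in a few lines of numpy. The sizes below are illustrative assumptions (e.g. $L=8$ TUM levels, 128 channels per level, a $40\times 40$ scale), not values fixed by the architecture:

```python
import numpy as np

# Hypothetical sizes: L = 8 TUM levels, each contributing 128 channels at
# this scale, so the aggregated level has C = 8 * 128 = 1024 channels.
L, c_per_level = 8, 128
H, W = 40, 40  # spatial size of the i-th scale (illustrative)

# One feature map per TUM level at the same scale: (H, W, c_per_level)
level_feats = [np.random.rand(H, W, c_per_level) for _ in range(L)]

# Stage 1 of SFAM: channel-wise concatenation of equal-scale features
X_i = np.concatenate(level_feats, axis=-1)
print(X_i.shape)  # (40, 40, 1024)
```

The result `X_i` is one level of the aggregated pyramid; repeating this for every scale $i$ yields $\mathbf{X}$.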
However, simple concatenation is not adaptive enough. In the second stage, a channel-wise attention module is introduced to encourage features to focus on the channels from which they benefit most. Following Squeeze-and-Excitation, the squeeze step uses global average pooling to generate channel-wise statistics $\mathbf{z} \in \mathbb{R}^C$. To fully capture channel-wise dependencies, the subsequent excitation step learns the attention weights via two fully connected layers:
$$ \mathbf{s} = \mathbf{F}_{ex}(\mathbf{z},\mathbf{W}) = \sigma(\mathbf{W}_{2} \delta(\mathbf{W}_{1}\mathbf{z})), $$
where $\sigma$ refers to the sigmoid function, $\delta$ refers to the ReLU function, $\mathbf{W}_{1} \in \mathbb{R}^{\frac{C}{r}\times C}$, $\mathbf{W}_{2} \in \mathbb{R}^{C\times \frac{C}{r}}$, and $r$ is the reduction ratio ($r=16$ in the experiments). The final output is obtained by reweighting the input $\mathbf{X}$ with the activation $\mathbf{s}$:
$$ \tilde{\mathbf{X}}_i^c = \mathbf{F}_{scale}(\mathbf{X}_i^c,s_c) = s_c \cdot \mathbf{X}_i^c, $$
where $\tilde{\mathbf{X}}_i = [\tilde{\mathbf{X}}_i^1,\tilde{\mathbf{X}}_i^2,\dots,\tilde{\mathbf{X}}_i^C]$. Each channel of the features is enhanced or weakened by this rescaling operation.
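The squeeze, excitation, and rescaling steps above can be sketched in numpy. This is a minimal illustration, assuming random weights in place of learned ones and a small channel count for readability:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_attention(X, W1, W2):
    """Squeeze-and-Excitation reweighting of an aggregated map X of shape (H, W, C)."""
    z = X.mean(axis=(0, 1))                     # squeeze: global average pooling -> (C,)
    s = sigmoid(W2 @ np.maximum(W1 @ z, 0.0))   # excitation: FC -> ReLU -> FC -> sigmoid
    return X * s                                # rescale each channel by its weight s_c

# Illustrative sizes: C channels, reduction ratio r = 16 as in the experiments
C, r = 64, 16
rng = np.random.default_rng(0)
X = rng.random((8, 8, C))                       # one aggregated pyramid level X_i
W1 = 0.1 * rng.standard_normal((C // r, C))     # (C/r, C), stands in for learned weights
W2 = 0.1 * rng.standard_normal((C, C // r))     # (C, C/r), stands in for learned weights
X_tilde = se_attention(X, W1, W2)               # reweighted features, same shape as X
```

Because each attention weight $s_c$ lies in $(0,1)$, every channel of `X_tilde` is a damped copy of the corresponding channel of `X`; training the two FC layers decides which channels are kept strong and which are suppressed.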
Source: M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network