Attention Mechanisms

Dynamic Convolution

Introduced by Chen et al. in Dynamic Convolution: Attention over Convolution Kernels

The extremely low computational cost of lightweight CNNs constrains the depth and width of the networks, further decreasing their representational power. To address the above problem, Chen et al. proposed dynamic convolution, a novel operator design that increases representational power with negligible additional computational cost and does not change the width or depth of the network in parallel with CondConv.

Dynamic convolution uses $K$ parallel convolution kernels of the same size and input/output dimensions instead of one kernel per layer. Like SE blocks, it adopts a squeeze-and-excitation mechanism to generate the attention weights for the different convolution kernels. These kernels are then aggregated dynamically by weighted summation and applied to the input feature map $X$: \begin{align} s & = \text{softmax} (W_{2} \delta (W_{1}\text{GAP}(X))) \end{align} \begin{align} \text{DyConv} &= \sum_{i=1}^{K} s_k \text{Conv}_k \end{align} \begin{align} Y &= \text{DyConv}(X) \end{align} Here the convolutions are combined by summation of weights and biases of convolutional kernels.

Compared to applying convolution to the feature map, the computational cost of squeeze-and-excitation and weighted summation is extremely low. Dynamic convolution thus provides an efficient operation to improve representational power and can be easily used as a replacement for any convolution.

Papers

Paper Code Results Date Stars