Dynamic Convolution: Attention over Convolution Kernels

Light-weight convolutional neural networks (CNNs) suffer performance degradation as their low computational budgets constrain both the depth (number of convolution layers) and the width (number of channels) of CNNs, resulting in limited representation capability. To address this issue, we present Dynamic Convolution, a new design that increases model complexity without increasing the network depth or width. Instead of using a single convolution kernel per layer, dynamic convolution aggregates multiple parallel convolution kernels dynamically based upon their attentions, which are input dependent. Assembling multiple kernels is not only computationally efficient due to the small kernel size, but also has more representation power since these kernels are aggregated in a non-linear way via attention. By simply using dynamic convolution for the state-of-the-art architecture MobileNetV3-Small, the top-1 accuracy of ImageNet classification is boosted by 2.9% with only 4% additional FLOPs and 2.9 AP gain is achieved on COCO keypoint detection.
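To make the kernel aggregation concrete, below is a minimal PyTorch sketch in the spirit of the description above: K parallel kernels are combined per input by a softmax attention computed from global average pooling (a squeeze-and-excitation style branch), and the aggregated kernel is then applied as an ordinary convolution. The class name `DynamicConv2d` and hyperparameters such as `num_kernels` and `temperature` are illustrative assumptions, not the authors' reference implementation.

```python
# A hedged sketch of dynamic convolution: input-dependent attention over K kernels.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size,
                 stride=1, padding=0, num_kernels=4, temperature=30.0):
        super().__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        self.num_kernels = num_kernels
        self.temperature = temperature
        # K parallel convolution kernels and biases (proper init omitted for brevity)
        self.weight = nn.Parameter(torch.randn(
            num_kernels, out_channels, in_channels, kernel_size, kernel_size) * 0.01)
        self.bias = nn.Parameter(torch.zeros(num_kernels, out_channels))
        # Attention branch: global average pool -> FC -> ReLU -> FC -> softmax over kernels
        hidden = max(in_channels // 4, 4)
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(in_channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, num_kernels),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        # Input-dependent attention over the K kernels, softened by a temperature
        attn = F.softmax(self.attention(x) / self.temperature, dim=1)  # (B, K)
        # Aggregate the K kernels per sample: (B, out, in, k, k)
        weight = torch.einsum('bk,koihw->boihw', attn, self.weight)
        bias = attn @ self.bias  # (B, out)
        # Apply one aggregated kernel per sample via a grouped convolution trick
        x = x.reshape(1, b * c, h, w)
        weight = weight.reshape(b * self.out_channels, self.in_channels,
                                self.kernel_size, self.kernel_size)
        out = F.conv2d(x, weight, bias.reshape(-1),
                       stride=self.stride, padding=self.padding, groups=b)
        return out.reshape(b, self.out_channels, out.shape[-2], out.shape[-1])

# Example usage (shapes only):
# layer = DynamicConv2d(32, 64, kernel_size=3, padding=1)
# y = layer(torch.randn(8, 32, 56, 56))   # -> (8, 64, 56, 56)
```

Because the attention weights sum to one and depend on the input, the aggregated kernel is a non-linear function of the input while the extra cost is only the small attention branch and the kernel-weighted sum.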

CVPR 2020

Datasets

ImageNet, COCO

Results from the Paper


Task: Image Classification   Dataset: ImageNet   (global leaderboard rank in parentheses)

Model                   Top 1 Accuracy     Number of params   GFLOPs
DY-MobileNetV2 ×1.0     74.4%  (# 958)     11.1M  (# 496)     0.626  (# 548)
DY-MobileNetV3-Small    69.7%  (# 1004)    4.8M   (# 404)     0.137  (# 8)
DY-MobileNetV2 ×0.5     69.4%  (# 1006)    4M     (# 388)     0.203  (# 13)
DY-MobileNetV2 ×0.35    64.9%  (# 1024)    2.8M   (# 373)     0.124  (# 5)
DY-MobileNetV2 ×0.75    72.8%  (# 971)     7M     (# 463)     0.435  (# 48)
DY-ResNet-10            67.7%  (# 1015)    18.6M  (# 537)     1.82   (# 143)
DY-ResNet-18            72.7%  (# 973)     42.7M  (# 711)     3.7    (# 190)

Methods