Learning Deep Features for Discriminative Localization

In this work, we revisit the global average pooling layer proposed in [13], and shed light on how it explicitly enables the convolutional neural network to have remarkable localization ability despite being trained on image-level labels. While this technique was previously proposed as a means of regularizing training, we find that it actually builds a generic localizable deep representation that can be applied to a variety of tasks. Despite the apparent simplicity of global average pooling, we are able to achieve 37.1% top-5 error for object localization on ILSVRC 2014, which is remarkably close to the 34.2% top-5 error achieved by a fully supervised CNN approach. We demonstrate that our network is able to localize the discriminative image regions on a variety of tasks despite not being trained for them.
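The localization mechanism the abstract describes, class activation mapping (CAM), can be sketched as a weighted sum of the last convolutional layer's feature maps, where the weights come from the linear classifier that follows global average pooling. The function name and toy tensors below are illustrative, not from the paper:

```python
import numpy as np

def class_activation_map(features, weights, class_idx):
    """Sketch of class activation mapping (CAM).

    features:  (C, H, W) activations from the final conv layer.
    weights:   (num_classes, C) weights of the linear classifier
               that follows global average pooling.
    class_idx: the class whose discriminative regions we want.

    Returns an (H, W) map in [0, 1]; high values mark regions that
    contributed most to the class score.
    """
    # M_c(x, y) = sum_k w_k^c * f_k(x, y): contract the channel axis
    # of the feature maps against the class's classifier weights.
    cam = np.tensordot(weights[class_idx], features, axes=1)
    # Normalize to [0, 1] for visualization as a heatmap.
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()
    return cam

# Toy example: 3 feature maps; map 0 has one discriminative activation,
# and a single class whose classifier attends only to map 0.
features = np.zeros((3, 4, 4))
features[0, 1, 1] = 1.0
weights = np.array([[1.0, 0.0, 0.0]])
cam = class_activation_map(features, weights, class_idx=0)
```

Because global average pooling followed by a linear layer commutes with the spatial sum, the class score equals the spatial average of this map, which is why a network trained only with image-level labels still localizes.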

CVPR 2016
| Task | Dataset | Model | Metric | Value | Global Rank |
| --- | --- | --- | --- | --- | --- |
| Weakly-Supervised Object Localization | ILSVRC 2015 | AlexNet-GAP | Top-1 Error Rate | 67.19 | #2 |
| Weakly-Supervised Object Localization | ILSVRC 2016 | AlexNet-GAP | Top-5 Error | 52.16 | #4 |
| Weakly-Supervised Object Localization | ILSVRC 2016 | VGGnet-GAP | Top-5 Error | 45.14 | #3 |
| Weakly-Supervised Object Localization | Tiny ImageNet | CAM | Top-1 Localization Accuracy | 40.55 | #2 |