LIP: Local Importance-based Pooling

ICCV 2019  ·  Ziteng Gao, Li-Min Wang, Gangshan Wu

Spatial downsampling layers are favored in convolutional neural networks (CNNs) to downscale feature maps for larger receptive fields and lower memory consumption. However, for discriminative tasks, these layers may lose discriminative details due to improper pooling strategies, which could hinder the learning process and eventually result in suboptimal models. In this paper, we present a unified framework over the existing downsampling layers (e.g., average pooling, max pooling, and strided convolution) from a local importance view. In this framework, we analyze the issues of these widely used pooling layers and identify the criteria for designing an effective downsampling layer. According to this analysis, we propose a conceptually simple, general, and effective pooling layer based on local importance modeling, termed Local Importance-based Pooling (LIP). LIP can automatically enhance discriminative features during the downsampling procedure by learning adaptive importance weights based on inputs. Experimental results show that LIP consistently yields notable gains with different depths and different architectures on ImageNet classification. On the challenging MS COCO dataset, detectors with our LIP-ResNets as backbones obtain a consistent improvement (≥ 1.4%) over the vanilla ResNets, and especially achieve the current state-of-the-art performance in detecting small objects under the single-scale testing scheme.
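
As a rough illustration of the "local importance view" described in the abstract, the sketch below pools each window as a weighted average whose weights are the exponential of a per-position logit, normalized within the window. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `importance_pool2d` is hypothetical, the logits are supplied by hand instead of being produced by a small learned sub-network, and no padding is used. With uniform logits the result equals average pooling, and with very sharp logits that follow the input it approaches max pooling.

```python
import math


def importance_pool2d(x, logits, kernel=2, stride=2):
    """Pool a 2D map (list of rows) with per-window normalized weights exp(logits).

    Hypothetical helper for illustration; in LIP the logits would come from a
    learned function of the input rather than being passed in directly.
    """
    h, w = len(x), len(x[0])
    out = []
    for i in range(0, h - kernel + 1, stride):
        row = []
        for j in range(0, w - kernel + 1, stride):
            cells = [(a, b) for a in range(i, i + kernel)
                     for b in range(j, j + kernel)]
            # Subtract the window maximum before exponentiating for stability;
            # the shift cancels in the ratio below.
            m = max(logits[a][b] for a, b in cells)
            num = 0.0
            den = 0.0
            for a, b in cells:
                weight = math.exp(logits[a][b] - m)
                num += weight * x[a][b]
                den += weight
            row.append(num / den)
        out.append(row)
    return out


x = [[1.0, 2.0, 0.0, 4.0],
     [3.0, 8.0, 1.0, 1.0],
     [0.0, 0.0, 5.0, 5.0],
     [2.0, 2.0, 5.0, 5.0]]

# Uniform importance: every position counts equally (average pooling).
uniform = [[0.0] * 4 for _ in range(4)]
avg_like = importance_pool2d(x, uniform)

# Very peaked importance that follows the input (close to max pooling).
sharp = [[50.0 * v for v in row] for row in x]
max_like = importance_pool2d(x, sharp)

print(avg_like)
print(max_like)
```

The design point the sketch tries to convey is that the choice of importance function determines which details survive downsampling; making that function input-dependent and learnable is the idea the abstract attributes to LIP.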


Datasets


Results from the Paper


Ranked #147 on Object Detection on COCO test-dev (using extra training data)

Task                  Dataset        Model                                   Metric                       Value   Global Rank
Object Detection      COCO minival   Faster R-CNN (LIP-ResNet-101)           box AP                       41.7    # 159
                                                                             AP50                         63.6    # 59
                                                                             AP75                         45.6    # 70
                                                                             APS                          25.2    # 58
                                                                             APM                          45.8    # 59
Object Detection      COCO test-dev  Faster R-CNN (LIP-ResNet-101-MD w FPN)  box mAP                      43.9    # 147
                                                                             AP50                         65.7    # 72
                                                                             AP75                         48.1    # 94
                                                                             APS                          25.4    # 92
                                                                             APM                          46.7    # 91
                                                                             APL                          56.3    # 88
                                                                             Hardware Burden              None    # 1
                                                                             Operations per network pass  None    # 1
Image Classification  ImageNet       LIP-ResNet-101                          Top 1 Accuracy               79.33%  # 751
                                                                             Number of params             42.9M   # 709
Image Classification  ImageNet       LIP-DenseNet-BC-121                     Top 1 Accuracy               76.64%  # 886
                                                                             Number of params             8.7M    # 473
Image Classification  ImageNet       ResNet-50 (LIP Bottleneck-256)          Top 1 Accuracy               78.15%  # 830
                                                                             Number of params             25.8M   # 617

Methods