Deep CNNs With Spatially Weighted Pooling for Fine-Grained Car Recognition

4 Apr 2017 · Qichang Hu, Huibing Wang, Teng Li, Chunhua Shen

Fine-grained car recognition aims to recognize the category information of a car, such as car make, car model, or even the year of manufacture. A number of recent studies have shown that a deep convolutional neural network (DCNN) trained on a large-scale data set can achieve impressive results on a range of generic object classification tasks. In this paper, we propose a spatially weighted pooling (SWP) strategy, which considerably improves the robustness and effectiveness of the feature representation of most dominant DCNNs. More specifically, the SWP is a novel pooling layer that contains a predefined number of spatially weighted masks, or pooling channels. The SWP pools the extracted features of DCNNs with the guidance of its learnt masks, which measure the importance of the spatial units in terms of discriminative power. Like existing methods that apply uniform grid pooling on the convolutional feature maps of DCNNs, the proposed method can extract the convolutional features and generate the pooling channels from a single DCNN. Thus, minimal modification is needed in terms of implementation. Moreover, the parameters of the SWP layer can be learned in the end-to-end training process of the DCNN. By applying our method to several fine-grained car recognition data sets, we demonstrate that the proposed method can achieve better performance than recent approaches in the literature. We advance the state-of-the-art results by improving the accuracy from 92.6% to 93.1% on the Stanford Cars-196 data set and from 91.2% to 97.6% on the recent CompCars data set. We have also tested the proposed method on two additional large-scale data sets, with impressive results observed.
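
The pooling step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes one plausible reading in which each of K masks is multiplied element-wise with each of C feature maps and summed over the spatial grid, giving K x C pooled values. The function names `spatially_weighted_pool` and `uniform_mask` are hypothetical, and the masks are fixed by hand here, whereas in the paper they are learned end to end.

```python
def spatially_weighted_pool(features, masks):
    """Pool C feature maps with K spatial masks (illustrative assumption).

    features: nested list of shape C x H x W (convolutional feature maps).
    masks:    nested list of shape K x H x W (spatial weights, learned in
              the paper but hand-set in this sketch).
    Returns a flat list of K * C values, ordered mask by mask.
    """
    pooled = []
    for mask in masks:
        for feature_map in features:
            total = 0.0
            # Weighted sum over every spatial unit (h, w).
            for mask_row, feature_row in zip(mask, feature_map):
                for weight, value in zip(mask_row, feature_row):
                    total += weight * value
            pooled.append(total)
    return pooled


def uniform_mask(height, width):
    """A mask with equal weights, which reduces to average pooling."""
    weight = 1.0 / (height * width)
    return [[weight] * width for _ in range(height)]


# Toy input: C = 2 feature maps on a 2 x 2 spatial grid.
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]

# K = 2 masks: uniform weights, and one that keeps only the bottom-right unit.
masks = [
    uniform_mask(2, 2),
    [[0.0, 0.0], [0.0, 1.0]],
]

pooled = spatially_weighted_pool(features, masks)
print(pooled)  # [2.5, 2.0, 4.0, 8.0]
```

The uniform mask reproduces ordinary average pooling, while the second mask shows how a non-uniform mask can emphasize one spatial unit over the others. That is the sense in which the masks are said to measure the importance of spatial units.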

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Fine-Grained Image Classification | CarFlag-1532 | ResNet101-swp | Accuracy | 96.70% | #1 |
| Fine-Grained Image Classification | CarFlag-563 | ResNet101-swp | Accuracy | 96.42% | #1 |
| Fine-Grained Image Classification | CompCars | ResNet101-swp | Accuracy | 97.6% | #1 |
| Fine-Grained Image Classification | Stanford Cars | ResNet101-swp | Accuracy | 93.1% | #44 |
