Fine-Grained Visual Classification with Batch Confusion Norm

We introduce a regularization concept based on the proposed Batch Confusion Norm (BCN) to address Fine-Grained Visual Classification (FGVC). The FGVC problem is characterized by two notable properties, high inter-class similarity and large intra-class variation, which make learning an effective FGVC classifier challenging. Inspired by the use of pairwise confusion energy as a regularization mechanism, we develop the BCN technique to improve FGVC learning by imposing class-prediction confusion on each training batch, thereby alleviating the overfitting that can result from learning image features of fine details. In addition, our method is implemented with an attention-gated CNN model, boosted by the incorporation of Atrous Spatial Pyramid Pooling (ASPP) to extract discriminative features and suitable attention. To demonstrate the usefulness of our method, we report state-of-the-art results on several benchmark FGVC datasets, along with comprehensive ablation comparisons.
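The abstract does not give the BCN formula, so the following is only a hedged illustration of the general idea of imposing prediction confusion on a training batch. It is a minimal Python sketch, assuming a batch-level variant of Euclidean pairwise confusion: the mean squared distance between the softmax outputs of every pair of images in the batch, added to cross-entropy with a weight `lam`. The function names (`log_softmax`, `batch_confusion_penalty`, `regularized_loss`), the weight `lam`, and the example logits are hypothetical, and this is not the authors' BCN definition. It also averages over all pairs in the batch rather than only pairs with different labels.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one row of class scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def batch_confusion_penalty(probs: Sequence[Sequence[float]]) -> float:
    """Mean squared Euclidean distance between the class-probability
    vectors of all ordered pairs (i, j), i != j, in one batch.

    Uses sum_{i,j} ||p_i - p_j||^2 = 2N * sum_i ||p_i||^2 - 2 * ||sum_i p_i||^2,
    so the cost is O(N * C) rather than O(N^2 * C).
    """
    n = len(probs)
    if n < 2:
        return 0.0  # a single image has no pairs to compare
    num_classes = len(probs[0])
    sum_sq_norms = sum(p * p for row in probs for p in row)
    col_sums = [sum(row[c] for row in probs) for c in range(num_classes)]
    total = 2 * n * sum_sq_norms - 2 * sum(s * s for s in col_sums)
    # The i == j terms are zero, so average over the n * (n - 1) ordered pairs;
    # max() guards against tiny negative values from floating-point rounding.
    return max(total, 0.0) / (n * (n - 1))


def regularized_loss(
    logits_batch: Sequence[Sequence[float]],
    labels: Sequence[int],
    lam: float = 0.1,  # hypothetical regularization weight
) -> float:
    """Mean cross-entropy plus lam times the batch confusion penalty."""
    log_probs = [log_softmax(row) for row in logits_batch]
    ce = -sum(lp[y] for lp, y in zip(log_probs, labels)) / len(labels)
    probs = [[math.exp(v) for v in lp] for lp in log_probs]
    return ce + lam * batch_confusion_penalty(probs)


# Hypothetical batch of 3 images scored over 3 classes.
logits = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3], [1.0, 1.0, 1.0]]
labels = [0, 1, 2]
print(regularized_loss(logits, labels, lam=0.1))
```

Minimizing the added term pulls the predicted class distributions within a batch toward one another, which discourages the network from becoming overconfident on fine details of individual training images. The closed-form identity avoids looping over all pairs, so the penalty scales linearly with batch size.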

Task                               Dataset        Model  Metric    Value  Global rank
Fine-Grained Image Classification  CUB-200-2011   BCN    Accuracy  89.2%  #24
Fine-Grained Image Classification  FGVC Aircraft  BCN    Accuracy  93.5%  #10
Fine-Grained Image Classification  Stanford Cars  BCN    Accuracy  94.8%  #17