Are These Birds Similar: Learning Branched Networks for Fine-grained Representations

16 Jan 2020 · Shah Nawaz, Alessandro Calefati, Moreno Caraffini, Nicola Landro, Ignazio Gallo

Fine-grained image classification is a challenging task due to the presence of a hierarchical coarse-to-fine-grained distribution in the dataset. Generally, parts are used to discriminate between objects in fine-grained datasets; however, not all parts are beneficial or indispensable. In recent years, natural language descriptions have been used to obtain information on the discriminative parts of an object. This paper leverages natural language descriptions and proposes a strategy for learning a joint representation of natural language descriptions and images, using a two-branch network with multiple layers, to improve the fine-grained classification task. Extensive experiments show that our approach gains significant improvements in accuracy for the fine-grained image classification task. Furthermore, our method achieves new state-of-the-art results on the CUB-200-2011 dataset.
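The abstract does not spell out how the two branches are combined, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each branch is a small stack of fully connected layers ending in an L2-normalised embedding, and that the two embeddings are fused by concatenation before a linear classifier. The class names (`Branch`, `TwoBranchClassifier`), the layer sizes, and the fusion choice are all hypothetical; in the paper's setting the inputs would be precomputed text and image features rather than the toy vectors used here.

```python
import math
import random


def linear(x, weights, bias):
    """Apply y = W x + b to a single feature vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def l2_normalize(x, eps=1e-12):
    norm = math.sqrt(sum(v * v for v in x))
    return [v / (norm + eps) for v in x]


def init_layer(rng, n_in, n_out):
    """Random weights in a small uniform range, zero bias."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class Branch:
    """One branch: fully connected layers, then an L2-normalised embedding."""

    def __init__(self, rng, dims):
        self.layers = [init_layer(rng, a, b) for a, b in zip(dims, dims[1:])]

    def forward(self, x):
        for i, (weights, bias) in enumerate(self.layers):
            x = linear(x, weights, bias)
            if i < len(self.layers) - 1:  # no ReLU after the last layer
                x = relu(x)
        return l2_normalize(x)


class TwoBranchClassifier:
    """Hypothetical two-branch model; concatenation fusion is an assumption."""

    def __init__(self, text_dim, image_dim, hidden_dim, joint_dim,
                 num_classes, seed=0):
        rng = random.Random(seed)
        self.text_branch = Branch(rng, [text_dim, hidden_dim, joint_dim])
        self.image_branch = Branch(rng, [image_dim, hidden_dim, joint_dim])
        self.head = init_layer(rng, 2 * joint_dim, num_classes)

    def forward(self, text_features, image_features):
        text_emb = self.text_branch.forward(text_features)
        image_emb = self.image_branch.forward(image_features)
        joint = text_emb + image_emb  # list concatenation, not addition
        return linear(joint, *self.head)


# Toy usage: 8-d "text" features, 6-d "image" features, 3 classes.
model = TwoBranchClassifier(text_dim=8, image_dim=6, hidden_dim=5,
                            joint_dim=4, num_classes=3)
logits = model.forward([0.1] * 8, [0.2] * 6)
predicted_class = max(range(len(logits)), key=logits.__getitem__)
```

The weights here are random and untrained, so the predicted class is meaningless; the sketch only illustrates how two modality-specific branches can feed a shared classifier.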



Task | Dataset | Model | Metric Name | Metric Value | Global Rank
Document Text Classification | CUB-200-2011 | Bert | Accuracy | 65.0 | # 1
Multimodal Text and Image Classification | CUB-200-2011 | Two Branch Network (Text - Bert + Image - Nts-Net) | Accuracy | 96.81 | # 1
Multi-Modal Document Classification | CUB-200-2011 | Two Branch Network (Text - Bert + Image - Nts-Net) | 1:1 Accuracy | 96.81 | # 1
Multimodal Deep Learning | CUB-200-2011 | Two Branch Network (Text - Bert + Image - Nts-Net) | Accuracy | 96.81 | # 1
Fine-Grained Image Classification | CUB-200-2011 | Nts-Net | Accuracy | 87.5 | # 18
