End-to-end Learning of a Fisher Vector Encoding for Part Features in Fine-grained Recognition

4 Jul 2020 · Dimitri Korsch, Paul Bodesheim, Joachim Denzler

Part-based approaches for fine-grained recognition do not show the expected performance gain over global methods, although they explicitly focus on small details that are relevant for distinguishing highly similar classes. We assume that part-based methods suffer from the lack of a representation of local features that is invariant to the order of parts and can handle a varying number of visible parts appropriately. The order of parts is artificial and often given only by ground-truth annotations, whereas viewpoint variations and occlusions result in parts that are not observable. Therefore, we propose integrating a Fisher vector encoding of part features into convolutional neural networks. The parameters for this encoding are estimated by an online EM algorithm jointly with those of the neural network and are more precise than the estimates of previous works. Our approach improves state-of-the-art accuracies for three bird species classification datasets.
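
To illustrate why a Fisher vector encoding is invariant to the order of parts and independent of their number, here is a minimal NumPy sketch of the classic Fisher vector formulation. It assumes a Gaussian mixture model with diagonal covariances and fixed, already-estimated parameters; the abstract does not spell out these details, and the joint online EM estimation inside the network is not shown. The function name `fisher_vector` and all variable names are hypothetical and chosen only for this example.

```python
import numpy as np


def fisher_vector(x, weights, means, sigmas):
    """Encode a set of part features as one fixed-length vector.

    x:       (N, D) part features; N may differ from image to image
    weights: (K,)   mixture weights (assumed to sum to 1)
    means:   (K, D) component means
    sigmas:  (K, D) component standard deviations (diagonal covariance)
    """
    n, d = x.shape
    # Whitened differences to every component, shape (N, K, D).
    diff = (x[:, None, :] - means[None, :, :]) / sigmas[None, :, :]
    # Log of the weighted Gaussian density per component, shape (N, K).
    log_prob = (
        np.log(weights)[None, :]
        - 0.5 * np.sum(diff ** 2, axis=2)
        - np.sum(np.log(sigmas), axis=1)[None, :]
        - 0.5 * d * np.log(2.0 * np.pi)
    )
    # Soft assignments (posteriors) via a numerically stable softmax.
    log_prob -= log_prob.max(axis=1, keepdims=True)
    gamma = np.exp(log_prob)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # Normalized gradients w.r.t. means and standard deviations, each (K, D).
    g_mu = np.einsum("nk,nkd->kd", gamma, diff)
    g_mu /= n * np.sqrt(weights)[:, None]
    g_sigma = np.einsum("nk,nkd->kd", gamma, diff ** 2 - 1.0)
    g_sigma /= n * np.sqrt(2.0 * weights)[:, None]
    # Averaging over parts removes any dependence on their order or count.
    return np.concatenate([g_mu.ravel(), g_sigma.ravel()])


# Toy data: 4 parts with 8-dimensional features and a 2-component mixture.
rng = np.random.default_rng(0)
parts = rng.normal(size=(4, 8))
weights = np.array([0.4, 0.6])
means = rng.normal(size=(2, 8))
sigmas = np.ones((2, 8))

fv = fisher_vector(parts, weights, means, sigmas)
fv_reversed = fisher_vector(parts[::-1], weights, means, sigmas)
fv_fewer = fisher_vector(parts[:3], weights, means, sigmas)
print(fv.shape, np.allclose(fv, fv_reversed), fv_fewer.shape)
```

In this sketch the output always has length 2·K·D (here 32), whether four or three parts are visible, and reversing the part order leaves the encoding unchanged up to floating-point rounding.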

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
| --- | --- | --- | --- | --- | --- |
| Fine-Grained Image Classification | CUB-200-2011 | DeepFVE | Accuracy | 90.95% | # 20 |
| Fine-Grained Image Classification | NABirds | FVE | Accuracy | 90.3% | # 10 |
