Learning Orthogonal Prototypes for Generalized Few-Shot Semantic Segmentation

Generalized few-shot semantic segmentation (GFSS) distinguishes pixels of base and novel classes from the background simultaneously, conditioning on sufficient data for base classes and only a few examples of novel classes. A typical GFSS approach has two training phases: base class learning and novel class updating. Nevertheless, such a stand-alone updating process often compromises the well-learnt features and results in a performance drop on base classes. In this paper, we propose a new idea of leveraging Projection onto Orthogonal Prototypes (POP), which updates features to identify novel classes without compromising base classes. POP builds a set of orthogonal prototypes, each of which represents a semantic class, and makes the prediction for each class separately based on the features projected onto its prototype. Technically, POP first learns prototypes on base data, and then extends the prototype set to novel classes. The orthogonal constraint of POP encourages orthogonality between the learnt prototypes and thus mitigates the influence on base class features when generalizing to novel prototypes. Moreover, we capitalize on the residual of feature projection as the background representation to dynamically fit semantic shifting (i.e., the background no longer includes the pixels of novel classes in the updating phase). Extensive experiments on two benchmarks demonstrate that POP achieves superior performance on novel classes without sacrificing much accuracy on base classes. Notably, POP outperforms the state-of-the-art fine-tuning approach by 3.93% overall mIoU on PASCAL-5i in the 5-shot scenario.
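
The projection idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify the network, the loss, or how prototypes are trained, so the helper names (`orthonormal_prototypes`, `project`, `orthogonality_penalty`), the use of QR factorization to obtain orthogonal vectors, and the squared off-diagonal penalty are all assumptions made for illustration. The sketch only shows why, when prototypes are mutually orthogonal, adding a novel prototype leaves the base-class projections unchanged, and how the projection residual could serve as a background representation.

```python
import numpy as np

def orthonormal_prototypes(num_classes, dim, seed=0):
    """Hypothetical helper: return num_classes orthonormal vectors of length dim."""
    rng = np.random.default_rng(seed)
    # Reduced QR gives a (dim, num_classes) matrix with orthonormal columns.
    q, _ = np.linalg.qr(rng.standard_normal((dim, num_classes)))
    return q.T  # shape (num_classes, dim)

def project(features, prototypes):
    """Split pixel features into per-class coefficients and a residual.

    features:   (num_pixels, dim)
    prototypes: (num_classes, dim), assumed orthonormal so that
                coeffs @ prototypes is the orthogonal projection.
    """
    coeffs = features @ prototypes.T           # one coefficient per class
    residual = features - coeffs @ prototypes  # part explained by no prototype
    return coeffs, residual

def orthogonality_penalty(prototypes):
    """Assumed penalty: sum of squared off-diagonal Gram entries (0 if orthogonal)."""
    gram = prototypes @ prototypes.T
    off_diag = gram - np.diag(np.diag(gram))
    return float(np.sum(off_diag ** 2))

# Two "base" prototypes, later extended with one "novel" prototype.
protos = orthonormal_prototypes(num_classes=3, dim=8)
base_protos, all_protos = protos[:2], protos

feats = np.random.default_rng(1).standard_normal((5, 8))  # 5 pixel features
base_coeffs, base_residual = project(feats, base_protos)
all_coeffs, all_residual = project(feats, all_protos)

# Base-class coefficients are unchanged by adding the novel prototype.
base_unchanged = np.allclose(base_coeffs, all_coeffs[:, :2])
# The residual (background) shrinks because the novel class now explains part of it.
residual_shrinks = np.all(
    np.linalg.norm(all_residual, axis=1) <= np.linalg.norm(base_residual, axis=1) + 1e-12
)
```

In this toy setting the residual before extension still contains the novel-class component, and after extension it does not, which mirrors the semantic shift of the background described in the abstract.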

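Task | Dataset | Model | Metric Name | Metric Value | Global Rank
Generalized Few-Shot Semantic Segmentation | COCO-20i (1-shot) | POP(ResNet-50) | Mean IoU | 44.98 | #1
Generalized Few-Shot Semantic Segmentation | COCO-20i (1-shot) | POP(ResNet-50) | Mean Base and Novel | 35.01 | #1
Generalized Few-Shot Semantic Segmentation | COCO-20i (5-shot) | POP(ResNet-50) | Mean IoU | 48.75 | #1
Generalized Few-Shot Semantic Segmentation | COCO-20i (5-shot) | POP(ResNet-50) | Mean Base and Novel | 42.44 | #1
Generalized Few-Shot Semantic Segmentation | PASCAL-5i (1-shot) | POP(ResNet-50) | Mean IoU | 64.77 | #1
Generalized Few-Shot Semantic Segmentation | PASCAL-5i (1-shot) | POP(ResNet-50) | Mean Base and Novel | 54.72 | #1
Generalized Few-Shot Semantic Segmentation | PASCAL-5i (5-shot) | POP(ResNet-50) | Mean IoU | 70.28 | #1
Generalized Few-Shot Semantic Segmentation | PASCAL-5i (5-shot) | POP(ResNet-50) | Mean Base and Novel | 65.33 | #1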

Methods