Knowledge distillation is an effective approach for learning compact models (students) under the supervision of large, strong models (teachers).
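As a concrete illustration, the sketch below shows the standard distillation objective (soft teacher targets combined with the hard-label loss); the temperature `T` and mixing weight `alpha` are illustrative hyperparameters, not values prescribed by this work.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Combine softened teacher supervision with the usual hard-label loss."""
    # KL divergence between temperature-softened distributions, scaled by T^2
    # so gradient magnitudes stay comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```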
Intuitively, easy samples, which generally exit early from the network during inference, should contribute more to the training of the early classifiers.
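One way to realize this intuition is a per-sample, per-exit weighting of the loss; the confidence-based weighting rule below is an assumption for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def multi_exit_loss(exit_logits, labels):
    """exit_logits: list of logits from the earliest to the deepest classifier.

    Easy samples (high confidence at the deepest exit) receive larger weight
    on early classifiers; hard samples shift their weight toward later exits.
    This weighting scheme is illustrative only.
    """
    with torch.no_grad():
        # Ground-truth confidence at the deepest exit as an "easiness" proxy.
        easiness = F.softmax(exit_logits[-1], dim=1).gather(
            1, labels.unsqueeze(1)
        ).squeeze(1)  # shape: (batch,)

    total = 0.0
    n_exits = len(exit_logits)
    for i, logits in enumerate(exit_logits):
        depth = i / max(n_exits - 1, 1)  # 0 for the first exit, 1 for the last
        weight = (1 - depth) * easiness + depth * (1 - easiness)
        per_sample = F.cross_entropy(logits, labels, reduction="none")
        total = total + (weight * per_sample).mean()
    return total
```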
A feasible solution is to start from a GAN that is well trained on a large-scale source domain and adapt it to the target domain with only a few samples, a setting termed few-shot generative model adaptation.
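A minimal sketch of such an adaptation loop is given below, assuming a pretrained source-domain generator/discriminator pair and a handful of target images; the checkpoint names, latent dimension, and plain non-saturating loss are assumptions, and practical few-shot adaptation methods typically add regularizers to curb overfitting.

```python
import torch
import torch.nn.functional as F

# Hypothetical pretrained source-domain networks; file names are placeholders.
G = torch.load("source_generator.pt")      # generator trained on the large source domain
D = torch.load("source_discriminator.pt")  # matching discriminator

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.0, 0.99))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.0, 0.99))

def adapt_step(target_images, z_dim=512):
    """One fine-tuning step on the few target-domain samples (non-saturating GAN loss)."""
    z = torch.randn(target_images.size(0), z_dim)

    # Discriminator update: real target samples vs. current fakes.
    fake = G(z).detach()
    d_loss = (F.softplus(-D(target_images)) + F.softplus(D(fake))).mean()
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator update.
    g_loss = F.softplus(-D(G(z))).mean()
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```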
We uniformly save a moderate number of intermediate models from the teacher's training process and then integrate the knowledge of these intermediate models through an ensemble technique.
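A hedged sketch of this checkpoint ensembling is shown below: uniformly spaced teacher checkpoints are loaded and their temperature-softened predictions averaged to form the supervisory signal; the file names, the `make_teacher` factory, and the simple uniform averaging rule are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def ensemble_teacher_probs(checkpoint_paths, make_teacher, inputs, T=4.0):
    """Average softened predictions of uniformly saved teacher checkpoints.

    checkpoint_paths: e.g. ["teacher_ep20.pt", "teacher_ep40.pt", ...] (illustrative names)
    make_teacher: assumed helper that builds the teacher architecture
    """
    probs = []
    for path in checkpoint_paths:
        teacher = make_teacher()
        teacher.load_state_dict(torch.load(path, map_location="cpu"))
        teacher.eval()
        with torch.no_grad():
            probs.append(F.softmax(teacher(inputs) / T, dim=1))
    # Uniform average over the intermediate models' soft predictions.
    return torch.stack(probs, dim=0).mean(dim=0)
```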
As a data augmentation method, FOT can be conveniently applied to any existing few-shot learning algorithm and greatly improve its performance on FG-FSL tasks.
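Because it acts purely as data augmentation, plugging it into an existing episodic few-shot pipeline could look like the generic sketch below; `fot_transform` is only a placeholder for the actual FOT operation, whose internals are not specified here, and the episode structure is a generic assumption.

```python
import torch

def augment_support_set(support_images, support_labels, fot_transform, n_aug=3):
    """Enlarge an episode's support set with augmented copies.

    fot_transform: placeholder callable standing in for the FOT operation;
    its internals are not defined in this sketch.
    """
    images = [support_images]
    labels = [support_labels]
    for _ in range(n_aug):
        images.append(fot_transform(support_images))  # augmented copies keep their labels
        labels.append(support_labels)
    return torch.cat(images, dim=0), torch.cat(labels, dim=0)

# The enlarged support set is then fed to any existing few-shot learner unchanged.
```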
The backbone of a traditional CNN classifier is generally regarded as a feature extractor, followed by a linear layer that performs the classification.
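As a concrete illustration of this view, the minimal sketch below uses a ResNet-18 backbone (an arbitrary choice, not one made by the paper) whose original fully connected layer is replaced by a task-specific linear classifier.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class BackboneClassifier(nn.Module):
    """Feature extractor (backbone) followed by a single linear classification layer."""

    def __init__(self, num_classes):
        super().__init__()
        backbone = resnet18(weights=None)  # backbone choice is illustrative
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()        # keep only the feature extractor
        self.backbone = backbone
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        features = self.backbone(x)        # (batch, feat_dim) feature vectors
        return self.classifier(features)   # class logits
```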