HyperTransformer: Attention-Based CNN Model Generation from Few Samples

29 Sep 2021 · Andrey Zhmoginov, Max Vladymyrov, Mark Sandler

In this work we propose HyperTransformer, a transformer-based model that generates all of the weights of a CNN model directly from the support samples. This approach allows us to use a high-capacity model to encode task-dependent variations in the weights of a smaller model. We show on multiple few-shot benchmarks, across different architectures and datasets, that our method matches or exceeds the performance of traditional few-shot learning methods. In particular, for very small target models our method generates significantly better-performing models than traditional few-shot learning methods. For larger models, we find that generating only the last layer produces competitive or better results while keeping the pipeline end-to-end differentiable. Finally, we extend our approach to a semi-supervised regime that utilizes unlabeled samples in the support set, further improving few-shot performance when unlabeled data is available.
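To make the mechanism concrete, below is a minimal PyTorch sketch (not the authors' implementation) of the last-layer variant described in the abstract: learned query tokens are concatenated with embedded, labeled support samples, a transformer encoder lets the queries attend over the support set, and each query's output token is projected into one row of the classifier weight matrix. All module names, dimensions, and hyperparameters here are illustrative assumptions.

```python
# Hypothetical sketch of transformer-based weight generation for the last
# (classifier) layer of a small CNN; names and sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightGenerator(nn.Module):
    """Generates a (num_classes x feat_dim) classifier weight matrix
    from a labeled support set of feature embeddings."""
    def __init__(self, feat_dim=64, num_classes=5, d_model=128, n_layers=3):
        super().__init__()
        self.num_classes = num_classes
        # Embed each support feature together with its one-hot label.
        self.in_proj = nn.Linear(feat_dim + num_classes, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # One learned query token per generated weight row.
        self.weight_queries = nn.Parameter(torch.randn(num_classes, d_model))
        self.out_proj = nn.Linear(d_model, feat_dim)  # token -> weight row

    def forward(self, support_feats, support_labels):
        # support_feats: (n_support, feat_dim); support_labels: (n_support,)
        onehot = F.one_hot(support_labels, self.num_classes).float()
        tokens = self.in_proj(torch.cat([support_feats, onehot], dim=-1))
        # Let the weight-query tokens attend over the whole support set.
        seq = torch.cat([self.weight_queries, tokens], dim=0).unsqueeze(0)
        encoded = self.encoder(seq)[0, : self.num_classes]
        return self.out_proj(encoded)  # (num_classes, feat_dim)

# Usage: classify query features with the generated weights.
gen = WeightGenerator()
support_feats = torch.randn(25, 64)         # 5-way 5-shot support embeddings
support_labels = torch.arange(5).repeat(5)  # labels 0..4, 5 shots each
W = gen(support_feats, support_labels)      # generated classifier weights
query_feats = torch.randn(10, 64)
logits = query_feats @ W.t()                # (10, 5) class scores
```

Because the generated weights are a differentiable function of the support set, the generator can be trained end-to-end with a standard cross-entropy loss on the query predictions, consistent with the differentiability claim in the abstract.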
