no code implementations • 18 Dec 2023 • Ahmed Bourouis, Judith Ellen Fan, Yulia Gryaditskaya
To obtain generalization to a large set of sketches and categories, we build on a vision transformer encoder pretrained with the CLIP model.
Disentanglement Visual Prompt Tuning