We propose an architecture for solving PerVL that extends the input vocabulary of a pretrained model with word embeddings for the new personalized concepts.
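The vocabulary-extension idea can be sketched as follows: append the new concept token to the tokenizer vocabulary and grow the embedding matrix by one row, optionally seeding the new row from a semantically related existing word. This is a minimal illustration, not the paper's implementation; the function name, the initialization scale, and the `init_from` seeding heuristic are assumptions.

```python
import numpy as np

def extend_vocabulary(vocab, embeddings, new_token, init_from=None, rng=None):
    """Append a personalized-concept token and grow the embedding
    matrix by one row.

    vocab:       list of existing token strings
    embeddings:  (len(vocab), d) array of word embeddings
    new_token:   string naming the personalized concept
    init_from:   optional existing token whose embedding seeds the new row
    """
    rng = rng or np.random.default_rng(0)
    d = embeddings.shape[1]
    if init_from is not None:
        # Seed from a related word so optimization starts near a
        # plausible region of the embedding space.
        new_row = embeddings[vocab.index(init_from)].copy()
    else:
        new_row = rng.normal(scale=0.02, size=d)  # small random init
    vocab = vocab + [new_token]
    embeddings = np.vstack([embeddings, new_row[None, :]])
    return vocab, embeddings

# Usage: register a new concept token "<my-dog>" seeded from "dog".
vocab = ["a", "photo", "of", "dog"]
emb = np.zeros((4, 8))
vocab2, emb2 = extend_vocabulary(vocab, emb, "<my-dog>", init_from="dog")
```

Only the appended row would be optimized during personalization; the rest of the pretrained model stays frozen.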
Of these, StyleGAN offers a fascinating case study, owing to its remarkable visual quality and its ability to support a large array of downstream tasks.
We compare our models to a wide range of latent editing methods, and show that by alleviating the bias they achieve finer semantic control and better identity preservation across a wider range of transformations.
The ability of Generative Adversarial Networks to encode rich semantics within their latent space has been widely adopted for facial image editing.
Can a generative model be trained to produce images from a specific domain, guided by a text prompt only, without seeing any image?
For modern generative frameworks, this semantic encoding manifests as smooth, linear directions that affect image attributes in a disentangled manner.
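A linear latent direction can be applied as a simple vector offset: given a latent code and a precomputed attribute direction (e.g. one found by a linear probe), the edit is a step along that direction. The sketch below is a hedged illustration of this general mechanism; the 512-dimensional latent size, the function name, and the random stand-in for a learned direction are assumptions, not any specific model's API.

```python
import numpy as np

def edit_latent(w, direction, alpha):
    """Move latent code w along a unit-normalized direction by step alpha.

    A linear, disentangled edit changes one attribute while (ideally)
    leaving the rest of the image unchanged.
    """
    d = direction / np.linalg.norm(direction)
    return w + alpha * d

rng = np.random.default_rng(0)
w = rng.normal(size=512)       # a latent code
smile = rng.normal(size=512)   # stand-in for a learned attribute direction
w_edit = edit_latent(w, smile, alpha=3.0)
```

Because the direction is normalized, `alpha` directly controls edit strength, which is what makes such directions convenient handles for semantic control.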
Our network encourages disentangled generation of semantic parts via two key ingredients: a root-mixing training strategy which helps decorrelate the different branches to facilitate disentanglement, and a set of loss terms designed with part disentanglement and shape semantics in mind.