This paper addresses the challenge of learning a local visual pattern of an object from one image, and generating images depicting objects with that pattern.
Recent advancements in text-to-image generative models have demonstrated a remarkable ability to capture a deep semantic understanding of images.
In this paper, we present a technique to generate a collection of images that depicts variations in the shape of a specific object, enabling an object-level shape exploration process.
This unique combination of text and shape guidance allows for increased control over the generation process.
Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes.
Of these, StyleGAN offers a fascinating case study, owing to its remarkable visual quality and an ability to support a large array of downstream tasks.
In this paper, we build on the StyleGAN generator, and present a method that explicitly encourages face manipulation to focus on the intended regions by incorporating learned attention maps.
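As a rough illustration of how a learned attention map can confine an edit to the intended region, the sketch below gates edited features with a predicted single-channel map; the module, channel counts, and where such a gate would sit in the network are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Toy gate: a learned attention map decides where an edit is applied."""
    def __init__(self, channels=64):
        super().__init__()
        # Predict a single-channel attention map in [0, 1] from the edited features.
        self.to_attention = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, original_features, edited_features):
        attention = self.to_attention(edited_features)
        # The edit takes effect only where attention is high; elsewhere the
        # original features pass through unchanged.
        return attention * edited_features + (1.0 - attention) * original_features

# Usage with dummy feature maps:
gate = AttentionGate(channels=64)
original = torch.randn(1, 64, 32, 32)
edited = torch.randn(1, 64, 32, 32)
blended = gate(original, edited)
print(blended.shape)  # torch.Size([1, 64, 32, 32])
```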
In particular, we demonstrate that while StyleGAN3 can be trained on unaligned data, one can still use aligned data for training, without hindering the ability to generate unaligned imagery.
Can a generative model be trained to produce images from a specific domain, guided by a text prompt only, without seeing any image?
Inserting the resulting style code into a pre-trained StyleGAN generator results in a single harmonized image in which each semantic region is controlled by one of the input latent codes.
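A minimal sketch of the interface this describes is given below, with a toy fusion module and a stand-in generator; both are placeholder assumptions, and the learned fusion that actually assigns each semantic region to one of the input codes is not reproduced here.

```python
import torch
import torch.nn as nn

LATENT_DIM = 512
RESOLUTION = 64
NUM_CODES = 3

class ToyFusion(nn.Module):
    """Maps several input latent codes to a single fused style code."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(NUM_CODES * LATENT_DIM, LATENT_DIM)

    def forward(self, codes):              # codes: (B, NUM_CODES, LATENT_DIM)
        return self.fc(codes.flatten(1))   # (B, LATENT_DIM)

class ToyGenerator(nn.Module):
    """Stand-in for a frozen, pre-trained generator (style code -> image)."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(LATENT_DIM, 3 * RESOLUTION * RESOLUTION)

    def forward(self, style_code):
        return self.fc(style_code).view(-1, 3, RESOLUTION, RESOLUTION)

# Usage: fuse three latent codes and synthesize one harmonized image in a single pass.
fusion, generator = ToyFusion(), ToyGenerator()
codes = torch.randn(1, NUM_CODES, LATENT_DIM)
image = generator(fusion(codes))
print(image.shape)  # torch.Size([1, 3, 64, 64])
```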
Instead of directly predicting the latent code of a given real image using a single pass, the encoder is tasked with predicting a residual with respect to the current estimate of the inverted latent code in a self-correcting manner.
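The sketch below illustrates this kind of self-correcting, residual-based inversion loop; the toy encoder and generator, their shapes, and the number of refinement steps are placeholder assumptions rather than the actual architecture.

```python
import torch
import torch.nn as nn

LATENT_DIM = 512
RESOLUTION = 64

class ToyGenerator(nn.Module):
    """Stand-in for a frozen, pre-trained generator (latent -> image)."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(LATENT_DIM, 3 * RESOLUTION * RESOLUTION)

    def forward(self, latent):
        return self.fc(latent).view(-1, 3, RESOLUTION, RESOLUTION)

class ResidualEncoder(nn.Module):
    """Predicts a latent residual from the target image and the current reconstruction."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, stride=2, padding=1),  # 6 = target (3) + reconstruction (3)
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, LATENT_DIM),
        )

    def forward(self, target, reconstruction):
        return self.net(torch.cat([target, reconstruction], dim=1))

def invert(encoder, generator, target, init_latent, num_steps=5):
    latent = init_latent.clone()
    reconstruction = generator(latent)
    for _ in range(num_steps):
        latent = latent + encoder(target, reconstruction)  # self-correcting residual update
        reconstruction = generator(latent)                 # re-synthesize for the next step
    return latent, reconstruction

# Usage: start from an average latent and refine over a few feed-forward steps.
generator, encoder = ToyGenerator(), ResidualEncoder()
target = torch.randn(1, 3, RESOLUTION, RESOLUTION)
avg_latent = torch.zeros(1, LATENT_DIM)
latent, reconstruction = invert(encoder, generator, target, avg_latent)
```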
Inspired by the ability of StyleGAN to generate highly realistic images in a variety of domains, much recent work has focused on understanding how to use the latent spaces of StyleGAN to manipulate generated and real images.
In this formulation, our method treats the continuous aging process as a regression task between the input age and the desired target age, providing fine-grained control over the generated image.
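One way such a regression-based formulation could look in code is sketched below, with the target age injected as an extra input channel and an auxiliary age-regression loss on the output; all module names, shapes, and the age normalization are illustrative assumptions, not the paper's networks.

```python
import torch
import torch.nn as nn

LATENT_DIM = 512

class AgeConditionedEncoder(nn.Module):
    """Toy encoder conditioned on the desired age via an extra input channel."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1),  # 3 RGB channels + 1 age channel
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, LATENT_DIM),
        )

    def forward(self, image, target_age):
        b, _, h, w = image.shape
        age_channel = target_age.view(b, 1, 1, 1).expand(b, 1, h, w)
        return self.net(torch.cat([image, age_channel], dim=1))

def age_regression_loss(age_predictor, generated_image, target_age):
    """Penalize the gap between the apparent age of the output and the target,
    which is what makes the control continuous rather than class-based."""
    return nn.functional.mse_loss(age_predictor(generated_image), target_age)

# Usage with dummy tensors and a placeholder age regressor:
encoder = AgeConditionedEncoder()
image = torch.randn(2, 3, 64, 64)
target_age = torch.tensor([0.3, 0.8])                     # ages normalized to [0, 1]
latent = encoder(image, target_age)                       # (2, 512)
toy_age_predictor = lambda img: img.mean(dim=(1, 2, 3))   # stand-in for a real regressor
loss = age_regression_loss(toy_age_predictor, image, target_age)
```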
We then suggest two principles for designing encoders in a manner that allows one to control the proximity of the inversions to regions that StyleGAN was originally trained on.
State-of-the-art image-to-image translation methods tend to struggle in an imbalanced domain setting, where one image domain lacks richness and diversity.
We present a generic image-to-image translation framework, pixel2style2pixel (pSp).