Personalized Image Generation
31 papers with code • 1 benchmark • 1 dataset
Personalized image generation uses one or more reference images that share a subject or style, together with a text prompt, to generate images that contain that subject while matching the textual description. The task covers fine-tuning-based methods (e.g., DreamBooth, Textual Inversion) as well as encoder-based methods (e.g., E4T, ELITE, and IP-Adapter).
Most implemented papers
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
Once the subject is embedded in the output domain of the model, the unique identifier can be used to synthesize novel photorealistic images of the subject contextualized in different scenes.
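As a minimal illustration (not from the paper itself), the sketch below assumes a Stable Diffusion checkpoint that has already been DreamBooth fine-tuned with the Hugging Face diffusers library; the local path `./dreambooth-dog` and the rare identifier token `sks` are placeholder choices:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a pipeline previously fine-tuned with DreamBooth on a few subject
# photos bound to the rare identifier token "sks" (hypothetical local path).
pipe = StableDiffusionPipeline.from_pretrained(
    "./dreambooth-dog",  # hypothetical output directory of a DreamBooth run
    torch_dtype=torch.float16,
).to("cuda")

# Reuse the identifier to place the learned subject in a novel scene.
image = pipe(
    "a photo of sks dog on a beach at sunset",
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
image.save("sks_dog_beach.png")
```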
An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes.
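For comparison, a minimal inference sketch assuming the Hugging Face diffusers integration of textual inversion; the base checkpoint id and the `sd-concepts-library/cat-toy` embedding (with its `<cat-toy>` placeholder token) are example choices, not part of the paper:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example base checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# Load a learned pseudo-word embedding; the concept is then referenced in the
# prompt through its placeholder token rather than by fine-tuning the model.
pipe.load_textual_inversion("sd-concepts-library/cat-toy")

image = pipe(
    "a <cat-toy> figurine riding a bicycle",
    num_inference_steps=50,
).images[0]
image.save("cat_toy_bicycle.png")
```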
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models
Despite the simplicity of our method, an IP-Adapter with only 22M parameters can achieve comparable or even better performance to a fully fine-tuned image prompt model.
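For context, a minimal encoder-based sketch assuming the diffusers integration of IP-Adapter; the weight repository `h94/IP-Adapter`, the weight file name, and the reference image path are example choices:

```python
import torch
from diffusers import StableDiffusionPipeline
from diffusers.utils import load_image

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example base checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# Attach the decoupled image-prompt adapter; no per-subject fine-tuning needed.
pipe.load_ip_adapter(
    "h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin"
)
pipe.set_ip_adapter_scale(0.6)  # balance the image prompt against the text prompt

reference = load_image("subject.jpg")  # placeholder reference image of the subject
image = pipe(
    prompt="wearing sunglasses, studio lighting",
    ip_adapter_image=reference,
    num_inference_steps=50,
).images[0]
image.save("personalized.png")
```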
Personalized Image Generation for Color Vision Deficiency Population
Approximately 350 million people, a proportion of about 8%, suffer from color vision deficiency (CVD).
FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
FastComposer proposes delayed subject conditioning in the denoising step to maintain both identity and editability in subject-driven image generation.
BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing
We then design a subject representation learning task that enables a diffusion model to leverage such visual representation and generate new subject renditions.
Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning
In this paper, we propose Subject-Diffusion, a novel open-domain personalized image generation model that requires no test-time fine-tuning and needs only a single reference image to support personalized generation of single or multiple subjects in any domain.
FaceChain: A Playground for Human-centric Artificial Intelligence Generated Content
In this paper, we present FaceChain, a personalized portrait generation framework that combines a series of customized image-generation models with a rich set of face-related perceptual understanding models (e.g., face detection, deep face embedding extraction, and facial attribute recognition) to tackle the aforementioned challenges and to generate truthful personalized portraits from only a handful of portrait images as input.
When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for Personalized Image Generation
Text-to-image diffusion models have remarkably excelled in producing diverse, high-quality, and photo-realistic images.
Generative Multimodal Models are In-Context Learners
The human ability to easily solve multimodal tasks in context (i.e., with only a few demonstrations or simple instructions) is what current multimodal systems have largely struggled to imitate.