DALL·E 2 is a generative text-to-image model made up of two main components: a prior that generates a CLIP image embedding given a text caption, and a decoder that generates an image conditioned on the image embedding.
Source: Hierarchical Text-Conditional Image Generation with CLIP Latents
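The two-stage pipeline can be sketched as follows. This is a minimal, illustrative PyTorch sketch, not the paper's actual architecture: the `Prior` and `Decoder` modules, the embedding width, and the image size are all placeholder assumptions (the real prior is a diffusion or autoregressive model, and the real decoder is a diffusion model with upsamplers).

```python
# Minimal sketch of the DALL·E 2 two-stage pipeline.
# `Prior`, `Decoder`, CLIP_DIM, and the image size are illustrative
# placeholders, not the architecture from the paper.
import torch
import torch.nn as nn

CLIP_DIM = 512  # assumed CLIP embedding width


class Prior(nn.Module):
    """Maps a CLIP text embedding to a predicted CLIP image embedding."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(CLIP_DIM, CLIP_DIM),
            nn.GELU(),
            nn.Linear(CLIP_DIM, CLIP_DIM),
        )

    def forward(self, text_emb):
        return self.net(text_emb)


class Decoder(nn.Module):
    """Generates an image conditioned on a CLIP image embedding."""

    def __init__(self, image_size=64):
        super().__init__()
        self.image_size = image_size
        self.net = nn.Linear(CLIP_DIM, 3 * image_size * image_size)

    def forward(self, img_emb):
        out = self.net(img_emb)
        return out.view(-1, 3, self.image_size, self.image_size)


def generate(text_emb, prior, decoder):
    # Stage 1: the prior predicts an image embedding from the caption embedding.
    img_emb = prior(text_emb)
    # Stage 2: the decoder renders an image conditioned on that embedding.
    return decoder(img_emb)


text_emb = torch.randn(1, CLIP_DIM)  # stand-in for CLIP's text encoder output
image = generate(text_emb, Prior(), Decoder())
print(image.shape)  # torch.Size([1, 3, 64, 64])
```

The key design choice this sketch captures is the decoupling: the prior only has to model the distribution of CLIP image embeddings given a caption, while the decoder inverts the CLIP image encoder, which is what lets DALL·E 2 produce diverse images that share the same semantics.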
| Task | Papers | Share |
|---|---|---|
| Multimodal Interaction | 1 | 14.29% |
| Conditional Image Generation | 1 | 14.29% |
| Decoder | 1 | 14.29% |
| Diversity | 1 | 14.29% |
| Image Generation | 1 | 14.29% |
| Text-to-Image Generation | 1 | 14.29% |
| Zero-Shot Text-to-Image Generation | 1 | 14.29% |