Layout-to-Image Generation
18 papers with code • 7 benchmarks • 4 datasets
Layout-to-image generation is the task of generating a scene from a given layout, where the layout specifies the locations of the objects to appear in the output image. In this section, you can find state-of-the-art leaderboards for layout-to-image generation.
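As a minimal illustration of the input format (not tied to any particular paper below), a layout can be represented as a list of labeled bounding boxes in normalized coordinates; the class and field names here are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class LayoutObject:
    label: str   # object category, e.g. "dog"
    box: tuple   # (x, y, w, h) in normalized [0, 1] coordinates

# A layout is simply the set of objects the generated image must contain.
layout = [
    LayoutObject("dog", (0.10, 0.55, 0.30, 0.35)),
    LayoutObject("ball", (0.60, 0.70, 0.15, 0.15)),
]
```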
Latest papers
DivCon: Divide and Conquer for Progressive Text-to-Image Generation
To further improve T2I models' capability in numerical and spatial reasoning, the layout is employed as an intermediary that bridges large language models and layout-based diffusion models.
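A sketch of this two-stage idea, assuming hypothetical interfaces (the function names below are illustrative, not DivCon's actual API):

```python
# Stage 1: an LLM plans a layout from the text prompt (counting and
# placing objects). Stage 2: a layout-based diffusion model renders it.
def generate(prompt, llm, l2i_model):
    layout = llm.plan_layout(prompt)         # e.g. "three apples" -> 3 boxes
    return l2i_model.sample(prompt, layout)  # layout-conditioned sampling
```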
Adversarial Supervision Makes Layout-to-Image Diffusion Models Thrive
Current L2I models either suffer from poor editability via text or weak alignment between the generated image and the input layout.
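One way to picture adversarial supervision for a layout-to-image diffusion model is to mix the usual denoising objective with a discriminator that scores layout alignment. The sketch below is a loose illustration under that assumption, not the paper's exact formulation; the schedule factors of the clean-image estimate are omitted for brevity:

```python
import torch
import torch.nn.functional as F

def training_loss(denoiser, discriminator, x_noisy, t, noise, layout):
    """Hypothetical training step: denoising loss + adversarial feedback
    on layout alignment (weighting and clean-image estimate simplified)."""
    pred = denoiser(x_noisy, t, layout)
    diffusion_loss = F.mse_loss(pred, noise)
    # The discriminator judges whether the (crudely) denoised image matches
    # the layout; the generator is pushed toward the "real" label (1.0).
    score = discriminator(x_noisy - pred, layout)
    adv_loss = F.binary_cross_entropy_with_logits(score, torch.ones_like(score))
    return diffusion_loss + 0.1 * adv_loss  # loss weight is illustrative
```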
LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts
Diffusion-based generative models have significantly advanced text-to-image generation but encounter challenges when processing lengthy and intricate text prompts describing complex scenes with multiple objects.
Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation
In this paper, we propose LayoutBench, a diagnostic benchmark for layout-guided image generation that examines four categories of spatial control skills: number, position, size, and shape.
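Benchmarks of this kind typically run an object detector on the generated image and match detections against the requested layout. A minimal, self-contained version of such a check (an illustrative metric, not LayoutBench's official protocol):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-8)

def layout_accuracy(requested, detected, thresh=0.5):
    """Fraction of requested (class, box) pairs matched by a detection of
    the same class with IoU above a threshold."""
    hits = sum(
        any(c == dc and iou(box, dbox) >= thresh for dc, dbox in detected)
        for c, box in requested
    )
    return hits / max(len(requested), 1)
```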
LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation
To overcome the difficulty of multimodal fusion between image and layout, we propose constructing a structural image patch with region information and transforming the patched image into a special layout, so that it can be fused with the normal layout in a unified form.
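The gist is that each image patch gets a region descriptor (its own box on the patch grid), which lets patches be treated as layout elements. A sketch of that idea, not LayoutDiffusion's exact construction:

```python
import torch

def patches_as_layout(image_tokens, grid):
    """Attach region information (the normalized box of each patch) so
    image patches share a unified token form with layout elements."""
    boxes = torch.tensor([
        [c / grid, r / grid, (c + 1) / grid, (r + 1) / grid]
        for r in range(grid) for c in range(grid)
    ])                                        # (grid*grid, 4) per-patch regions
    return torch.cat([image_tokens, boxes], dim=-1)

tokens = torch.randn(16, 64)                  # 4x4 grid of 64-d patch tokens
unified = patches_as_layout(tokens, grid=4)   # (16, 68)
```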
Freestyle Layout-to-Image Synthesis
In this work, we explore the freestyle capability of the model, i.e., how far it can generate unseen semantics (e.g., classes, attributes, and styles) onto a given layout, and call the task Freestyle LIS (FLIS).
Modeling Image Composition for Complex Scene Generation
In contrast to existing CNN-based and Transformer-based generation models, which entangle modeling at the pixel/patch level and the object/patch level respectively, the proposed focal attention predicts the current patch token by attending only to the highly related tokens specified by the spatial layout, thereby achieving disambiguation during training.
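Mechanically, this amounts to masked attention where the mask comes from the layout. A minimal sketch under that reading (not the paper's exact implementation; it assumes each query has at least one related token):

```python
import torch

def focal_attention(q, k, v, relevance_mask):
    """Attention restricted to layout-related tokens: positions where
    relevance_mask is False are excluded from the softmax."""
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
    scores = scores.masked_fill(~relevance_mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```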
Interactive Image Synthesis with Panoptic Layout Generation
In particular, the stuff layouts can take amorphous shapes and fill up the missing regions left by the instance layouts.
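A simple composition rule capturing this behavior (an illustrative sketch, not the paper's panoptic layout generator): instances are placed first, then stuff regions fill whatever remains uncovered.

```python
import numpy as np

def compose_panoptic(instance_masks, stuff_masks):
    """Compose boolean masks into a panoptic layout: instance layouts
    take priority; stuff layouts fill the remaining unassigned pixels."""
    h, w = instance_masks[0].shape
    canvas = np.zeros((h, w), dtype=np.int32)            # 0 = unassigned
    for i, m in enumerate(instance_masks, start=1):
        canvas[m] = i                                     # instances first
    offset = len(instance_masks)
    for j, m in enumerate(stuff_masks, start=offset + 1):
        canvas[(canvas == 0) & m] = j                     # stuff fills gaps
    return canvas
```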
High-Resolution Image Synthesis with Latent Diffusion Models
By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond.
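The "sequential application of denoising autoencoders" is the reverse diffusion loop. Below is a minimal sketch of such a loop using a DDIM-style deterministic update as a simplification; the denoiser and noise schedule are assumed inputs, and the actual latent-diffusion implementation differs in detail:

```python
import torch

@torch.no_grad()
def sample(denoiser, steps, shape, alphas_cumprod):
    """Form an image by repeatedly applying a denoising network,
    starting from pure noise (schedule details simplified)."""
    x = torch.randn(shape)
    for t in reversed(range(steps)):
        a_t = alphas_cumprod[t]
        eps = denoiser(x, t)                              # predict the noise
        x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()    # clean estimate
        if t == 0:
            return x0
        x = (alphas_cumprod[t - 1].sqrt() * x0
             + (1 - alphas_cumprod[t - 1]).sqrt() * eps)  # DDIM-style step
    return x
```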
AttrLostGAN: Attribute Controlled Image Synthesis from Reconfigurable Layout and Style
In this paper, we propose a method for attribute-controlled image synthesis from layout, which allows specifying the appearance of individual objects without affecting the rest of the image.
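The interface this implies is one latent style plus attribute vector per object, so that editing a single entry and regenerating changes only that object. A sketch of the conditioning structure under that assumption (not AttrLostGAN's actual architecture):

```python
import torch

def object_conditioning(num_objects, style_dim, attr_dim):
    """One style code and one attribute vector per object; together they
    form per-object conditioning that can be edited independently."""
    styles = torch.randn(num_objects, style_dim)   # per-object appearance
    attrs = torch.zeros(num_objects, attr_dim)     # e.g. one-hot attributes
    return torch.cat([styles, attrs], dim=-1)      # (num_objects, style+attr)
```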