NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis

In this paper, we present NUWA-Infinity, a generative model for infinite visual synthesis, which is defined as the task of generating arbitrarily-sized high-resolution images or long-duration videos. An autoregressive over autoregressive generation mechanism is proposed to deal with this variable-size generation task: a global patch-level autoregressive model considers the dependencies between patches, and a local token-level autoregressive model considers the dependencies between visual tokens within each patch. A Nearby Context Pool (NCP) is introduced to cache related patches that have already been generated as the context for the patch currently being generated, which significantly reduces computation cost without sacrificing patch-level dependency modeling. An Arbitrary Direction Controller (ADC) is used to decide suitable generation orders for different visual synthesis tasks and to learn order-aware positional embeddings. Compared to DALL-E, Imagen and Parti, NUWA-Infinity can generate high-resolution images of arbitrary size and additionally supports long-duration video generation. Compared to NUWA, which also covers images and videos, NUWA-Infinity has superior visual synthesis capabilities in terms of resolution and variable-size generation. The code is available at https://github.com/microsoft/NUWA. The homepage is https://nuwa-infinity.microsoft.com.
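
The two nested autoregressive loops and the context cache described above can be illustrated with a toy sketch. This is not the authors' implementation: every name in it (raster_order, nearby, toy_next_token, generate), the square neighbourhood, the eviction rule, and the arithmetic stand-in for the token model are assumptions made only for illustration, and the learned order-aware positional embeddings of the ADC are omitted.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]  # (row, col) of a patch in the patch grid


def raster_order(rows: int, cols: int) -> List[Coord]:
    """One possible generation order (assumed): left to right, top to bottom."""
    return [(r, c) for r in range(rows) for c in range(cols)]


def nearby(coord: Coord, pool: Dict[Coord, List[int]], extent: int) -> List[Coord]:
    """Cached patches within `extent` grid steps of `coord`, in sorted order."""
    r, c = coord
    return sorted(
        p for p in pool if abs(p[0] - r) <= extent and abs(p[1] - c) <= extent
    )


def toy_next_token(context: List[int], prefix: List[int], vocab: int) -> int:
    """Hypothetical stand-in for a local token-level model (deterministic)."""
    return (sum(context) + 31 * sum(prefix) + len(prefix) + 1) % vocab


def generate(
    rows: int,
    cols: int,
    tokens_per_patch: int = 4,
    extent: int = 1,
    vocab: int = 16,
) -> Tuple[Dict[Coord, List[int]], int]:
    """Generate a rows x cols grid of patches; return (canvas, max pool size)."""
    pool: Dict[Coord, List[int]] = {}    # cache of patches that may still be needed
    canvas: Dict[Coord, List[int]] = {}  # every patch generated so far
    max_pool = 0
    for coord in raster_order(rows, cols):        # global, patch-level loop
        # Context = tokens of cached patches close to the current patch.
        context = [t for p in nearby(coord, pool, extent) for t in pool[p]]
        patch: List[int] = []
        for _ in range(tokens_per_patch):         # local, token-level loop
            patch.append(toy_next_token(context, patch, vocab))
        canvas[coord] = patch
        pool[coord] = patch
        # Assumed eviction rule for raster order: rows more than `extent`
        # above the current row can never be near a future patch.
        for p in [p for p in pool if p[0] < coord[0] - extent]:
            del pool[p]
        max_pool = max(max_pool, len(pool))
    return canvas, max_pool


if __name__ == "__main__":
    canvas, max_pool = generate(rows=3, cols=4)
    print(len(canvas), "patches generated; largest pool size:", max_pool)
```

In this sketch the cache size depends on the grid width and the neighbourhood extent rather than on the total number of patches, which is the kind of saving a context pool is meant to provide. Swapping raster_order for another ordering would also require a matching eviction rule.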

Results from the Paper


Task                      Dataset  Model                   Metric Name               Metric Value  Global Rank
Image Outpainting         LHQC     NUWA-Infinity w/o text  Block-FID (Right Extend)  6.43          # 1
Image Outpainting         LHQC     NUWA-Infinity w/o text  Block-FID (Left Extend)   6.71          # 1
Image Outpainting         LHQC     NUWA-Infinity w/o text  Block-FID (Down Extend)   11.47         # 2
Image Outpainting         LHQC     NUWA-Infinity w/o text  Block-FID (Up Extend)     8.03          # 2
Image Outpainting         LHQC     NUWA-Infinity           Block-FID (Right Extend)  6.45          # 2
Image Outpainting         LHQC     NUWA-Infinity           Block-FID (Left Extend)   6.72          # 2
Image Outpainting         LHQC     NUWA-Infinity           Block-FID (Down Extend)   9.84          # 1
Image Outpainting         LHQC     NUWA-Infinity           Block-FID (Up Extend)     7.43          # 1
Text-to-Image Generation  LHQC     NUWA-Infinity           Block-FID                 9.71          # 1
