Semantic Image Synthesis with Spatially-Adaptive Normalization

We propose spatially-adaptive normalization, a simple but effective layer for synthesizing photorealistic images given an input semantic layout. Previous methods directly feed the semantic layout as input to the deep network, which then processes it through stacks of convolution, normalization, and nonlinearity layers. We show that this is suboptimal because the normalization layers tend to "wash away" semantic information. To address the issue, we propose using the input layout to modulate the activations in normalization layers through a spatially-adaptive, learned transformation. Experiments on several challenging datasets demonstrate the advantage of the proposed method over existing approaches in terms of both visual fidelity and alignment with the input layouts. Finally, our model allows user control over both semantics and style. Code is available at https://github.com/NVlabs/SPADE .
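The core idea can be illustrated with a minimal, framework-free sketch (a hypothetical illustration, not the authors' implementation): activations are normalized to zero mean and unit variance as in batch normalization, then modulated by a per-pixel scale (gamma) and shift (beta) that depend on the semantic layout, so label information survives the normalization step. Here the learned convolutional network that SPADE uses to predict the gamma and beta maps is stood in for by simple per-label lookup tables.

```python
import math

def spade_modulate(x, seg, gamma_per_label, beta_per_label, eps=1e-5):
    """Spatially-adaptive normalization sketch.

    x:   2D list of activations.
    seg: 2D list of integer semantic labels, same shape as x.
    gamma_per_label / beta_per_label: dicts mapping label -> scalar,
    standing in for the small conv network SPADE learns to predict
    dense gamma/beta maps from the layout (an assumption made here
    to keep the example self-contained).
    """
    # Normalize activations to zero mean, unit variance (batch-norm style).
    flat = [v for row in x for v in row]
    mu = sum(flat) / len(flat)
    var = sum((v - mu) ** 2 for v in flat) / len(flat)
    std = math.sqrt(var + eps)
    # Modulate each pixel with a scale/shift chosen by its semantic label,
    # so different regions remain distinguishable after normalization.
    return [
        [gamma_per_label[s] * (v - mu) / std + beta_per_label[s]
         for v, s in zip(xrow, srow)]
        for xrow, srow in zip(x, seg)
    ]

# Two regions (labels 0 and 1) receive different scale/shift parameters.
x = [[1.0, 2.0], [3.0, 4.0]]
seg = [[0, 0], [1, 1]]
out = spade_modulate(x, seg, {0: 1.0, 1: 2.0}, {0: 0.0, 1: 5.0})
```

A plain normalization layer would map a uniform region of the layout to near-identical activations regardless of its label; here the label-conditioned gamma/beta reinject that information after normalization, which is the property the abstract argues is lost when the layout is only fed in at the network input.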

Published at CVPR 2019.
| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Image-to-Image Translation | ADE20K Labels-to-Photos | SPADE | mIoU | 38.5 | #7 |
| Image-to-Image Translation | ADE20K Labels-to-Photos | SPADE | Accuracy | 79.9% | #4 |
| Image-to-Image Translation | ADE20K Labels-to-Photos | SPADE | FID | 33.9 | #9 |
| Image-to-Image Translation | ADE20K Labels-to-Photos | SPADE | LPIPS | 0 | #4 |
| Image-to-Image Translation | ADE20K-Outdoor Labels-to-Photos | SPADE | mIoU | 30.8 | #3 |
| Image-to-Image Translation | ADE20K-Outdoor Labels-to-Photos | SPADE | Accuracy | 82.9% | #1 |
| Image-to-Image Translation | ADE20K-Outdoor Labels-to-Photos | SPADE | FID | 63.3 | #4 |
| Image-to-Image Translation | Cityscapes Labels-to-Photo | SPADE | Per-pixel Accuracy | 81.9% | #4 |
| Image-to-Image Translation | Cityscapes Labels-to-Photo | SPADE | mIoU | 62.3 | #8 |
| Image-to-Image Translation | Cityscapes Labels-to-Photo | SPADE | FID | 71.8 | #12 |
| Sketch-to-Image Translation | COCO-Stuff | SPADE | FID | 89.2 | #3 |
| Sketch-to-Image Translation | COCO-Stuff | SPADE | FID-C | 48.9 | #3 |
| Image-to-Image Translation | COCO-Stuff Labels-to-Photos | SPADE | mIoU | 37.4 | #4 |
| Image-to-Image Translation | COCO-Stuff Labels-to-Photos | SPADE | Accuracy | 67.9% | #3 |
| Image-to-Image Translation | COCO-Stuff Labels-to-Photos | SPADE | FID | 22.6 | #9 |