By providing stronger supervision to both the discriminator and the generator through spatially and semantically aware discriminator feedback, we synthesize higher-fidelity images that align better with their input label maps, rendering the perceptual loss superfluous.
Ranked #1 on Image-to-Image Translation on Cityscapes Labels-to-Photo (mIoU metric)
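One way to read "spatially and semantically aware discriminator feedback" is a discriminator that classifies every pixel of an image into the semantic classes of the label map (plus an extra "fake" class), so the generator receives a dense per-pixel training signal instead of a single real/fake score. A minimal sketch of such a per-pixel loss; the function name, shapes, and the extra fake class are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def pixelwise_discriminator_loss(logits, label_map):
    """Per-pixel cross-entropy for a segmentation-style discriminator.

    logits:    (H, W, C) discriminator scores; the last channel is assumed
               to be an extra 'fake' class, the rest are semantic classes.
    label_map: (H, W) integer semantic labels for a real image.
    """
    # Numerically stable softmax over the class dimension
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    h, w = label_map.shape
    # Probability assigned to the true class at every pixel
    p_true = probs[np.arange(h)[:, None], np.arange(w)[None, :], label_map]
    # Average negative log-likelihood over all spatial locations
    return -np.log(p_true + 1e-12).mean()
```

On real images the discriminator is pushed toward the ground-truth label at each pixel; on generated images the target would be the fake class, giving the generator spatially resolved feedback on where it fails.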
The novel discriminator improves over the state of the art on standard distribution-matching and image-quality metrics, enabling the generator to synthesize images with varying structure, appearance, and level of detail while maintaining global and local realism.
Ranked #1 on Conditional Image Generation on COCO-Animals
Many approaches in generalized zero-shot learning rely on cross-modal mapping between the image feature space and the class embedding space.
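A common instantiation of such a cross-modal mapping is a linear projection from image features into the class-embedding (e.g. attribute) space, fit on seen classes and used to classify by nearest class embedding. A minimal sketch with ridge-regression fitting; the shapes, regularization term, and cosine-similarity scoring are illustrative assumptions, not any specific paper's method:

```python
import numpy as np

def fit_linear_map(img_feats, class_embeds, labels, reg=1e-3):
    """Fit W so that img_feats @ W approximates class_embeds[labels]."""
    targets = class_embeds[labels]                      # (n, e)
    d = img_feats.shape[1]
    # Ridge regression: (X^T X + reg*I) W = X^T Y
    a = img_feats.T @ img_feats + reg * np.eye(d)       # (d, d)
    return np.linalg.solve(a, img_feats.T @ targets)    # (d, e)

def predict(img_feats, w, class_embeds):
    """Project features into embedding space, pick the nearest class."""
    projected = img_feats @ w                           # (n, e)
    # Cosine similarity against every class embedding
    p = projected / np.linalg.norm(projected, axis=1, keepdims=True)
    c = class_embeds / np.linalg.norm(class_embeds, axis=1, keepdims=True)
    return (p @ c.T).argmax(axis=1)
```

At test time the same `predict` call can score unseen classes as well, since any class with an embedding can serve as a nearest-neighbor target; this is precisely where the generalized zero-shot setting stresses such mappings.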