Image-to-Image Translation with Conditional Adversarial Networks

We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Indeed, since the release of the pix2pix software associated with this paper, a large number of internet users (many of them artists) have posted their own experiments with our system, further demonstrating its wide applicability and ease of adoption without the need for parameter tweaking. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.
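
The phrase "learn a loss function" can be made concrete with the objective from the full paper. The block below is a brief restatement, not part of the abstract; the symbols are assumptions introduced here for illustration: x is the input image, y the target image, z a noise vector, G the generator, D the discriminator, and λ a weighting hyperparameter.

```latex
\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}\big[\log D(x, y)\big]
                         + \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x, z))\big)\big]

\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\big[\lVert y - G(x, z) \rVert_1\big]

G^{*} = \arg\min_{G} \max_{D} \; \mathcal{L}_{cGAN}(G, D) + \lambda \, \mathcal{L}_{L1}(G)
```

The discriminator D plays the role of the learned loss: it is trained alongside G to tell real (x, y) pairs from generated ones, while the L1 term keeps the output close to the target.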

CVPR 2017
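| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Image-to-Image Translation | Aerial-to-Map | cGAN | Per-pixel Accuracy | 70.0% | #1 |
| Image-to-Image Translation | Aerial-to-Map | cGAN | Per-class Accuracy | 46.0% | #1 |
| Image-to-Image Translation | Aerial-to-Map | cGAN | Class IOU | 0.26 | #1 |
| Facial Expression Translation | AR Face | Enc.-Decoder | AMT | 0.1 | #3 |
| Facial Expression Translation | AR Face | Enc.-Decoder | PSNR | 12.6660 | #3 |
| Nuclear Segmentation | Cell17 | Pix2Pix | F1-score | 0.6208 | #4 |
| Nuclear Segmentation | Cell17 | Pix2Pix | Dice | 0.6351 | #3 |
| Nuclear Segmentation | Cell17 | Pix2Pix | Hausdorff | 19.1441 | #3 |
| Image-to-Image Translation | Cityscapes Labels-to-Photo | pix2pix | Class IOU | 0.18 | #1 |
| Image-to-Image Translation | Cityscapes Labels-to-Photo | pix2pix | Per-class Accuracy | 25.0 | #1 |
| Image-to-Image Translation | Cityscapes Labels-to-Photo | pix2pix | Per-pixel Accuracy | 71.0 | #8 |
| Image-to-Image Translation | Cityscapes Labels-to-Photo | pix2pix | LPIPS | 0 | #4 |
| Image-to-Image Translation | Cityscapes Photo-to-Labels | pix2pix | Per-pixel Accuracy | 85.0% | #1 |
| Image-to-Image Translation | Cityscapes Photo-to-Labels | pix2pix | Per-class Accuracy | 40.0% | #1 |
| Image-to-Image Translation | Cityscapes Photo-to-Labels | pix2pix | Class IOU | 0.32 | #1 |
| Cross-View Image-to-Image Translation | cvusa | Pix2pix | SSIM | 0.3923 | #7 |
| Cross-View Image-to-Image Translation | Dayton (256×256) - aerial-to-ground | Pix2pix | SSIM | 0.418 | #5 |
| Cross-View Image-to-Image Translation | Dayton (256×256) - ground-to-aerial | Pix2pix | SSIM | 0.2693 | #4 |
| Cross-View Image-to-Image Translation | Dayton (64×64) - aerial-to-ground | Pix2pix | SSIM | 0.4808 | #5 |
| Cross-View Image-to-Image Translation | Dayton (64×64) - ground-to-aerial | Pix2pix | SSIM | 0.3675 | #3 |
| Cross-View Image-to-Image Translation | Ego2Top | Pix2pix | SSIM | 0.2213 | #4 |
| Fundus to Angiography Generation | Fundus Fluorescein Angiogram Photographs & Colour Fundus Images of Diabetic Patients | pix2pix | FID | 48.6 | #9 |
| Colorization | ImageNet val | cGAN | FID-5K | 24.41 | #4 |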
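
Several rows in the table above report per-pixel accuracy, per-class accuracy, and class IOU. These are conventionally derived from a confusion matrix over pixel labels. The sketch below shows the usual definitions in plain Python; the function names `confusion_matrix` and `segmentation_scores` and the toy labels are hypothetical, and the benchmark's actual evaluation scripts may differ in detail (for example in how absent classes are handled).

```python
from typing import Dict, List, Sequence


def confusion_matrix(truth: Sequence[int], pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows index the ground-truth class, columns the prediction."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        cm[t][p] += 1
    return cm


def segmentation_scores(cm: List[List[int]]) -> Dict[str, float]:
    """Per-pixel accuracy, mean per-class accuracy, and mean class IOU.

    Assumption: classes with no ground-truth pixels are skipped when
    averaging per-class accuracy, and classes with an empty union are
    skipped when averaging IOU.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    correct = sum(cm[i][i] for i in range(n))
    class_accs: List[float] = []
    class_ious: List[float] = []
    for i in range(n):
        gt_pixels = sum(cm[i])                          # pixels truly in class i
        pred_pixels = sum(cm[r][i] for r in range(n))   # pixels predicted as class i
        union = gt_pixels + pred_pixels - cm[i][i]
        if gt_pixels > 0:
            class_accs.append(cm[i][i] / gt_pixels)
        if union > 0:
            class_ious.append(cm[i][i] / union)

    return {
        "per_pixel_accuracy": correct / total,
        "per_class_accuracy": sum(class_accs) / len(class_accs),
        "class_iou": sum(class_ious) / len(class_ious),
    }


# Toy example: 8 flattened pixel labels over 3 classes (made-up data).
truth = [0, 0, 0, 0, 1, 1, 2, 2]
pred = [0, 0, 0, 1, 1, 1, 2, 0]
scores = segmentation_scores(confusion_matrix(truth, pred, num_classes=3))
print(scores)
```

Per-pixel accuracy rewards getting the frequent classes right, while per-class accuracy and class IOU average over classes, which is one reason the same model can score high on the first and much lower on the other two.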

Results from Other Papers


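| Task | Dataset | Model | Metric Name | Metric Value | Rank |
|---|---|---|---|---|---|
| Image Reconstruction | Edge-to-Handbags | pix2pix | FID | 96.31 | #4 |
| Image Reconstruction | Edge-to-Handbags | pix2pix | LPIPS | 0.234 | #1 |
| Image Reconstruction | Edge-to-Shoes | pix2pix | FID | 197.492 | #4 |
| Image Reconstruction | Edge-to-Shoes | pix2pix | LPIPS | 0.238 | #1 |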

Methods