Pix2Pix is a conditional image-to-image translation architecture that combines a conditional GAN objective with a reconstruction loss. The conditional GAN objective, for observed images $x$, output images $y$, and a random noise vector $z$, is:
$$ \mathcal{L}_{cGAN}\left(G, D\right) = \mathbb{E}_{x,y}\left[\log D\left(x, y\right)\right] + \mathbb{E}_{x,z}\left[\log\left(1 - D\left(x, G\left(x, z\right)\right)\right)\right] $$
We augment this with a reconstruction term:
$$ \mathcal{L}_{L1}\left(G\right) = \mathbb{E}_{x,y,z}\left[||y - G\left(x, z\right)||_{1}\right] $$
and we get the final objective as:
$$ G^{*} = \arg\min_{G}\max_{D}\mathcal{L}_{cGAN}\left(G, D\right) + \lambda\mathcal{L}_{L1}\left(G\right) $$
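The combined objective can be sketched numerically. The following is a minimal pure-Python sketch (the function names are ours, not from any released implementation): the discriminator outputs are treated as scalar probabilities, and $\lambda = 100$ is the weighting used in the paper.

```python
import math

def cgan_d_loss(d_real, d_fake):
    # Discriminator ascends log D(x, y) + log(1 - D(x, G(x, z)));
    # return the negated objective so it can be minimized instead.
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def cgan_g_loss(d_fake, y, g_out, lam=100.0):
    # Generator descends log(1 - D(x, G(x, z))) + lambda * L1(y, G(x, z)).
    adv = math.log(1.0 - d_fake)
    l1 = sum(abs(a - b) for a, b in zip(y, g_out)) / len(y)
    return adv + lam * l1
```

A perfect reconstruction (`y == g_out`) zeroes the L1 term, leaving only the adversarial term; in practice the large $\lambda$ means the L1 term dominates early training.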
The generator and discriminator architectures adapt modules from DCGAN, with a few modifications: the generator is a U-Net-style encoder-decoder with skip connections between mirrored layers, and the discriminator is a convolutional PatchGAN that classifies overlapping image patches as real or fake rather than scoring the whole image.
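The Pix2Pix discriminator is a 70×70 PatchGAN: each output unit sees a 70×70 patch of the input. That receptive field can be checked by walking backwards through the discriminator's five 4×4 convolutions (strides 2, 2, 2, 1, 1); the helper below is a sketch of that bookkeeping, not part of any official codebase.

```python
def receptive_field(layers):
    # Walk backwards from a 1x1 output unit; each (kernel, stride)
    # layer grows the field as rf = rf * stride + (kernel - stride).
    rf = 1
    for kernel, stride in reversed(layers):
        rf = rf * stride + (kernel - stride)
    return rf

# Five 4x4 convs with strides 2, 2, 2, 1, 1
print(receptive_field([(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]))  # -> 70
```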
The tasks in which this method is most commonly used, by share of associated papers:

Task | Papers | Share
---|---|---
Image-to-Image Translation | 29 | 20.28% |
Image Generation | 12 | 8.39% |
Semantic Segmentation | 8 | 5.59% |
Colorization | 6 | 4.20% |
Style Transfer | 5 | 3.50% |
Anatomy | 4 | 2.80% |
Autonomous Driving | 3 | 2.10% |
Management | 2 | 1.40% |
Cross-View Image-to-Image Translation | 2 | 1.40% |