SESAME: Semantic Editing of Scenes by Adding, Manipulating or Erasing Objects

Recent advances in image generation have given rise to powerful tools for semantic image editing. However, existing approaches either operate only on a single image or require an abundance of additional information, and they cannot handle the complete set of editing operations, that is, the addition, manipulation, or removal of semantic concepts. To address these limitations, we propose SESAME, a novel generator-discriminator pair for Semantic Editing of Scenes by Adding, Manipulating, or Erasing objects. In our setup, the user provides the semantic labels of the areas to be edited, and the generator synthesizes the corresponding pixels. In contrast to previous methods, whose discriminators trivially concatenate the semantics and the image as input, the SESAME discriminator is composed of two input streams that independently process the image and its semantics, using the latter to manipulate the results of the former. We evaluate our model on a diverse set of datasets and report state-of-the-art performance on two tasks: (a) image manipulation and (b) image generation conditioned on semantic labels.
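The two-stream discriminator described above can be made concrete with a short sketch. The following is a minimal, hypothetical PyTorch rendering, not the authors' released implementation: the names (`SESAMEDiscriminator`, `img_stream`, `sem_stream`, `mod_heads`), the layer widths, and the choice of feature-wise affine modulation as the mechanism by which the semantics "manipulate" the image features are all illustrative assumptions consistent with the abstract.

```python
# Minimal sketch of a two-stream discriminator: the image and the semantic
# label map are processed by separate convolutional streams, and the
# semantics stream modulates the image-stream features instead of being
# concatenated at the input. All names and sizes are illustrative.
import torch
import torch.nn as nn

class SESAMEDiscriminator(nn.Module):
    def __init__(self, img_channels=3, sem_channels=35, width=64):
        super().__init__()
        # Image stream: plain strided convolutions over RGB pixels.
        self.img_stream = nn.ModuleList([
            nn.Conv2d(img_channels, width, 4, stride=2, padding=1),
            nn.Conv2d(width, width * 2, 4, stride=2, padding=1),
        ])
        # Semantics stream: mirrors the image stream's resolutions.
        self.sem_stream = nn.ModuleList([
            nn.Conv2d(sem_channels, width, 4, stride=2, padding=1),
            nn.Conv2d(width, width * 2, 4, stride=2, padding=1),
        ])
        # Per-scale heads that turn semantic features into a scale (gamma)
        # and shift (beta) applied to the image features.
        self.mod_heads = nn.ModuleList([
            nn.Conv2d(width, width * 2, 3, padding=1),
            nn.Conv2d(width * 2, width * 4, 3, padding=1),
        ])
        # Patch-level real/fake scores.
        self.head = nn.Conv2d(width * 2, 1, 3, padding=1)

    def forward(self, image, semantics):
        x, s = image, semantics
        for img_conv, sem_conv, mod in zip(self.img_stream,
                                           self.sem_stream,
                                           self.mod_heads):
            x = torch.relu(img_conv(x))
            s = torch.relu(sem_conv(s))
            # Semantics modulate the image features at each scale.
            gamma, beta = mod(s).chunk(2, dim=1)
            x = x * (1 + gamma) + beta
        return self.head(x)

# Usage: the semantics input would be a one-hot label map in practice;
# shapes here are placeholders.
disc = SESAMEDiscriminator()
scores = disc(torch.randn(1, 3, 256, 256), torch.randn(1, 35, 256, 256))
```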

ECCV 2020
| Task | Dataset | Model | Metric | Value | Global Rank |
|------|---------|-------|--------|-------|-------------|
| Image-to-Image Translation | ADE20K Labels-to-Photos | SPADE + SESAME | mIoU | 49 | #3 |
| Image-to-Image Translation | ADE20K Labels-to-Photos | SPADE + SESAME | Accuracy | 85.5% | #1 |
| Image-to-Image Translation | ADE20K Labels-to-Photos | SPADE + SESAME | FID | 31.9 | #6 |
| Image-to-Image Translation | Cityscapes Labels-to-Photo | SPADE + SESAME | Per-pixel Accuracy | 82.5% | #1 |
| Image-to-Image Translation | Cityscapes Labels-to-Photo | SPADE + SESAME | mIoU | 66 | #4 |
| Image-to-Image Translation | Cityscapes Labels-to-Photo | SPADE + SESAME | FID | 54.2 | #8 |
