Adversarial Learning with Mask Reconstruction for Text-Guided Image Inpainting

Text-guided image inpainting aims to restore corrupted patches so that they are coherent with both the visual and textual context. On the one hand, existing works focus on the pixels surrounding the corrupted patches without considering the objects in the image, so characteristics of objects described in the text can be painted onto non-object regions. On the other hand, redundant information in the text may distract the generation of the objects of interest in the restored image. In this paper, we propose an adversarial learning framework with mask reconstruction (ALMR) for image inpainting with textual guidance, which consists of a two-stage generator and dual discriminators. The two stages of the generator restore a coarse-grained and a fine-grained image, respectively. In particular, we devise a dual-attention module (DAM) that incorporates word-level and sentence-level textual features to guide the generation of coarse-grained and fine-grained details in the two stages. Furthermore, we design a mask reconstruction module (MRM) that penalizes the restoration of the objects of interest according to the given textual descriptions of those objects. For adversarial training, we employ global and local discriminators on the whole image and the corrupted patches, respectively. Extensive experiments on CUB-200-2011, Oxford-102 and CelebA-HQ show that the proposed ALMR outperforms state-of-the-art approaches (e.g., the FID value on CUB-200-2011 is reduced from 29.69 to 14.69). Code is available at https://github.com/GaranWu/ALMR.
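The PyTorch sketch below illustrates the coarse-to-fine layout described in the abstract: a two-stage generator whose first stage is conditioned on a sentence-level embedding and whose second stage is conditioned on word-level embeddings, plus a discriminator shape reusable for the global and local critics. All module names, layer sizes, and the simple concatenation-based text fusion are illustrative assumptions rather than the authors' implementation; the dual-attention and mask reconstruction modules are only stubbed. See the linked repository for the actual code.

```python
import torch
import torch.nn as nn

# Hedged sketch of the ALMR layout from the abstract; every name and layer
# size here is an assumption for illustration, not the authors' code.

class CoarseStage(nn.Module):
    """Stage 1: restores a coarse image conditioned on a sentence embedding."""
    def __init__(self, text_dim=256):
        super().__init__()
        self.encode = nn.Sequential(nn.Conv2d(3 + 1, 64, 3, padding=1), nn.ReLU(),
                                    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        self.fuse = nn.Conv2d(64 + text_dim, 64, 1)   # stand-in for sentence-level guidance
        self.decode = nn.Conv2d(64, 3, 3, padding=1)

    def forward(self, image, mask, sent_emb):
        # concatenate the masked image with the mask channel
        x = self.encode(torch.cat([image * (1 - mask), mask], dim=1))
        s = sent_emb[:, :, None, None].expand(-1, -1, x.size(2), x.size(3))
        return torch.tanh(self.decode(self.fuse(torch.cat([x, s], dim=1))))

class FineStage(nn.Module):
    """Stage 2: refines the coarse result using word-level text features."""
    def __init__(self, text_dim=256):
        super().__init__()
        self.encode = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
        self.fuse = nn.Conv2d(64 + text_dim, 64, 1)   # crude stand-in for the DAM
        self.decode = nn.Conv2d(64, 3, 3, padding=1)

    def forward(self, coarse, word_emb):
        x = self.encode(coarse)
        # average-pool word embeddings as a proxy for word-level attention
        w = word_emb.mean(dim=1)[:, :, None, None].expand(-1, -1, x.size(2), x.size(3))
        return torch.tanh(self.decode(self.fuse(torch.cat([x, w], dim=1))))

class PatchDiscriminator(nn.Module):
    """Shared shape for the global (whole image) and local (patch) critics."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
                                 nn.Conv2d(64, 1, 4, 2, 1))

    def forward(self, x):
        return self.net(x)

if __name__ == "__main__":
    g1, g2 = CoarseStage(), FineStage()
    img = torch.randn(2, 3, 64, 64)
    mask = torch.zeros(2, 1, 64, 64)
    mask[:, :, 16:48, 16:48] = 1                       # corrupted patch
    sent, words = torch.randn(2, 256), torch.randn(2, 18, 256)
    fine = g2(g1(img, mask, sent), words)              # coarse-to-fine restoration
    print(fine.shape)                                  # torch.Size([2, 3, 64, 64])
```

In this sketch the text features are simply broadcast and concatenated; the paper's DAM instead attends over word- and sentence-level features, and the MRM adds a mask-reconstruction penalty on the objects of interest, neither of which is reproduced here.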
