MAT: Mask-Aware Transformer for Large Hole Image Inpainting

CVPR 2022 · Wenbo Li, Zhe Lin, Kun Zhou, Lu Qi, Yi Wang, Jiaya Jia

Recent studies have shown the importance of modeling long-range interactions in the inpainting problem. To achieve this goal, existing approaches exploit either standalone attention techniques or transformers, but usually at low resolution because of computational cost. In this paper, we present a novel transformer-based model for large hole inpainting, which unifies the merits of transformers and convolutions to efficiently process high-resolution images. We carefully design each component of our framework to guarantee the high fidelity and diversity of recovered images. Specifically, we customize an inpainting-oriented transformer block, where the attention module aggregates non-local information only from partial valid tokens, indicated by a dynamic mask. Extensive experiments demonstrate the state-of-the-art performance of the new model on multiple benchmark datasets. Code is released at
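The abstract's idea of attention that aggregates information only from valid tokens can be illustrated with a small sketch. This is not the authors' implementation: it is plain single-head scaled dot-product attention in which keys flagged as invalid receive zero weight. The function name `masked_attention` and the boolean list `valid` are assumptions made for illustration, and the way the paper updates its dynamic mask between blocks is not modeled here.

```python
import math


def masked_attention(queries, keys, values, valid):
    """Single-head scaled dot-product attention restricted to valid tokens.

    Illustrative sketch only (hypothetical helper, not the MAT code).
    queries, keys, values: lists of float vectors, one vector per token.
    valid: list of bools; True marks a token holding known (unmasked) content.
    """
    if not any(valid):
        raise ValueError("at least one token must be valid")
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        # Invalid keys get a score of -inf, so softmax assigns them zero weight.
        scores = [
            scale * sum(a * b for a, b in zip(q, k)) if ok else -math.inf
            for k, ok in zip(keys, valid)
        ]
        peak = max(scores)  # finite, because at least one key is valid
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of value vectors, one output vector per query.
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs
```

Under these assumptions, the content of an invalid token cannot influence any output, and if exactly one token is valid, every output equals that token's value vector.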



Results from the Paper

Task              Dataset    Model  Metric Name  Metric Value  Global Rank
Image Inpainting  CelebA-HQ  MAT    FID           4.86         #1
Image Inpainting  CelebA-HQ  MAT    P-IDS        13.83         #1
Image Inpainting  CelebA-HQ  MAT    U-IDS        25.33         #1
Image Inpainting  Places2    MAT    FID           1.96         #2
Image Inpainting  Places2    MAT    P-IDS        23.42         #1
Image Inpainting  Places2    MAT    U-IDS        38.34         #1