TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Text-based Image Editing	PIE-Bench	DDIM Inversion+MasaCtrl	CLIPSIM	23.96	#12
Text-based Image Editing	PIE-Bench	DDIM Inversion+MasaCtrl	Structure Distance	28.38	#11
Text-based Image Editing	PIE-Bench	DDIM Inversion+MasaCtrl	Background PSNR	22.17	#11
Text-based Image Editing	PIE-Bench	DDIM Inversion+MasaCtrl	Background LPIPS	106.62	#10

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/masactrl-tuning-free-mutual-self-attention/text-based-image-editing-on-pie-bench)](https://paperswithcode.com/sota/text-based-image-editing-on-pie-bench?p=masactrl-tuning-free-mutual-self-attention)`

MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing

ICCV 2023 · Mingdeng Cao, Xintao Wang, Zhongang Qi, Ying Shan, XiaoHu Qie, Yinqiang Zheng

Despite the success in large-scale text-to-image generation and text-conditioned image editing, existing methods still struggle to produce consistent generation and editing results. For example, generation approaches usually fail to synthesize multiple images of the same objects/characters but with different views or poses. Meanwhile, existing editing methods either fail to achieve effective complex non-rigid editing while maintaining the overall textures and identity, or require time-consuming fine-tuning to capture the image-specific appearance. In this paper, we develop MasaCtrl, a tuning-free method to achieve consistent image generation and complex non-rigid image editing simultaneously. Specifically, MasaCtrl converts existing self-attention in diffusion models into mutual self-attention, so that it can query correlated local contents and textures from source images for consistency. To further alleviate the query confusion between foreground and background, we propose a mask-guided mutual self-attention strategy, where the mask can be easily extracted from the cross-attention maps. Extensive experiments show that the proposed MasaCtrl can produce impressive results in both consistent image generation and complex non-rigid real image editing.
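
The mutual self-attention idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it is plain Python on small lists rather than diffusion-model feature maps, and every function and parameter name here (`attention`, `mutual_self_attention`, `masked_mutual_self_attention`, `source_fg`, `target_fg`) is hypothetical. The sketch assumes that "mutual" means queries come from the image being generated while keys and values come from the source image, and that the mask-guided variant restricts foreground queries to source foreground tokens and background queries to source background tokens.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, key_mask=None):
    """Scaled dot-product attention over lists of vectors.

    key_mask[j] == False hides key/value j from every query.
    """
    if key_mask is not None and not any(key_mask):
        raise ValueError("key_mask must leave at least one key visible")
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = []
        for j, k in enumerate(keys):
            if key_mask is not None and not key_mask[j]:
                scores.append(float("-inf"))  # masked keys get zero weight
            else:
                dot = sum(a * b for a, b in zip(q, k))
                scores.append(dot / math.sqrt(dim))
        weights = softmax(scores)
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs


def mutual_self_attention(target_q, source_k, source_v):
    """Target queries attend to SOURCE keys/values instead of their own."""
    return attention(target_q, source_k, source_v)


def masked_mutual_self_attention(target_q, source_k, source_v,
                                 source_fg, target_fg):
    """Assumed mask-guided variant.

    source_fg / target_fg are per-token booleans (True = foreground).
    Foreground queries read only source foreground tokens; background
    queries read only source background tokens.
    """
    fg_out = attention(target_q, source_k, source_v, source_fg)
    bg_out = attention(target_q, source_k, source_v,
                       [not m for m in source_fg])
    return [fg_out[i] if target_fg[i] else bg_out[i]
            for i in range(len(target_q))]
```

In this sketch, a target query that matches a source key closely retrieves that key's value almost unchanged, which is one way to read the abstract's claim that the method queries correlated local content and texture from the source image. With the masks enabled, a foreground query cannot pull values from source background tokens even when the dot product favours them, which is the "query confusion" the abstract refers to.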

Code

tencentarc/masactrl (official; 632 stars; quickstart in Colab and Spaces)

phymhan/prompt-to-prompt

hansam95/nmg

Tasks

Image Generation

Text-based Image Editing

Text-to-Image Generation

Datasets

PIE-Bench

Results from the Paper

Ranked #11 on Text-based Image Editing on PIE-Bench

Methods

Diffusion • fail
