FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting

The Transformer, as a strong and flexible architecture for modelling long-range relations, has been widely explored in vision tasks. However, when applied to video inpainting, which requires fine-grained representation, existing methods still suffer from blurry edges in detail due to hard patch splitting. We tackle this problem by proposing FuseFormer, a Transformer model designed for video inpainting via fine-grained feature fusion based on novel Soft Split and Soft Composition operations. Soft Split divides the feature map into many patches with a given overlapping interval; conversely, Soft Composition stitches different patches back into a whole feature map, where pixels in overlapping regions are summed up. These two modules are first used for tokenization before the Transformer layers and de-tokenization after them, providing an effective mapping between tokens and features. Sub-patch-level information interaction is thus enabled for more effective feature propagation between neighboring patches, so that vivid content is synthesized for hole regions in videos. Moreover, we insert Soft Composition and Soft Split into the feed-forward network of FuseFormer, enabling its 1D linear layers to model 2D structure and further enhancing sub-patch-level feature fusion. In both quantitative and qualitative evaluations, FuseFormer surpasses state-of-the-art methods. We also conduct a detailed analysis to examine its superiority.
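As a rough illustration of the two operations described above, the sketch below (not the authors' implementation) builds Soft Split and Soft Composition from torch.nn.functional.unfold/fold; the 7x7 patch size, 3x3 stride, padding, and tensor shapes are illustrative assumptions.

```python
# Minimal sketch of Soft Split / Soft Composition with PyTorch unfold/fold.
# Patch size, stride, padding, and shapes below are assumptions for illustration.
import torch
import torch.nn.functional as F

def soft_split(feat, kernel_size=(7, 7), stride=(3, 3), padding=(3, 3)):
    """Split a feature map [B, C, H, W] into overlapping patches (tokens).

    Returns tokens of shape [B, N, C * kh * kw], where N is the number of
    overlapping patches produced by the given kernel/stride/padding.
    """
    patches = F.unfold(feat, kernel_size=kernel_size, stride=stride, padding=padding)
    return patches.permute(0, 2, 1)  # [B, N, C*kh*kw]

def soft_composition(tokens, output_size, kernel_size=(7, 7), stride=(3, 3), padding=(3, 3)):
    """Stitch overlapping patches back into a feature map of size output_size.

    F.fold accumulates overlapping values, so pixels covered by several
    patches are summed up, as in Soft Composition.
    """
    patches = tokens.permute(0, 2, 1)  # [B, C*kh*kw, N]
    return F.fold(patches, output_size=output_size, kernel_size=kernel_size,
                  stride=stride, padding=padding)

# Usage: tokens can be fed to Transformer layers, then composed back to a map.
x = torch.randn(2, 64, 60, 108)        # [B, C, H, W]
tok = soft_split(x)                    # [B, N, 64*7*7]
y = soft_composition(tok, (60, 108))   # [B, 64, 60, 108]
assert y.shape == x.shape
```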

ICCV 2021

Results from the Paper


Task                       Dataset           Model       Metric        Value    Global Rank
Video Inpainting           DAVIS             FuseFormer  PSNR          32.54    #3
                                                         SSIM          0.9700   #3
                                                         VFID          0.138    #3
                                                         Ewarp         0.1362   #2
Seeing Beyond the Visible  KITTI360-EX       FuseFormer  Average PSNR  18.91    #4
Video Inpainting           YouTube-VOS 2018  FuseFormer  PSNR          33.29    #4
                                                         SSIM          0.9681   #4
                                                         VFID          0.053    #4
                                                         Ewarp         0.0900   #2
