Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation

Effectively extracting inter-frame motion and appearance information is important for video frame interpolation (VFI). Previous works either extract both types of information in a mixed way or design separate modules for each type, which leads to representation ambiguity and low efficiency. In this paper, we propose a novel module that explicitly extracts motion and appearance information via a unifying operation. Specifically, we rethink the information process in inter-frame attention and reuse its attention map for both appearance feature enhancement and motion information extraction. Furthermore, for efficient VFI, our proposed module can be seamlessly integrated into a hybrid CNN and Transformer architecture. This hybrid pipeline alleviates the computational complexity of inter-frame attention while preserving detailed low-level structure information. Experimental results demonstrate that, for both fixed- and arbitrary-timestep interpolation, our method achieves state-of-the-art performance on various datasets. Meanwhile, our approach has a lighter computation overhead than models with comparable performance. The source code and models are available at https://github.com/MCG-NJU/EMA-VFI.
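
To make the idea of reusing one attention map for both purposes concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function name `inter_frame_attention` and the token layout are hypothetical, raw features stand in for learned query/key/value projections, and motion is approximated as the attention-weighted expected position in the other frame minus the token's own position.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def inter_frame_attention(feats0, feats1, coords):
    """Toy inter-frame attention from frame 0 to frame 1.

    feats0, feats1: one feature vector per token for each frame.
    coords: (x, y) position of each token, shared by both frames.
    Assumption: no learned projections; features act as Q, K and V.
    """
    dim = len(feats0[0])
    appearance, motion = [], []
    for query, (qx, qy) in zip(feats0, coords):
        # Similarity of this frame-0 token to every frame-1 token.
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            for key in feats1
        ]
        attn = softmax(scores)
        # Appearance: attention-weighted sum of frame-1 features.
        appearance.append(
            [sum(w * v[c] for w, v in zip(attn, feats1)) for c in range(dim)]
        )
        # Motion: expected matching position minus the token's own position,
        # computed from the same attention weights.
        exp_x = sum(w * x for w, (x, _) in zip(attn, coords))
        exp_y = sum(w * y for w, (_, y) in zip(attn, coords))
        motion.append((exp_x - qx, exp_y - qy))
    return appearance, motion


# Three tokens on a row; frame 1 is frame 0 shifted right by one (wrapping).
feats0 = [[20.0, 0.0, 0.0], [0.0, 20.0, 0.0], [0.0, 0.0, 20.0]]
feats1 = [[0.0, 0.0, 20.0], [20.0, 0.0, 0.0], [0.0, 20.0, 0.0]]
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
appearance, motion = inter_frame_attention(feats0, feats1, coords)
```

In this toy case the first token's content reappears one position to the right in the second frame, so its motion vector comes out close to (1, 0) while its aggregated appearance stays close to its own feature. Both outputs come from a single attention map, which is the point the abstract makes.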

Published in: CVPR 2023

Results from the Paper


Task: Video Frame Interpolation    Model: EMA-VFI

Dataset                        Metric   Value   Global Rank
-----------------------------  -------  ------  -----------
MSU Video Frame Interpolation  PSNR     29.89   #1
MSU Video Frame Interpolation  SSIM     0.953   #1
MSU Video Frame Interpolation  VMAF     71.71   #2
MSU Video Frame Interpolation  LPIPS    0.022   #2
MSU Video Frame Interpolation  MS-SSIM  0.965   #1
SNU-FILM (easy)                PSNR     39.98   #6
SNU-FILM (easy)                SSIM     0.9910  #2
SNU-FILM (extreme)             PSNR     25.69   #3
SNU-FILM (extreme)             SSIM     0.8661  #2
SNU-FILM (hard)                PSNR     30.94   #3
SNU-FILM (hard)                SSIM     0.9392  #2
SNU-FILM (medium)              PSNR     36.09   #5
SNU-FILM (medium)              SSIM     0.9801  #2
UCF101                         PSNR     35.48   #1
UCF101                         SSIM     0.9701  #3
Vimeo90K                       PSNR     36.64   #2
Vimeo90K                       SSIM     0.9819  #1
X4K1000FPS                     PSNR     31.46   #3
X4K1000FPS-2K                  PSNR     32.85   #1
Xiph-2K                        PSNR     36.90   #1
Xiph-2K                        SSIM     0.945   #3
Xiph-4k                        PSNR     34.67   #1
Xiph-4k                        SSIM     0.907   #1
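
For readers unfamiliar with the PSNR figures above, the following is a minimal sketch of the standard definition, PSNR = 10 * log10(MAX^2 / MSE), in decibels. It assumes 8-bit pixel values flattened into plain lists; the function name `psnr` is illustrative, and the exact evaluation protocol used by each benchmark (color space, value range, cropping) is not stated on this page.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if not reference or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between ground-truth and interpolated pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Every pixel off by one intensity level out of 255.
off_by_one = psnr([0, 0, 0, 0], [1, 1, 1, 1])
```

Because the scale is logarithmic, halving the mean squared error raises PSNR by about 3 dB, so differences of a few tenths of a dB between methods correspond to small changes in average pixel error.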

Methods