Learning Trajectory-Aware Transformer for Video Super-Resolution

CVPR 2022  ·  Chengxu Liu, Huan Yang, Jianlong Fu, Xueming Qian ·

Video super-resolution (VSR) aims to restore a sequence of high-resolution (HR) frames from their low-resolution (LR) counterparts. Although some progress has been made, there are grand challenges to effectively utilize temporal dependency in entire video sequences. Existing approaches usually align and aggregate video frames from limited adjacent frames (e.g., 5 or 7 frames), which prevents these approaches from satisfactory results. In this paper, we take one step further to enable effective spatio-temporal learning in videos. We propose a novel Trajectory-aware Transformer for Video Super-Resolution (TTVSR). In particular, we formulate video frames into several pre-aligned trajectories which consist of continuous visual tokens. For a query token, self-attention is only learned on relevant visual tokens along spatio-temporal trajectories. Compared with vanilla vision Transformers, such a design significantly reduces the computational cost and enables Transformers to model long-range features. We further propose a cross-scale feature tokenization module to overcome scale-changing problems that often occur in long-range videos. Experimental results demonstrate the superiority of the proposed TTVSR over state-of-the-art models, by extensive quantitative and qualitative evaluations in four widely-used video super-resolution benchmarks. Both code and pre-trained models can be downloaded at https://github.com/researchmm/TTVSR.

PDF Abstract CVPR 2022 PDF CVPR 2022 Abstract

Datasets


Task Dataset Model Metric Name Metric Value Global Rank Result Benchmark
Video Super-Resolution UDM10 - 4x upscaling TTVSR PSNR 40.41 # 4
SSIM 0.9712 # 4
Video Super-Resolution Vid4 - 4x upscaling - BD degradation TTVSR PSNR 28.40 # 6
SSIM 0.8643 # 6

Methods