MSRA-SR: Image Super-resolution Transformer with Multi-scale Shared Representation Acquisition

Multi-scale feature extraction is crucial for many computer vision tasks, but it has rarely been explored in Transformer-based image super-resolution (SR) methods. In this paper, we propose an image super-resolution Transformer with Multi-scale Shared Representation Acquisition (MSRA-SR). We incorporate multi-scale feature acquisition into two basic Transformer modules, i.e., the self-attention and the feed-forward network. In particular, self-attention with cross-scale matching and convolution filters with different kernel sizes are designed to exploit multi-scale features in images, so that both global and multi-scale local features are explicitly extracted in the network. Moreover, we introduce a representation-sharing mechanism to improve the efficiency of the multi-scale design. An analysis of attention-map correlations reveals representation redundancy in self-attention, which motivates a self-attention design shared across Transformer layers: the exhaustive element-wise similarity matching is computed only once and then reused by later layers. In addition, the multi-scale convolutions in different branches can be equivalently transformed into a single convolution via a reparameterization trick. Extensive experiments on lightweight, classical and real-world image SR tasks verify the effectiveness and efficiency of the proposed method.
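The shared self-attention idea described in the abstract can be illustrated with a minimal sketch (this is an assumption about the general mechanism, not the paper's exact formulation): the quadratic query-key similarity matching is computed once in the first layer, and later layers reuse the resulting attention map, applying only their own value projections.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

n, d = 64, 16  # hypothetical token count and channel dimension
rng = np.random.default_rng(0)
tokens = rng.standard_normal((n, d))

# First layer: full query-key similarity matching (the expensive O(n^2) step).
Wq = rng.standard_normal((d, d))
Wk = rng.standard_normal((d, d))
attn = softmax((tokens @ Wq) @ (tokens @ Wk).T / np.sqrt(d))

# Later layers skip the matching and reuse `attn`,
# computing only their own value projections.
for _ in range(3):
    Wv = rng.standard_normal((d, d))
    tokens = attn @ (tokens @ Wv)
```

The saving comes from amortizing the n-by-n similarity computation over several layers; only the per-layer value projection (linear in n) remains.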
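The reparameterization mentioned above relies on the linearity of convolution: parallel branches with different kernel sizes can be merged into one kernel by zero-padding the smaller kernels to the largest size and summing. A minimal single-channel sketch (illustrative, not the paper's implementation) merging a 3x3 and a 1x1 branch:

```python
import numpy as np

def conv2d_same(x, k):
    """2D cross-correlation with zero 'same' padding (single channel)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
k3 = rng.standard_normal((3, 3))
k1 = rng.standard_normal((1, 1))

# Multi-branch output at training time: 3x3 branch + 1x1 branch.
two_branch = conv2d_same(x, k3) + conv2d_same(x, k1)

# Reparameterized inference kernel: add the 1x1 weight into the
# center of the 3x3 kernel, then run a single convolution.
k_merged = k3.copy()
k_merged[1, 1] += k1[0, 0]
single = conv2d_same(x, k_merged)

assert np.allclose(two_branch, single)
```

The equivalence is exact because convolution is linear in its kernel, so the multi-branch structure costs nothing at inference time.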
