Dual Aggregation Transformer for Image Super-Resolution

Transformers have recently gained considerable popularity in low-level vision tasks, including image super-resolution (SR). These networks apply self-attention along different dimensions, spatial or channel, and achieve impressive performance. This inspires us to combine the two dimensions in a Transformer for more powerful representation capability. Based on this idea, we propose a novel Transformer model, the Dual Aggregation Transformer (DAT), for image SR. Our DAT aggregates features across the spatial and channel dimensions in an inter-block and intra-block dual manner. Specifically, we alternately apply spatial and channel self-attention in consecutive Transformer blocks. This alternating strategy enables DAT to capture global context and realize inter-block feature aggregation. Furthermore, we propose the adaptive interaction module (AIM) and the spatial-gate feed-forward network (SGFN) to achieve intra-block feature aggregation. AIM complements the two self-attention mechanisms from their corresponding dimensions. Meanwhile, SGFN introduces additional non-linear spatial information into the feed-forward network. Extensive experiments show that our DAT surpasses current methods. Code and models are available at https://github.com/zhengchen1999/DAT.

ICCV 2023
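The abstract describes an architecture that alternates spatial and channel self-attention across consecutive blocks, with an SGFN in place of the usual feed-forward network. Below is a minimal PyTorch sketch of that alternating design. It is an illustration, not the authors' implementation: the module names (DATBlock, ChannelAttention) and the SGFN internals are assumptions, spatial attention is simplified to global multi-head attention rather than the paper's window-based variant, and AIM is omitted. The real code is in the linked repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelAttention(nn.Module):
    """'Transposed' self-attention sketch: the attention map is C x C, so
    channels attend to each other and every output channel aggregates
    global spatial context."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.num_heads = num_heads
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                       # x: (B, N, C), N = H * W
        B, N, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # (B, N, C) -> (B, heads, C/heads, N): attention runs over channels
        shape = (B, self.num_heads, C // self.num_heads, N)
        q, k, v = (t.transpose(-2, -1).reshape(shape) for t in (q, k, v))
        q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)
        attn = (q @ k.transpose(-2, -1)).softmax(dim=-1)    # (B, h, C/h, C/h)
        out = (attn @ v).reshape(B, C, N).transpose(-2, -1)
        return self.proj(out)


class SGFN(nn.Module):
    """Spatial-gate feed-forward network (assumed internals): one half of
    the hidden features gates the other half through a depth-wise conv,
    injecting non-linear spatial information into the FFN."""

    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden_dim)
        self.dwconv = nn.Conv2d(hidden_dim // 2, hidden_dim // 2,
                                kernel_size=3, padding=1,
                                groups=hidden_dim // 2)
        self.fc2 = nn.Linear(hidden_dim // 2, dim)

    def forward(self, x, hw):                   # x: (B, N, C)
        H, W = hw
        x1, x2 = F.gelu(self.fc1(x)).chunk(2, dim=-1)
        B, N, Ch = x2.shape
        gate = self.dwconv(x2.transpose(1, 2).reshape(B, Ch, H, W))
        gate = gate.flatten(2).transpose(1, 2)  # back to (B, N, C/2)
        return self.fc2(x1 * gate)


class DATBlock(nn.Module):
    """One Transformer block whose attention runs along either the spatial
    or the channel dimension. Spatial attention is simplified here to
    global multi-head attention; AIM is omitted."""

    def __init__(self, dim, attn_dim):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.spatial = attn_dim == 'spatial'
        self.attn = (nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
                     if self.spatial else ChannelAttention(dim))
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = SGFN(dim, hidden_dim=dim * 2)

    def forward(self, x, hw):
        y = self.norm1(x)
        y = (self.attn(y, y, y, need_weights=False)[0] if self.spatial
             else self.attn(y))
        x = x + y
        return x + self.ffn(self.norm2(x), hw)


# Inter-block aggregation: consecutive blocks alternate spatial / channel
# self-attention, so the stack mixes both dimensions.
blocks = nn.ModuleList(
    DATBlock(dim=64, attn_dim='spatial' if i % 2 == 0 else 'channel')
    for i in range(4)
)
x = torch.randn(1, 16 * 16, 64)                 # (B, N = H*W, C) tokens
for blk in blocks:
    x = blk(x, hw=(16, 16))
print(x.shape)                                  # torch.Size([1, 256, 64])
```

The point of the alternation is that each block's output feeds a block attending along the other dimension, so features repeatedly pass through both global channel context and spatial structure without any single block paying for both.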
Task                   | Dataset                 | Model | Metric | Value  | Global Rank
Image Super-Resolution | Manga109 - 4x upscaling | DAT+  | PSNR   | 32.67  | #7
Image Super-Resolution | Manga109 - 4x upscaling | DAT+  | SSIM   | 0.9301 | #6
Image Super-Resolution | Manga109 - 4x upscaling | DAT   | PSNR   | 32.51  | #8
Image Super-Resolution | Manga109 - 4x upscaling | DAT   | SSIM   | 0.9291 | #7
Image Super-Resolution | Set14 - 4x upscaling    | DAT+  | PSNR   | 29.29  | #6
Image Super-Resolution | Set14 - 4x upscaling    | DAT+  | SSIM   | 0.7983 | #9
Image Super-Resolution | Set14 - 4x upscaling    | DAT   | PSNR   | 29.23  | #9
Image Super-Resolution | Set14 - 4x upscaling    | DAT   | SSIM   | 0.7973 | #12
