Efficient Concertormer for Image Deblurring and Beyond

9 Apr 2024  ·  Pin-Hung Kuo, Jinshan Pan, Shao-Yi Chien, Ming-Hsuan Yang ·

The Transformer architecture has achieved remarkable success in natural language processing and high-level vision tasks over the past few years. However, the inherent complexity of self-attention is quadratic to the size of the image, leading to unaffordable computational costs for high-resolution vision tasks. In this paper, we introduce Concertormer, featuring a novel Concerto Self-Attention (CSA) mechanism designed for image deblurring. The proposed CSA divides self-attention into two distinct components: one emphasizes generally global and another concentrates on specifically local correspondence. By retaining partial information in additional dimensions independent from the self-attention calculations, our method effectively captures global contextual representations with complexity linear to the image size. To effectively leverage the additional dimensions, we present a Cross-Dimensional Communication module, which linearly combines attention maps and thus enhances expressiveness. Moreover, we amalgamate the two-staged Transformer design into a single stage using the proposed gated-dconv MLP architecture. While our primary objective is single-image motion deblurring, extensive quantitative and qualitative evaluations demonstrate that our approach performs favorably against the state-of-the-art methods in other tasks, such as deraining and deblurring with JPEG artifacts. The source codes and trained models will be made available to the public.

PDF Abstract

Datasets


Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods