MAXIM: Multi-Axis MLP for Image Processing

Recent progress on Transformers and multi-layer perceptron (MLP) models has provided new network architectural designs for computer vision tasks. Although these models have proven effective in many vision tasks such as image recognition, challenges remain in adapting them to low-level vision. The inflexibility to support high-resolution images and the limitations of local attention are perhaps the main bottlenecks. In this work, we present a multi-axis MLP-based architecture called MAXIM that can serve as an efficient and flexible general-purpose vision backbone for image processing tasks. MAXIM uses a UNet-shaped hierarchical structure and supports long-range interactions enabled by spatially-gated MLPs. Specifically, MAXIM contains two MLP-based building blocks: a multi-axis gated MLP that allows for efficient and scalable spatial mixing of local and global visual cues, and a cross-gating block, an alternative to cross-attention, which accounts for cross-feature conditioning. Both modules are based exclusively on MLPs, yet they are also both global and "fully convolutional", two properties that are desirable for image processing. Our extensive experimental results show that the proposed MAXIM model achieves state-of-the-art performance on more than ten benchmarks across a range of image processing tasks, including denoising, deblurring, deraining, dehazing, and enhancement, while requiring fewer or a comparable number of parameters and FLOPs compared with competitive models. The source code and trained models will be available at https://github.com/google-research/maxim.
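To make the multi-axis idea more concrete, here is a minimal NumPy sketch of how a feature map could be split into a local branch (mixing within small non-overlapping blocks) and a global branch (mixing across a fixed grid), each followed by a simple spatial gate. It is an illustration under assumptions, not the released implementation: the function names, the block size b, the grid size g, and the random weights are hypothetical, and the layer normalization and channel projections a full model would use are omitted.

```python
import numpy as np

def block_partition(x, b):
    """(H, W, C) -> (H/b * W/b, b*b, C): non-overlapping b x b windows."""
    H, W, C = x.shape
    x = x.reshape(H // b, b, W // b, b, C).transpose(0, 2, 1, 3, 4)
    return x.reshape((H // b) * (W // b), b * b, C)

def block_unpartition(x, H, W, b):
    """Inverse of block_partition."""
    C = x.shape[-1]
    x = x.reshape(H // b, W // b, b, b, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, C)

def grid_partition(x, g):
    """(H, W, C) -> (g*g, H/g * W/g, C): axis 0 indexes a fixed g x g grid."""
    H, W, C = x.shape
    x = x.reshape(g, H // g, g, W // g, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(g * g, (H // g) * (W // g), C)

def grid_unpartition(x, H, W, g):
    """Inverse of grid_partition."""
    C = x.shape[-1]
    x = x.reshape(g, g, H // g, W // g, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, C)

def spatial_gate(x, axis, w, bias):
    """Gate one half of the channels with a linear mix of the other half.

    The linear mix runs along `axis` (a spatial axis), not along channels.
    """
    u, v = np.split(x, 2, axis=-1)
    v = np.moveaxis(v, axis, -1)   # move the mixing axis to the end
    v = v @ w + bias               # dense layer over the n spatial positions
    v = np.moveaxis(v, -1, axis)
    return u * v

def init_params(b, g, rng):
    """Hypothetical weights; their sizes depend only on b and g, not on H or W."""
    return {
        "w_local": rng.normal(scale=0.02, size=(b * b, b * b)),
        "b_local": np.ones(b * b),
        "w_global": rng.normal(scale=0.02, size=(g * g, g * g)),
        "b_global": np.ones(g * g),
    }

def multi_axis_gated_mix(x, b, g, params):
    """Simplified multi-axis gated mixing: (H, W, C) -> (H, W, C/2)."""
    H, W, _ = x.shape
    local, glob = np.split(x, 2, axis=-1)  # half the channels per branch

    # Local branch: mix along axis 1, i.e. among the b*b pixels of a block.
    lb = spatial_gate(block_partition(local, b), 1,
                      params["w_local"], params["b_local"])
    # Global branch: mix along axis 0, i.e. across the g*g grid cells, which
    # connects pixels that are H/g (or W/g) apart over the whole image.
    gb = spatial_gate(grid_partition(glob, g), 0,
                      params["w_global"], params["b_global"])

    return np.concatenate(
        [block_unpartition(lb, H, W, b), grid_unpartition(gb, H, W, g)],
        axis=-1,
    )

rng = np.random.default_rng(0)
params = init_params(b=2, g=2, rng=rng)
small = multi_axis_gated_mix(rng.normal(size=(8, 8, 4)), 2, 2, params)
large = multi_axis_gated_mix(rng.normal(size=(16, 12, 8)), 2, 2, params)
print(small.shape, large.shape)
```

Because the weight matrices are sized by b and g only, the same parameters apply to inputs of different resolutions (as long as H and W are divisible by b and g), which is the sense in which such a block can be both global and "fully convolutional". In this sketch the channel count must be divisible by 4, and the gate halves the channels of each branch.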

CVPR 2022

Results from the Paper


 Ranked #1 on Deblurring on RealBlur-J (using extra training data)

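| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Image Denoising | DND | MAXIM-3S | PSNR (sRGB) | 39.84 | #5 |
| Image Denoising | DND | MAXIM-3S | SSIM (sRGB) | 0.954 | #7 |
| Image Deblurring | GoPro | MAXIM-3S | PSNR | 32.86 | #10 |
| Deblurring | GoPro | MAXIM-3S | PSNR | 32.86 | #17 |
| Image Deblurring | HIDE | MAXIM-3S | SSIM | 0.956 | #1 |
| Deblurring | HIDE | MAXIM-3S | PSNR | 32.83 | #1 |
| Deblurring | HIDE (trained on GOPRO) | MAXIM | PSNR (sRGB) | 32.83 | #1 |
| Deblurring | HIDE (trained on GOPRO) | MAXIM | SSIM (sRGB) | 0.956 | #1 |
| Deblurring | HIDE (trained on GOPRO) | MAXIM | Params (M) | 22.2 | #6 |
| Low-Light Image Enhancement | LOL | MAXIM | Average PSNR | 23.43 | #3 |
| Low-Light Image Enhancement | LOL | MAXIM | SSIM | 0.863 | #2 |
| Photo Retouching | MIT-Adobe 5k | MAXIM | PSNR | 26.15 | #1 |
| Photo Retouching | MIT-Adobe 5k | MAXIM | SSIM | 0.945 | #1 |
| Deblurring | MSU BASED | MAXIM (GoPro) | SSIM | 0.94386 | #8 |
| Deblurring | MSU BASED | MAXIM (GoPro) | PSNR | 31.36344 | #4 |
| Deblurring | MSU BASED | MAXIM (GoPro) | VMAF | 67.7557 | #1 |
| Deblurring | MSU BASED | MAXIM (GoPro) | LPIPS | 0.09188 | #10 |
| Deblurring | MSU BASED | MAXIM (GoPro) | ERQAv2.0 | 0.74444 | #5 |
| Deblurring | MSU BASED | MAXIM (GoPro) | Subjective | 0.2070 | #9 |
| Deblurring | MSU BASED | MAXIM (REDS) | SSIM | 0.94959 | #2 |
| Deblurring | MSU BASED | MAXIM (REDS) | PSNR | 30.65728 | #8 |
| Deblurring | MSU BASED | MAXIM (REDS) | VMAF | 67.3502 | #2 |
| Deblurring | MSU BASED | MAXIM (REDS) | LPIPS | 0.07836 | #1 |
| Deblurring | MSU BASED | MAXIM (REDS) | ERQAv2.0 | 0.74277 | #9 |
| Deblurring | MSU BASED | MAXIM (REDS) | Subjective | 1.0081 | #5 |
| Single Image Deraining | Rain100H | MAXIM | SSIM | 0.903 | #3 |
| Single Image Deraining | Rain100L | MAXIM | SSIM | 0.977 | #5 |
| Deblurring | RealBlur-J | MAXIM | SSIM (sRGB) | 0.935 | #1 |
| Deblurring | RealBlur-J | MAXIM | PSNR (sRGB) | 32.84 | #1 |
| Deblurring | RealBlur-J | MAXIM | Params (M) | 22.2 | #7 |
| Deblurring | RealBlur-J (trained on GoPro) | MAXIM | PSNR (sRGB) | 28.83 | #4 |
| Deblurring | RealBlur-J (trained on GoPro) | MAXIM | SSIM (sRGB) | 0.875 | #5 |
| Deblurring | RealBlur-R | MAXIM-3S | SSIM (sRGB) | 0.961 | #7 |
| Deblurring | RealBlur-R | MAXIM | PSNR (sRGB) | 39.45 | #5 |
| Deblurring | RealBlur-R (trained on GoPro) | MAXIM | PSNR (sRGB) | 35.78 | #6 |
| Image Denoising | SIDD | MAXIM-3S | PSNR (sRGB) | 39.96 | #5 |
| Image Denoising | SIDD | MAXIM-3S | SSIM (sRGB) | 0.960 | #3 |
| Image Dehazing | SOTS Indoor | MAXIM-2S | PSNR | 38.11 | #5 |
| Image Dehazing | SOTS Outdoor | MAXIM-2S | PSNR | 34.19 | #4 |
| Single Image Deraining | Test100 | MAXIM | PSNR | 31.17 | #2 |
| Single Image Deraining | Test100 | MAXIM | SSIM | 0.922 | #2 |
| Single Image Deraining | Test1200 | MAXIM | SSIM | 0.922 | #2 |
| Single Image Deraining | Test2800 | MAXIM | PSNR | 33.80 | #3 |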
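
For readers interpreting the PSNR values above, the following is a minimal sketch of the standard peak signal-to-noise ratio formula, PSNR = 10 * log10(MAX^2 / MSE), reported in dB. The function name and the assumption that pixel values lie in [0, 1] are illustrative choices; individual benchmarks may differ in protocol (for example in color space, value range, or cropping), so this is not a reproduction of any leaderboard's evaluation script.

```python
import numpy as np

def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images.

    max_value is the largest possible pixel value (assumed 1.0 here;
    use 255 for 8-bit images).
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))

clean = np.zeros((4, 4, 3))
noisy = clean + 0.1  # constant error of 0.1, so MSE = 0.01
print(round(psnr(clean, noisy), 6))  # 20.0
```

Since PSNR is logarithmic in the mean squared error, a gain of 1 dB corresponds to roughly a 21% reduction in MSE.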
