Pre-Trained Image Processing Transformer

As the computing power of modern hardware grows rapidly, pre-trained deep learning models (e.g., BERT, GPT-3) learned on large-scale datasets have shown their effectiveness over conventional methods. This progress is mainly attributed to the representation ability of the transformer and its variant architectures. In this paper, we study low-level computer vision tasks (e.g., denoising, super-resolution and deraining) and develop a new pre-trained model, namely, the image processing transformer (IPT). To fully exploit the capability of the transformer, we propose using the well-known ImageNet benchmark to generate a large number of corrupted image pairs. The IPT model is trained on these images with multiple heads and multiple tails. In addition, contrastive learning is introduced to adapt well to different image processing tasks. The pre-trained model can therefore be efficiently employed on the desired task after fine-tuning. With only one pre-trained model, IPT outperforms the current state-of-the-art methods on various low-level benchmarks. Code is available at and
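The idea of generating corrupted image pairs from clean images can be illustrated with a minimal, dependency-free sketch. Everything here is an assumption made for illustration: the function names (`add_gaussian_noise`, `downsample`, `make_pairs`), the use of grayscale images stored as nested lists, and block averaging as the downsampling method are hypothetical and are not taken from the authors' code. Deraining is omitted because the abstract does not say how rain streaks are synthesized.

```python
import random

def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise (sigma on the 0-255 scale) and clip to [0, 255]."""
    return [[min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
            for row in image]

def downsample(image, scale):
    """Average non-overlapping scale x scale blocks (an assumed stand-in degradation).

    Edge pixels that do not fill a complete block are cropped.
    """
    h = len(image) - len(image) % scale
    w = len(image[0]) - len(image[0]) % scale
    out = []
    for y in range(0, h, scale):
        row = []
        for x in range(0, w, scale):
            block = [image[y + dy][x + dx]
                     for dy in range(scale) for dx in range(scale)]
            row.append(round(sum(block) / len(block)))
        out.append(row)
    return out

def make_pairs(clean, rng):
    """Return {task_name: (corrupted, clean)} for a few illustrative degradations."""
    return {
        "denoise_sigma50": (add_gaussian_noise(clean, 50, rng), clean),
        "sr_x2": (downsample(clean, 2), clean),
        "sr_x3": (downsample(clean, 3), clean),
    }

# A random 12 x 12 grayscale image stands in for a clean ImageNet crop.
rng = random.Random(0)
clean = [[rng.randrange(256) for _ in range(12)] for _ in range(12)]
pairs = make_pairs(clean, rng)
```

Each entry pairs a corrupted input with its clean target. In a multi-head, multi-tail setup, the task name would presumably select which head and tail process the pair, though that routing is not shown here.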

CVPR 2021

Results from the Paper

 Ranked #1 on Single Image Deraining on Rain100L (using extra training data)

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Image Super-Resolution | BSD100 - 2x upscaling | IPT | PSNR | 32.48 | #6 |
| Color Image Denoising | CBSD68 sigma50 | IPT | PSNR | 29.39 | #1 |
| Color Image Denoising | McMaster sigma50 | IPT | PSNR | 29.98 | #2 |
| Single Image Deraining | Rain100L | IPT | PSNR | 41.62 | #1 |
| Single Image Deraining | Rain100L | IPT | SSIM | 0.988 | #1 |
| Image Super-Resolution | Set14 - 3x upscaling | IPT | PSNR | 30.85 | #5 |
| Image Super-Resolution | Urban100 - 3x upscaling | IPT | PSNR | 29.49 | #5 |
| Color Image Denoising | Urban100 sigma50 | IPT | PSNR | 29.71 | #5 |
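
Most of the values above are PSNR scores. As a rough guide to reading them, the following is a minimal sketch of the standard PSNR formula, 10 * log10(MAX^2 / MSE), for 8-bit grayscale images stored as nested lists. The function name `psnr` and the toy images are illustrative assumptions. Benchmark protocols can differ in details such as color space and border cropping, and the listing does not state which protocol was used, so this sketch is not expected to reproduce the numbers above.

```python
import math

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images."""
    ref = [p for row in reference for p in row]
    est = [p for row in estimate for p in row]
    if len(ref) != len(est):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref, est)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Toy example: two of four pixels are off by one intensity level.
reference = [[100, 100], [100, 100]]
estimate = [[101, 99], [100, 100]]
score = psnr(reference, estimate)
```

Higher is better. Because the scale is logarithmic, a gain of about 3 dB corresponds to roughly halving the mean squared error.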