MUSIQ: Multi-scale Image Quality Transformer

Image quality assessment (IQA) is an important research topic for understanding and improving visual experience. Current state-of-the-art IQA methods are based on convolutional neural networks (CNNs). The performance of CNN-based models is often compromised by the fixed-shape constraint in batch training: to satisfy it, input images are usually resized and cropped to a fixed shape, which degrades image quality. To address this, we design a multi-scale image quality Transformer (MUSIQ) that processes native-resolution images with varying sizes and aspect ratios. With a multi-scale image representation, the proposed method can capture image quality at different granularities. Furthermore, we propose a novel hash-based 2D spatial embedding and a scale embedding to support positional embedding in the multi-scale representation. Experimental results verify that our method achieves state-of-the-art performance on multiple large-scale IQA datasets such as PaQ-2-PiQ, SPAQ and KonIQ-10k.
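The multi-scale representation and the hash-based spatial embedding can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation. The resized scales (longer side 224 and 384 pixels, plus the native image), the 32-pixel patch size, the 10 × 10 embedding grid, ceiling-based patching (i.e. zero padding at the borders), the padding/mask scheme for batching, and all function names are hypothetical choices made here. A patch at row i, column j of an H × W patch grid is hashed to the fixed grid cell (floor(i·G/H), floor(j·G/W)). As a result, the spatial embedding table has G × G entries regardless of image resolution, and each token also carries its scale index for a scale-embedding lookup.

```python
import math

# Illustrative hyperparameters (assumptions, not taken from the source text).
LONGER_SIDES = (224, 384)  # resized scales; the native image is appended as a last scale
PATCH = 32                 # square patch size in pixels
GRID = 10                  # hashed spatial-embedding grid is GRID x GRID


def aspect_preserving_size(height: int, width: int, longer_side: int) -> tuple:
    """Resize (height, width) so the longer side equals `longer_side`, keeping the aspect ratio."""
    scale = longer_side / max(height, width)
    return max(1, round(height * scale)), max(1, round(width * scale))


def patch_grid(height: int, width: int, patch: int = PATCH) -> tuple:
    """Patch rows and columns, rounding up (assumes the image is zero-padded at the borders)."""
    return math.ceil(height / patch), math.ceil(width / patch)


def hash_spatial_index(i: int, j: int, rows: int, cols: int, grid: int = GRID) -> tuple:
    """Hash patch (i, j) of a rows x cols patch grid onto a fixed grid x grid cell."""
    return (i * grid) // rows, (j * grid) // cols


def multiscale_tokens(height: int, width: int) -> list:
    """Return one (scale_index, grid_row, grid_col) triple per patch, over all scales."""
    sizes = [aspect_preserving_size(height, width, s) for s in LONGER_SIDES]
    sizes.append((height, width))  # native resolution, no resizing
    tokens = []
    for scale_index, (h, w) in enumerate(sizes):
        rows, cols = patch_grid(h, w)
        for i in range(rows):
            for j in range(cols):
                tokens.append((scale_index, *hash_spatial_index(i, j, rows, cols)))
    return tokens


def pad_batch(token_lists: list, pad=(-1, -1, -1)) -> tuple:
    """Hypothetical batching helper: pad to a common length; mask is 1 for real tokens, 0 for padding."""
    max_len = max(len(t) for t in token_lists)
    padded = [t + [pad] * (max_len - len(t)) for t in token_lists]
    masks = [[1] * len(t) + [0] * (max_len - len(t)) for t in token_lists]
    return padded, masks


tokens = multiscale_tokens(768, 1024)
print(len(tokens))  # 918
```

For a 768 × 1024 image, the three scales contribute 6 × 7, 9 × 12 and 24 × 32 patches (918 tokens). All of them index into the same 10 × 10 spatial table. Token counts differ between images, so the last helper pads a batch to a common length and returns a mask marking the real tokens. A Transformer could use such a mask to ignore padding in attention.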

ICCV 2021
Task                      Dataset              Model                       SRCC/SROCC (rank)  PLCC (rank)     KLCC (rank)     Type (rank)
Image Quality Assessment  MSU NR VQA Database  MUSIQ                       0.9004 (#3)        0.9068 (#3)     0.7433 (#3)     -
Video Quality Assessment  MSU NR VQA Database  MUSIQ                       0.9004 (#7)        0.9068 (#8)     0.7433 (#7)     NR (#1)
Video Quality Assessment  MSU SR-QA Dataset    MUSIQ trained on PaQ-2-PiQ  0.67746 (#6)       0.66531 (#6)    0.55312 (#6)    NR (#1)
Video Quality Assessment  MSU SR-QA Dataset    MUSIQ trained on AVA        0.56152 (#26)      0.52404 (#33)   0.44669 (#26)   NR (#1)
Video Quality Assessment  MSU SR-QA Dataset    MUSIQ trained on KONIQ      0.64589 (#12)      0.59151 (#17)   0.51897 (#14)   NR (#1)
Video Quality Assessment  MSU SR-QA Dataset    MUSIQ trained on SPAQ       0.64927 (#10)      0.60216 (#15)   0.52673 (#10)   NR (#1)

Values in parentheses are global leaderboard ranks.
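The leaderboard columns are correlation scores between a model's predicted quality and subjective ratings. The sketch below computes them in plain Python. It assumes that SRCC/SROCC denotes Spearman's rank-order correlation (tied values receive their average rank), PLCC Pearson's linear correlation, and KLCC Kendall's tau-b. The function names are hypothetical. Benchmarks often fit a nonlinear (for example logistic) mapping from predictions to subjective scores before computing PLCC. This sketch omits that step, so its PLCC may differ from reported values.

```python
import math


def pearson(x: list, y: list) -> float:
    """Pearson linear correlation coefficient (PLCC), without any nonlinear pre-fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: list) -> list:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: list, y: list) -> float:
    """Spearman rank-order correlation (SRCC/SROCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x: list, y: list) -> float:
    """Kendall rank correlation (tau-b), O(n^2) pairwise version."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both lists: excluded from every count
            if dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    n0 = concordant + discordant
    return (concordant - discordant) / math.sqrt((n0 + ties_x_only) * (n0 + ties_y_only))


predicted = [1, 2, 3, 4]
ratings = [1, 4, 9, 16]
print(spearman(predicted, ratings), kendall_tau_b(predicted, ratings))  # 1.0 1.0
print(round(pearson(predicted, ratings), 3))  # 0.984
```

Spearman and Kendall depend only on the ordering of scores, so a monotonic transform of a model's outputs leaves them unchanged. PLCC also rewards a linear relationship. In the example, SRCC and KLCC are both 1.0 while PLCC is about 0.984.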

Methods