MUSIQ

Introduced by Ke et al. in MUSIQ: Multi-scale Image Quality Transformer

MUSIQ, or Multi-scale Image Quality Transformer, is a Transformer-based model for multi-scale image quality assessment. It processes native-resolution images of varying sizes and aspect ratios. The input is a multi-scale image representation consisting of the native-resolution image and its aspect-ratio-preserving (ARP) resized variants. Each image is split into fixed-size patches, which are embedded by a patch encoding module (blue boxes in the accompanying figure). To capture the 2D structure of the image and handle varying aspect ratios, the spatial embedding is encoded by hashing the patch position $(i,j)$ to $(t_{i},t_{j})$ within a grid of learnable embeddings (red boxes). A scale embedding (green boxes) is added to capture scale information. The Transformer encoder takes the input tokens and performs multi-head self-attention. To predict image quality, MUSIQ follows a common Transformer strategy: a [CLS] token is added to the sequence to represent the whole multi-scale input, and the corresponding Transformer output is used as the final representation.
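The input construction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The description only says that $(i,j)$ is hashed to $(t_{i},t_{j})$, so the proportional mapping used here, $t_i = \lfloor i \cdot G / H \rfloor$ for a $G \times G$ embedding grid and $H$ patch rows, is an assumption. So are the rule of resizing to a fixed longer side and counting partial border patches. All function names and the numeric values (patch size 32, grid size 10, longer side 384) are hypothetical and chosen for illustration only.

```python
import math


def arp_resize_dims(height, width, longer_side):
    """Aspect-ratio-preserving resize (assumed rule): scale so that the
    longer image side equals `longer_side`."""
    scale = longer_side / max(height, width)
    return max(1, round(height * scale)), max(1, round(width * scale))


def patch_grid(height, width, patch_size):
    """Number of patch rows and columns, counting partial border patches
    (assumption: borders are padded rather than cropped)."""
    return math.ceil(height / patch_size), math.ceil(width / patch_size)


def hashed_position(i, j, rows, cols, grid_size):
    """Map patch (i, j) in a rows x cols patch grid to a cell (t_i, t_j)
    of a grid_size x grid_size table of learnable embeddings.
    Assumed hash: proportional scaling followed by floor."""
    t_i = (i * grid_size) // rows
    t_j = (j * grid_size) // cols
    return t_i, t_j


def token_indices(height, width, scales, patch_size=32, grid_size=10):
    """For the native image (scale index 0) and each ARP-resized variant,
    list (scale_index, t_i, t_j) for every patch. These indices would
    select the scale and spatial embeddings added to each patch token."""
    sizes = [(height, width)] + [arp_resize_dims(height, width, s) for s in scales]
    tokens = []
    for scale_index, (h, w) in enumerate(sizes):
        rows, cols = patch_grid(h, w, patch_size)
        for i in range(rows):
            for j in range(cols):
                t_i, t_j = hashed_position(i, j, rows, cols, grid_size)
                tokens.append((scale_index, t_i, t_j))
    return tokens


if __name__ == "__main__":
    tokens = token_indices(480, 640, scales=[384])
    print(len(tokens), tokens[:3])
```

Because the hash depends only on the relative position of a patch, images with different resolutions and aspect ratios share the same fixed-size embedding table, and patches at similar relative locations in different scales map to nearby cells.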

Source: MUSIQ: Multi-scale Image Quality Transformer


Tasks

Task	Papers	Share
Image Dehazing	1	14.29%
Image Enhancement	1	14.29%
Image Generation	1	14.29%
Image Super-Resolution	1	14.29%
Super-Resolution	1	14.29%
Image Quality Assessment	1	14.29%
Video Quality Assessment	1	14.29%

Categories

Vision Transformers

Image Quality Models