MUSIQ, or Multi-scale Image Quality Transformer, is a Transformer-based model for multi-scale image quality assessment. It processes native-resolution images with varying sizes and aspect ratios. In MUSIQ, we construct a multi-scale image representation as input, consisting of the native-resolution image and its aspect-ratio-preserving (ARP) resized variants. Each image is split into fixed-size patches, which are embedded by a patch encoding module (blue boxes). To capture the 2D structure of the image and handle varying aspect ratios, the spatial embedding is encoded by hashing each patch position $(i,j)$ to $(t_{i},t_{j})$ within a fixed grid of learnable embeddings (red boxes). A scale embedding (green boxes) is introduced to capture scale information. The Transformer encoder takes the input tokens and performs multi-head self-attention. To predict image quality, MUSIQ follows the common Transformer strategy of prepending a [CLS] token to the sequence to represent the whole multi-scale input; the corresponding Transformer output is used as the final representation.
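The position hashing described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the integer-truncation rule, and the default grid size `grid_size=10` are assumptions for clarity. Each patch's row/column index is mapped into a fixed `G x G` grid, so one learnable embedding table can serve images of any resolution or aspect ratio.

```python
def hash_patch_position(i, j, num_patches_h, num_patches_w, grid_size=10):
    """Hash a patch position (i, j) to grid coordinates (t_i, t_j).

    Patch positions from images of arbitrary aspect ratio are mapped
    into a fixed grid_size x grid_size grid, whose cells index a table
    of learnable spatial embeddings. Truncating division keeps the
    result in [0, grid_size - 1].
    """
    t_i = int(i * grid_size / num_patches_h)
    t_j = int(j * grid_size / num_patches_w)
    return t_i, t_j

# Example: an image split into a 7 x 10 patch grid; the patch at
# row 3, column 5 hashes to grid cell (4, 5).
t = hash_patch_position(3, 5, num_patches_h=7, num_patches_w=10)
```

Because the hash depends only on the relative position within the patch grid, the same embedding table is shared across the native-resolution image and all of its ARP-resized variants.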
Source: MUSIQ: Multi-scale Image Quality Transformer
Task | Papers | Share |
---|---|---|
Image Dehazing | 1 | 14.29% |
Image Enhancement | 1 | 14.29% |
Image Generation | 1 | 14.29% |
Image Super-Resolution | 1 | 14.29% |
Super-Resolution | 1 | 14.29% |
Image Quality Assessment | 1 | 14.29% |
Video Quality Assessment | 1 | 14.29% |