Vision Transformers

MUSIQ

Introduced by Ke et al. in MUSIQ: Multi-scale Image Quality Transformer

MUSIQ, or Multi-scale Image Quality Transformer, is a Transformer-based model for multi-scale image quality assessment. It processes images at their native resolution, with varying sizes and aspect ratios.

As input, MUSIQ constructs a multi-scale image representation consisting of the native-resolution image and several variants produced by aspect-ratio-preserving (ARP) resizing. Each image is split into fixed-size patches, which are embedded by a patch encoding module. To capture the 2D structure of the image and handle varying aspect ratios, the spatial embedding is encoded by hashing the patch position $(i,j)$ to $(t_{i},t_{j})$ within a grid of learnable embeddings. A scale embedding is introduced to capture scale information, i.e. which resolution of the multi-scale input each patch comes from.

The Transformer encoder takes the resulting input tokens and applies multi-head self-attention. To predict image quality, MUSIQ follows a common strategy in Transformers: a [CLS] token is added to the sequence to represent the whole multi-scale input, and the corresponding Transformer output is used as the final representation.
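The following Python sketch illustrates how the multi-scale input tokens described above could be assembled. It is not the authors' implementation. Patch encodings are stubbed with random vectors, and several details are assumptions made for illustration: the hash mapping $t_i = \lfloor i G / n_\text{rows} \rfloor$, $t_j = \lfloor j G / n_\text{cols} \rfloor$ into a $G \times G$ table, a patch size of 32, $G = 10$, ARP variants with longer sides of 224 and 384, padding of partial border patches, and summing the patch, spatial and scale embeddings. The names `arp_size`, `hash_position` and `build_tokens` are hypothetical.

```python
import math
import random

def arp_size(height, width, longer_side):
    """Aspect-ratio-preserving (ARP) resize: scale so the longer side equals longer_side."""
    scale = longer_side / max(height, width)
    return max(1, round(height * scale)), max(1, round(width * scale))

def hash_position(i, j, n_rows, n_cols, grid):
    """Hash patch (i, j) of an n_rows x n_cols patch grid to (t_i, t_j) in a grid x grid table."""
    return (i * grid) // n_rows, (j * grid) // n_cols

def build_tokens(height, width, longer_sides=(224, 384), patch=32, grid=10, dim=8, seed=0):
    """Assemble [CLS] + patch tokens for the native image and its ARP variants.

    Returns (tokens, meta): each token is a list of `dim` floats; meta records
    (scale index, t_i, t_j) per token, or ("cls", None, None) for the [CLS] token.
    """
    rng = random.Random(seed)

    def rand_vec():
        # Stand-in for a learnable parameter (or, below, a patch encoder output).
        return [rng.gauss(0.0, 0.02) for _ in range(dim)]

    # Learnable G x G spatial embedding table shared by all scales.
    spatial_table = [[rand_vec() for _ in range(grid)] for _ in range(grid)]
    # Scale 0 is the native resolution; the rest are ARP-resized variants.
    sizes = [(height, width)] + [arp_size(height, width, L) for L in longer_sides]
    # One learnable scale embedding per scale.
    scale_table = [rand_vec() for _ in sizes]

    tokens = [rand_vec()]  # learnable [CLS] token
    meta = [("cls", None, None)]
    for s, (h, w) in enumerate(sizes):
        # Assumption: partial patches at the border are padded, hence ceil.
        n_rows, n_cols = math.ceil(h / patch), math.ceil(w / patch)
        for i in range(n_rows):
            for j in range(n_cols):
                t_i, t_j = hash_position(i, j, n_rows, n_cols, grid)
                patch_emb = rand_vec()  # stub: a real model encodes the pixel patch here
                # Assumed combination: patch + spatial + scale embedding, summed.
                tokens.append([p + e + c for p, e, c in
                               zip(patch_emb, spatial_table[t_i][t_j], scale_table[s])])
                meta.append((s, t_i, t_j))
    return tokens, meta

tokens, meta = build_tokens(480, 640)
print(len(tokens))
```

For a 480 x 640 image this yields 451 tokens: 300 native-resolution patches, 42 and 108 patches from the 224- and 384-pixel ARP variants, plus the [CLS] token. Because every scale hashes into the same $G \times G$ table, patches at the same relative location share a spatial embedding regardless of resolution or aspect ratio, while the scale embedding keeps the scales distinguishable.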

Source: MUSIQ: Multi-scale Image Quality Transformer

Tasks


Task Papers Share
Image Dehazing 1 14.29%
Image Enhancement 1 14.29%
Image Generation 1 14.29%
Image Super-Resolution 1 14.29%
Super-Resolution 1 14.29%
Image Quality Assessment 1 14.29%
Video Quality Assessment 1 14.29%

Categories