Cross-Covariance Attention

Introduced by El-Nouby et al. in XCiT: Cross-Covariance Image Transformers

Cross-Covariance Attention, or XCA, is an attention mechanism that operates along the feature dimension rather than along the token dimension used in conventional transformers.

Using the definitions of queries, keys and values from conventional attention, the cross-covariance attention function is defined as:

$$ \text{XC-Attention}(Q, K, V) = V\,\mathcal{A}_{\mathrm{XC}}(K, Q), \qquad \mathcal{A}_{\mathrm{XC}}(K, Q) = \operatorname{Softmax}\!\left(\hat{K}^{\top}\hat{Q} / \tau\right) $$

where $\hat{K}$ and $\hat{Q}$ are the $\ell_2$-normalised key and query matrices, $\tau$ is a learnable temperature parameter, and each output token embedding is a convex combination of the $d_{v}$ features of its corresponding token embedding in $V$. The attention weights $\mathcal{A}_{\mathrm{XC}}$ are computed from the cross-covariance matrix of the keys and queries, giving a $d \times d$ attention map whose cost scales linearly with the number of tokens.
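To make the feature-wise attention concrete, here is a minimal PyTorch sketch of a multi-head XCA layer following the formula above. It is a sketch under stated assumptions, not the authors' reference implementation: the class name `XCAttention`, the `num_heads` default, and the per-head temperature shape are illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class XCAttention(nn.Module):
    """Minimal cross-covariance attention (XCA) sketch.

    The softmax is taken over a d x d cross-covariance matrix of the
    l2-normalised keys and queries instead of the usual N x N
    token-to-token attention map.
    """

    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        # Learnable temperature tau, one per head (illustrative shape).
        self.temperature = nn.Parameter(torch.ones(num_heads, 1, 1))
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        B, N, C = x.shape  # batch, tokens, channels
        # Project to q, k, v and split heads; each ends up with
        # shape (B, heads, head_dim, N), i.e. features-by-tokens.
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 4, 1).unbind(0)

        # l2-normalise along the token axis, giving Q-hat and K-hat.
        q = F.normalize(q, dim=-1)
        k = F.normalize(k, dim=-1)

        # Cross-covariance attention map: (head_dim x head_dim) per head.
        attn = (q @ k.transpose(-2, -1)) / self.temperature
        attn = attn.softmax(dim=-1)

        # Each output feature of a token is a convex combination
        # of that token's features in V.
        out = attn @ v  # (B, heads, head_dim, N)
        out = out.permute(0, 3, 1, 2).reshape(B, N, C)
        return self.proj(out)


# Usage: input and output shapes match standard multi-head attention.
x = torch.randn(2, 196, 256)       # 196 tokens, 256 channels
print(XCAttention(256)(x).shape)   # torch.Size([2, 196, 256])
```

Because the softmax is applied to a per-head matrix of size head_dim × head_dim rather than $N \times N$, the cost of the attention map grows linearly with the number of tokens, which is the main practical motivation behind XCiT.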

Source: XCiT: Cross-Covariance Image Transformers
