Cross-Covariance Attention

Introduced by El-Nouby et al. in XCiT: Cross-Covariance Image Transformers

Cross-Covariance Attention, or XCA, is an attention mechanism that operates along the feature dimension rather than along the token dimension used in conventional transformers.

Using the definitions of queries, keys and values from conventional attention, the cross-covariance attention function is defined as:

$$ \text{XC-Attention}(Q, K, V) = V\,\mathcal{A}_{\mathrm{XC}}(K, Q), \qquad \mathcal{A}_{\mathrm{XC}}(K, Q) = \operatorname{Softmax}\!\left(\hat{K}^{\top}\hat{Q} / \tau\right) $$

where $\hat{K}$ and $\hat{Q}$ are the $\ell_2$-normalised key and query matrices, $\tau$ is a learnable temperature parameter, and each output token embedding is a convex combination of the $d_{v}$ features of its corresponding token embedding in $V$. The attention weights $\mathcal{A}_{\mathrm{XC}}$ are computed from the cross-covariance matrix of the keys and queries, giving a $d \times d$ attention map whose cost scales linearly with the number of tokens.
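To make the feature-wise attention concrete, here is a minimal PyTorch sketch of a multi-head XCA layer following the formula above. It is a sketch under stated assumptions, not the authors' reference implementation: the class name `XCAttention`, the `num_heads` default, and the per-head temperature shape are illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class XCAttention(nn.Module):
    """Minimal cross-covariance attention (XCA) sketch.

    The softmax is taken over a d x d cross-covariance matrix of the
    l2-normalised keys and queries instead of the usual N x N
    token-to-token attention map.
    """

    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        # Learnable temperature tau, one per head (illustrative shape).
        self.temperature = nn.Parameter(torch.ones(num_heads, 1, 1))
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        B, N, C = x.shape  # batch, tokens, channels
        # Project to q, k, v and split heads; each ends up with
        # shape (B, heads, head_dim, N), i.e. features-by-tokens.
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 4, 1).unbind(0)

        # l2-normalise along the token axis, giving Q-hat and K-hat.
        q = F.normalize(q, dim=-1)
        k = F.normalize(k, dim=-1)

        # Cross-covariance attention map: (head_dim x head_dim) per head.
        attn = (q @ k.transpose(-2, -1)) / self.temperature
        attn = attn.softmax(dim=-1)

        # Each output feature of a token is a convex combination
        # of that token's features in V.
        out = attn @ v  # (B, heads, head_dim, N)
        out = out.permute(0, 3, 1, 2).reshape(B, N, C)
        return self.proj(out)


# Usage: input and output shapes match standard multi-head attention.
x = torch.randn(2, 196, 256)       # 196 tokens, 256 channels
print(XCAttention(256)(x).shape)   # torch.Size([2, 196, 256])
```

Because the softmax is applied to a per-head matrix of size head_dim × head_dim rather than $N \times N$, the cost of the attention map grows linearly with the number of tokens, which is the main practical motivation behind XCiT.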

Source: XCiT: Cross-Covariance Image Transformers
