Image Model Blocks

XCiT Layer

Introduced by El-Nouby et al. in XCiT: Cross-Covariance Image Transformers

An XCiT Layer is the main building block of the XCiT architecture, which uses a cross-covariance attention operator as its principal operation. The XCiT layer consists of three main blocks, each preceded by LayerNorm and followed by a residual connection: (i) the core cross-covariance attention (XCA) operation, (ii) the local patch interaction (LPI) module, and (iii) a feed-forward network (FFN). By transposing the query-key interaction, XCA computes attention over feature channels rather than tokens, so its computational complexity is linear in the number of tokens N, rather than quadratic as in conventional self-attention.
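The transposed interaction can be sketched as follows. This is a minimal single-head NumPy illustration of the XCA idea (not the authors' implementation): queries and keys are ℓ2-normalized along the token axis, the attention map is the d×d cross-covariance matrix scaled by a temperature, and it is applied to the values channel-wise. The function name `xca` and the `tau` parameter are illustrative.

```python
import numpy as np

def xca(Q, K, V, tau=1.0):
    """Cross-covariance attention sketch. Q, K, V: (N, d) token features."""
    # Transpose to (d, N): attention will act on the d feature channels.
    Qt, Kt, Vt = Q.T, K.T, V.T
    # L2-normalize each channel along the token axis (length N).
    Qn = Qt / (np.linalg.norm(Qt, axis=-1, keepdims=True) + 1e-8)
    Kn = Kt / (np.linalg.norm(Kt, axis=-1, keepdims=True) + 1e-8)
    # (d, d) cross-covariance attention map -- cost is independent of N.
    A = Qn @ Kn.T / tau
    A = np.exp(A - A.max(axis=-1, keepdims=True))
    A = A / A.sum(axis=-1, keepdims=True)  # softmax over channels
    # Apply the channel-mixing map to the values; transpose back to (N, d).
    return (A @ Vt).T
```

Because the softmax is taken over a d×d matrix, the overall cost scales as O(N·d²) instead of the O(N²·d) of token-to-token self-attention.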

Source: XCiT: Cross-Covariance Image Transformers
