An XCiT Layer is the main building block of the XCiT architecture, using a cross-covariance attention operator as its principal operation. The layer consists of three main blocks, each preceded by LayerNorm and followed by a residual connection: (i) the core cross-covariance attention (XCA) operation, (ii) the local patch interaction (LPI) module, and (iii) a feed-forward network (FFN). Because XCA transposes the query-key interaction, attention is computed over the d×d feature (channel) dimensions rather than the N×N token dimensions, so its computational complexity is linear in the number of tokens N instead of quadratic as in conventional self-attention.
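The transposed attention described above can be illustrated with a minimal NumPy sketch of a single XCA head. The function name `xca` and the scalar temperature `tau` are illustrative choices (in the paper the temperature is a learnable parameter and the operation is multi-headed); this is a sketch of the idea, not the reference implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def xca(Q, K, V, tau=1.0):
    """Single-head cross-covariance attention on tokens of shape (N, d).

    Q and K are L2-normalized along the token axis, so the attention
    map is a d x d feature-feature matrix; the cost is O(N * d^2),
    i.e. linear in the number of tokens N.
    """
    Qh = Q / (np.linalg.norm(Q, axis=0, keepdims=True) + 1e-8)
    Kh = K / (np.linalg.norm(K, axis=0, keepdims=True) + 1e-8)
    A = softmax((Kh.T @ Qh) / tau, axis=-1)  # (d, d) attention over features
    return V @ A                              # (N, d) output tokens

# Example: 16 tokens with 8 feature channels each.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((16, 8)) for _ in range(3))
out = xca(Q, K, V)
print(out.shape)  # (16, 8)
```

Note that the d×d attention map is independent of N, which is what makes XCiT practical for high-resolution images with many patches.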
Source: XCiT: Cross-Covariance Image Transformers
Task | Papers | Share
---|---|---
Image Classification | 3 | 33.33%
Quantization | 1 | 11.11%
Pose Estimation | 1 | 11.11%
Instance Segmentation | 1 | 11.11%
Object Detection | 1 | 11.11%
Self-Supervised Image Classification | 1 | 11.11%
Semantic Segmentation | 1 | 11.11%