Attention Modules

The Cross-Attention module is an attention module used in CrossViT to fuse multi-scale features. The CLS token of the large branch (drawn as a circle in the figure) serves as a query token that interacts with the patch tokens of the small branch through attention. $f\left(\cdot\right)$ and $g\left(\cdot\right)$ are projections that align dimensions between the two branches. The small branch follows the same procedure with the roles swapped: its CLS token attends to the patch tokens of the large branch.
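The fusion step described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single attention head, omits layer normalization, and draws random weights. The function name `cross_attention_fuse`, the helper `init_params`, and the parameter keys (`f`, `g`, `wq`, `wk`, `wv`) are hypothetical names chosen for this example. Including the projected CLS token among the keys and values, and adding a residual connection before $g$, are assumptions about the design rather than details stated in the text above.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(d_a, d_b, rng, scale=0.02):
    # Random weights for illustration only (hypothetical parameter names).
    return {
        "f": rng.normal(0.0, scale, (d_a, d_b)),   # projects CLS into branch B's width
        "g": rng.normal(0.0, scale, (d_b, d_a)),   # projects the result back to branch A's width
        "wq": rng.normal(0.0, scale, (d_b, d_b)),
        "wk": rng.normal(0.0, scale, (d_b, d_b)),
        "wv": rng.normal(0.0, scale, (d_b, d_b)),
    }


def cross_attention_fuse(tokens_a, tokens_b, params):
    """Update branch A's CLS token using branch B's patch tokens.

    tokens_a: array of shape (1 + n_a, d_a); row 0 is the CLS token.
    tokens_b: array of shape (1 + n_b, d_b); row 0 is the CLS token.
    Returns the updated tokens of branch A and the attention weights.
    """
    cls_a, patches_a = tokens_a[:1], tokens_a[1:]
    patches_b = tokens_b[1:]
    d_b = patches_b.shape[1]

    # f(.): align the CLS token with the other branch's dimension.
    cls_proj = cls_a @ params["f"]                            # (1, d_b)

    # Assumption: keys/values cover the projected CLS plus B's patch tokens.
    context = np.concatenate([cls_proj, patches_b], axis=0)   # (1 + n_b, d_b)

    q = cls_proj @ params["wq"]                               # only the CLS token queries
    k = context @ params["wk"]
    v = context @ params["wv"]
    attn = softmax(q @ k.T / np.sqrt(d_b))                    # (1, 1 + n_b)

    # Assumption: residual connection around the attention output.
    fused = cls_proj + attn @ v                               # (1, d_b)

    # g(.): project back to branch A's dimension and reattach A's patch tokens.
    new_cls = fused @ params["g"]                             # (1, d_a)
    return np.concatenate([new_cls, patches_a], axis=0), attn


rng = np.random.default_rng(0)
large = rng.normal(size=(1 + 16, 64))   # large branch: 16 patch tokens, width 64
small = rng.normal(size=(1 + 64, 32))   # small branch: 64 patch tokens, width 32

# Large branch: its CLS token queries the small branch's patch tokens.
new_large, attn_l = cross_attention_fuse(large, small, init_params(64, 32, rng))
# Small branch: the same procedure with the roles swapped.
new_small, attn_s = cross_attention_fuse(small, large, init_params(32, 64, rng))
```

Because only the CLS token acts as a query, the attention map has a single row, so the cost of this step grows linearly with the number of patch tokens in the other branch. The patch tokens of the querying branch pass through unchanged.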

Source: CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification




Task                      Papers   Share
Semantic Segmentation     10       5.99%
Image Classification      6        3.59%
Object Detection          6        3.59%
Autonomous Driving        5        2.99%
Retrieval                 5        2.99%
Image Super-Resolution    4        2.40%
Super-Resolution          4        2.40%
Sentence                  4        2.40%
Image Segmentation        3        1.80%