Attention Modules

The Cross-Attention module is an attention module used in CrossViT to fuse multi-scale features. The CLS token of the large branch (the circle in the accompanying figure) serves as the query token and interacts with the patch tokens of the small branch through attention. $f(\cdot)$ and $g(\cdot)$ are projections that align dimensions between the two branches. The small branch follows the same procedure with the roles swapped: its CLS token queries the patch tokens of the large branch.
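
The fusion step for the large branch can be sketched in a few lines of NumPy. This is a minimal, single-head illustration under stated assumptions, not the CrossViT reference implementation: layer normalization and multiple heads are omitted, the weights are random, and the names `cross_attention`, `params`, `wq`, `wk`, and `wv` are hypothetical. Only `f` and `g` correspond to the projections named above.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(large_tokens, small_tokens, params):
    """Fuse the large-branch CLS token with small-branch patch tokens.

    large_tokens: (1 + N_large, d_large), row 0 is the CLS token.
    small_tokens: (1 + N_small, d_small), row 0 is the CLS token.
    params: dict of weight matrices (hypothetical names, see lead-in).
    """
    cls_large, patches_large = large_tokens[:1], large_tokens[1:]
    patches_small = small_tokens[1:]

    # f(.): project the large-branch CLS token to the small-branch width.
    cls_proj = cls_large @ params["f"]                      # (1, d_small)

    # Keys and values come from the projected CLS token plus the
    # small-branch patch tokens; the query is the CLS token alone.
    kv_tokens = np.concatenate([cls_proj, patches_small], axis=0)
    q = cls_proj @ params["wq"]                             # (1, d_small)
    k = kv_tokens @ params["wk"]                            # (1 + N_small, d_small)
    v = kv_tokens @ params["wv"]                            # (1 + N_small, d_small)

    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))          # (1, 1 + N_small)
    fused_cls = cls_proj + attn @ v                         # residual connection

    # g(.): project the fused CLS token back to the large-branch width.
    cls_out = fused_cls @ params["g"]                       # (1, d_large)

    # Large-branch patch tokens pass through unchanged.
    return np.concatenate([cls_out, patches_large], axis=0), attn


rng = np.random.default_rng(0)
d_large, d_small = 8, 4
large = rng.normal(size=(1 + 4, d_large))   # CLS + 4 patch tokens
small = rng.normal(size=(1 + 9, d_small))   # CLS + 9 patch tokens
params = {
    "f": rng.normal(size=(d_large, d_small)),
    "g": rng.normal(size=(d_small, d_large)),
    "wq": rng.normal(size=(d_small, d_small)),
    "wk": rng.normal(size=(d_small, d_small)),
    "wv": rng.normal(size=(d_small, d_small)),
}
out, attn = cross_attention(large, small, params)
```

Because only the CLS token acts as a query, the attention map has a single row, so the cost grows linearly with the number of small-branch tokens rather than quadratically. The small-branch update would call the same function with the two token arrays swapped and its own set of weights.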

Source: CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification


Tasks


Task Papers Share
Decoder 20 5.73%
Semantic Segmentation 19 5.44%
Object 11 3.15%
Image Segmentation 9 2.58%
Image Generation 9 2.58%
Retrieval 9 2.58%
Image Classification 8 2.29%
Object Detection 8 2.29%
Medical Image Segmentation 7 2.01%

Categories