Vision Transformers

Computer Vision > Image Models • 43 methods

Vision Transformers are Transformer-like models applied to visual tasks. They stem from ViT, which applied a Transformer architecture directly to non-overlapping, medium-sized image patches for image classification. Below is a continually updated list of vision transformers.
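As a rough illustration of the patch-based input described above, the following minimal sketch splits an image into non-overlapping patches and flattens each one into a vector, the form in which a Transformer would consume them as tokens. The function name `patchify`, the 224x224 input size, and the 16-pixel patch size are assumptions chosen for illustration and are not taken from this page; the sketch also leaves out the learned linear projection and position embeddings that a full model would add.

```python
import numpy as np

def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C).
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # (rows, p, cols, p, C) -> (rows, cols, p, p, C) -> (rows * cols, p * p * C)
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(rows * cols, patch_size * patch_size * c)

# Hypothetical input: a 224x224 RGB image cut into 16x16 patches (assumed sizes).
image = np.arange(224 * 224 * 3, dtype=np.float32).reshape(224, 224, 3)
tokens = patchify(image, patch_size=16)
print(tokens.shape)  # (196, 768)
```

Each row of `tokens` corresponds to one patch, ordered left to right and top to bottom, so the sequence length grows with image area and shrinks with the square of the patch size.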

According to [1], ViT-type models can be further categorized into uniform-scale ViTs, multi-scale ViTs, hybrid ViTs with convolutions, and self-supervised ViTs. The methods listed below give an overview of ViT models applied to a range of vision tasks.

[1] Transformers in Vision: A Survey

Method Year Papers
2020 327
2021 85
2020 58
2020 37
2021 13
2020 11
2021 8
2021 8
2021 7
2021 7
2021 6
2021 6
2021 5
2021 5
2021 4
2021 3
2021 2
2021 2
2021 2
2021 2
2020 2
2021 2
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1