Vision Transformers

Computer Vision • Image Models • 44 methods

Vision Transformers are Transformer-like models applied to visual tasks. They stem from ViT, which applied a Transformer architecture directly to non-overlapping, medium-sized image patches for image classification. Below you can find a continually updated list of vision transformers.
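As a rough illustration of the patch step described above, here is a minimal NumPy sketch that splits an image into non-overlapping patches and flattens each one into a token vector. The function name `patchify`, the 224x224 RGB input, and the patch size of 16 are assumptions chosen for the example, not details taken from this page; a real ViT would follow this with a learned linear projection and position embeddings.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    with patches ordered row by row across the image.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    grid_h, grid_w = h // patch_size, w // patch_size
    # (grid_h, p, grid_w, p, C) -> (grid_h, grid_w, p, p, C)
    patches = image.reshape(grid_h, patch_size, grid_w, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    # Flatten each patch into a single token vector.
    return patches.reshape(grid_h * grid_w, patch_size * patch_size * c)


# Hypothetical input: a 224x224 RGB image and 16x16 patches (assumed values).
image = np.arange(224 * 224 * 3, dtype=np.float32).reshape(224, 224, 3)
tokens = patchify(image, patch_size=16)
print(tokens.shape)  # (196, 768)
```

Each row of `tokens` corresponds to one patch, so the sequence length seen by the Transformer grows with the square of the image side divided by the patch size.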

According to [1], ViT-type models can be further categorized into uniform-scale ViTs, multi-scale ViTs, hybrid ViTs with convolutions, and self-supervised ViTs. The methods listed below provide an overview of ViT models applied to a range of vision tasks.

[1] Transformers in Vision: A Survey

Method Year Papers
2020 982
2021 229
2020 124
2020 63
2021 59
2020 22
2021 19
2021 18
2021 16
2021 14
2021 8
2021 8
2021 8
2021 7
2021 6
2021 6
2021 5
2021 4
2021 4
2021 4
2021 4
2021 4
2021 3
2021 3
2020 3
2021 2
2021 2
2021 2
2022 2
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1