Vision Transformers

Computer Vision • Image Models • 47 methods

Vision Transformers are Transformer-like models applied to visual tasks. They stem from ViT, which applied a Transformer architecture directly to non-overlapping, medium-sized image patches for image classification. Below is a continually updated list of vision transformers.
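To make the idea of non-overlapping patches concrete, here is a minimal sketch of the splitting step in plain Python. The function name `patchify`, the toy 4x4 image, and the patch size of 2 are illustrative assumptions, not taken from any particular ViT implementation. Real models also apply a learned linear projection and add position embeddings to each patch, which this sketch omits.

```python
def patchify(image, patch_size):
    """Split an H x W image (a list of rows) into non-overlapping square patches.

    Returns the patches as flattened lists in row-major order. Each flattened
    patch plays the role of one input token in a ViT-style model.
    Assumes H and W are both divisible by patch_size (hypothetical constraint
    chosen to keep the sketch short).
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten the patch_size x patch_size block starting at (top, left).
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Toy 4x4 single-channel image with pixel values 0..15.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
tokens = patchify(image, patch_size=2)
print(len(tokens))  # 4 patches
print(tokens[0])    # [0, 1, 4, 5]
```

The same arithmetic scales up: a 224x224 image cut into 16x16 patches yields (224 / 16)^2 = 196 patches, so the sequence length seen by the Transformer depends on the patch size rather than on the raw pixel count.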

According to [1], ViT-type models can be further categorized into uniform-scale ViTs, multi-scale ViTs, hybrid ViTs with convolutions, and self-supervised ViTs. The methods listed below provide a comprehensive overview of ViT models applied to a range of vision tasks.

[1] Transformers in Vision: A Survey

Method Year Papers
2020 1771
2021 358
2020 201
2021 146
2020 87
2020 33
2021 31
2021 29
2021 26
2021 22
2021 20
2021 12
2021 11
2021 10
2021 9
2021 9
2021 8
2021 8
2022 5
2021 4
2021 4
2021 4
2021 4
2021 4
2021 4
2021 3
2021 3
2021 3
2021 3
2020 3
2021 2
2021 2
2021 2
2021 2
2022 2
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1
2022 1