Computer Vision • Image Models • 47 methods
Vision Transformers are Transformer-like models applied to visual tasks. They stem from ViT, which applied a Transformer architecture directly to non-overlapping, medium-sized image patches for image classification. Below is a continually updated list of vision transformers.
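As a rough illustration of the patch-based input described above, the sketch below splits an image into non-overlapping patches and flattens each one into a token vector. It is not taken from any listed method's code: the function name `patchify`, the 224×224 input size, and the 16-pixel patch size are assumptions chosen for the example, and a real model would additionally apply a learned linear projection and position embeddings.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    with patches ordered row by row across the image.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # Expose the patch grid: (gh, p, gw, p, c), then bring grid axes together.
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (gh, gw, p, p, c)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


# Illustrative input only: a synthetic 224x224 RGB image (assumed size).
image = np.arange(224 * 224 * 3, dtype=np.float32).reshape(224, 224, 3)
tokens = patchify(image, patch_size=16)
print(tokens.shape)  # (196, 768)
```

With these assumed sizes, the image yields a 14×14 grid of patches, so the Transformer would see a sequence of 196 tokens, each with 16·16·3 = 768 raw pixel values.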
According to [1], ViT-type models can be further categorized into uniform-scale ViTs, multi-scale ViTs, hybrid ViTs with convolutions, and self-supervised ViTs. The methods listed below give an overview of ViT models applied to a range of vision tasks.
| Method | Year | Papers |
|---|---|---|
| | 2020 | 1771 |
| | 2021 | 358 |
| | 2020 | 201 |
| | 2021 | 146 |
| | 2020 | 87 |
| | 2020 | 33 |
| | 2021 | 31 |
| | 2021 | 29 |
| | 2021 | 26 |
| | 2021 | 22 |
| | 2021 | 20 |
| | 2021 | 12 |
| | 2021 | 11 |
| | 2021 | 10 |
| | 2021 | 9 |
| | 2021 | 9 |
| | 2021 | 8 |
| | 2021 | 8 |
| | 2022 | 5 |
| | 2021 | 4 |
| | 2021 | 4 |
| | 2021 | 4 |
| | 2021 | 4 |
| | 2021 | 4 |
| | 2021 | 4 |
| | 2021 | 3 |
| | 2021 | 3 |
| | 2021 | 3 |
| | 2021 | 3 |
| | 2020 | 3 |
| | 2021 | 2 |
| | 2021 | 2 |
| | 2021 | 2 |
| | 2021 | 2 |
| | 2022 | 2 |
| | 2021 | 1 |
| | 2021 | 1 |
| | 2021 | 1 |
| | 2021 | 1 |
| | 2021 | 1 |
| | 2021 | 1 |
| | 2021 | 1 |
| | 2021 | 1 |
| | 2021 | 1 |
| | 2021 | 1 |
| | 2022 | 1 |