Computer Vision • Image Models • 44 methods
Vision Transformers are Transformer-like models applied to visual tasks. They stem from ViT, which applied a standard Transformer architecture directly to non-overlapping, medium-sized image patches for image classification. Below is a continually updated list of vision transformers.
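To make the "Transformer on image patches" idea concrete, here is a minimal sketch of the patch-embedding step in plain Python. It assumes a channel-last image stored as nested lists; `patchify` and `embed` are hypothetical helper names, and the projection weights, which a real model would learn, are supplied by hand. ViT additionally prepends a learnable class token and adds position embeddings before the Transformer encoder; those steps are omitted here.

```python
from typing import List

# An image as nested lists, indexed image[row][col][channel] (channel-last).
Image = List[List[List[float]]]


def patchify(image: Image, patch_size: int) -> List[List[float]]:
    """Split an H x W x C image into non-overlapping patch_size x patch_size
    patches, each flattened to a vector of length patch_size * patch_size * C."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be divisible by patch_size")
    patches = []
    # Visit patches in raster order: left to right, then top to bottom.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            flat: List[float] = []
            for r in range(top, top + patch_size):
                for c in range(left, left + patch_size):
                    flat.extend(image[r][c])  # all channels of one pixel
            patches.append(flat)
    return patches


def embed(patches: List[List[float]], weight: List[List[float]],
          bias: List[float]) -> List[List[float]]:
    """Map each flattened patch x to a token W x + b (W has shape D x (P*P*C))."""
    return [
        [sum(w * x for w, x in zip(row, patch)) + b for row, b in zip(weight, bias)]
        for patch in patches
    ]


# Toy 4 x 4 single-channel image whose pixel values are 0..15.
toy = [[[float(4 * r + c)] for c in range(4)] for r in range(4)]
tokens = patchify(toy, patch_size=2)
print(tokens[0])  # -> [0.0, 1.0, 4.0, 5.0] (top-left patch)
print(embed(tokens, weight=[[1.0, 1.0, 1.0, 1.0]], bias=[0.0]))  # -> [[10.0], [18.0], [42.0], [50.0]]
```

With a 224 × 224 × 3 image and 16 × 16 patches, `patchify` returns 14 × 14 = 196 vectors of length 16 × 16 × 3 = 768, i.e. a sequence of 196 tokens for the Transformer.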
According to [1], ViT-type models can be further categorized into uniform-scale ViTs, multi-scale ViTs, hybrid ViTs with convolutions, and self-supervised ViTs. The methods listed below give a comprehensive overview of ViT models applied to a range of vision tasks.
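The difference between uniform-scale and multi-scale ViTs is easiest to see by counting tokens. The sketch below assumes a 224 × 224 input; for the multi-scale case it assumes a four-stage hierarchy that starts from 4 × 4 patches and merges each 2 × 2 group of neighbouring tokens between stages, similar in spirit to hierarchical designs such as Swin Transformer. `tokens_per_stage` is a hypothetical helper, and for a uniform-scale model "stages" simply means successive groups of layers; real models vary in their exact stage layout.

```python
from typing import List


def tokens_per_stage(image_size: int, patch_size: int, num_stages: int,
                     multi_scale: bool) -> List[int]:
    """Return the number of tokens each stage processes.

    Uniform-scale: every stage sees the same (image_size / patch_size)^2 grid.
    Multi-scale (assumed 2 x 2 token merging): the grid side halves between
    consecutive stages, so the token count drops by 4x each time.
    """
    if image_size % patch_size:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size
    counts = []
    for stage in range(num_stages):
        counts.append(side * side)
        if multi_scale and stage < num_stages - 1:
            side //= 2  # merge each 2 x 2 group of neighbouring tokens
    return counts


print(tokens_per_stage(224, 16, 4, multi_scale=False))  # -> [196, 196, 196, 196]
print(tokens_per_stage(224, 4, 4, multi_scale=True))    # -> [3136, 784, 196, 49]
```

The uniform-scale model keeps one 14 × 14 grid throughout, while the multi-scale model produces grids at strides 4, 8, 16 and 32, i.e. a multi-resolution feature pyramid.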
| Method | Year | Papers |
|---|---|---|
| | 2020 | 982 |
| | 2021 | 229 |
| | 2020 | 124 |
| | 2020 | 63 |
| | 2021 | 59 |
| | 2020 | 22 |
| | 2021 | 19 |
| | 2021 | 18 |
| | 2021 | 16 |
| | 2021 | 14 |
| | 2021 | 8 |
| | 2021 | 8 |
| | 2021 | 8 |
| | 2021 | 7 |
| | 2021 | 6 |
| | 2021 | 6 |
| | 2021 | 5 |
| | 2021 | 4 |
| | 2021 | 4 |
| | 2021 | 4 |
| | 2021 | 4 |
| | 2021 | 4 |
| | 2021 | 3 |
| | 2021 | 3 |
| | 2020 | 3 |
| | 2021 | 2 |
| | 2021 | 2 |
| | 2021 | 2 |
| | 2022 | 2 |
| | 2021 | 1 |
| | 2021 | 1 |
| | 2021 | 1 |
| | 2021 | 1 |
| | 2021 | 1 |
| | 2021 | 1 |
| | 2021 | 1 |
| | 2021 | 1 |
| | 2021 | 1 |
| | 2021 | 1 |
| | 2021 | 1 |
| | 2021 | 1 |
| | 2021 | 1 |
| | 2021 | 1 |