Vision Transformers

Computer Vision • Image Models • 45 methods

Vision Transformers are Transformer-like models applied to visual tasks. They stem from ViT, which applied a Transformer architecture directly to non-overlapping, medium-sized image patches for image classification. Below is a continually updated list of vision transformers.
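To make the patch idea concrete, here is a minimal NumPy sketch of splitting an image into non-overlapping patches and flattening each one into a token vector. The function name `patchify`, the 224x224 input size, and the 16x16 patch size are assumptions chosen for illustration, not details taken from this page. A full ViT would also apply a learned linear projection and add position embeddings, which this sketch omits.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping, flattened patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    with patches ordered row by row across the image grid.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (H, W, C) -> (gh, p, gw, p, C): separate grid indices from in-patch offsets
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    # -> (gh, gw, p, p, C): bring the two grid axes to the front
    patches = patches.transpose(0, 2, 1, 3, 4)
    # -> (gh * gw, p * p * C): one flat vector per patch
    return patches.reshape(gh * gw, patch_size * patch_size * c)


# Hypothetical input: a 224x224 RGB image filled with distinct values
image = np.arange(224 * 224 * 3, dtype=np.float32).reshape(224, 224, 3)
tokens = patchify(image, patch_size=16)
print(tokens.shape)  # (196, 768)
```

Each row of `tokens` plays the role of one input token to the Transformer, so a 224x224 image with 16x16 patches yields a sequence of 196 tokens.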

According to [1], ViT-type models can be further categorized into uniform-scale ViTs, multi-scale ViTs, hybrid ViTs with convolutions, and self-supervised ViTs. The methods listed below give an overview of ViT models applied to a range of vision tasks.

[1] Transformers in Vision: A Survey

Method Year Papers
2020 1448
2021 297
2020 170
2021 108
2020 79
2020 28
2021 27
2021 24
2021 23
2021 18
2021 11
2021 10
2021 10
2021 9
2021 9
2021 8
2021 8
2021 7
2021 4
2021 4
2021 4
2021 4
2021 4
2021 3
2021 3
2021 3
2021 3
2020 3
2021 2
2021 2
2021 2
2021 2
2022 2
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1
2021 1