1 code implementation • 17 Feb 2024 • Anxhelo Diko, Danilo Avola, Marco Cascio, Luigi Cinque
The Vision Transformer (ViT) self-attention mechanism suffers from feature collapse in deeper layers, causing low-level visual features to vanish.
Ranked #491 on Image Classification on ImageNet