1 code implementation • 15 Feb 2025 • Matteo Saponati, Pascal Sager, Pau Vilimelis Aceituno, Thilo Stadelmann, Benjamin Grewe
Self-attention is essential to Transformer architectures, yet how information is embedded in the self-attention matrices and how different objective functions impact this process remain unclear.