24 Nov 2023 • Raghav Addanki, Chenyang Li, Zhao Song, Chiwun Yang
For a single-layer self-attention with Query, Key, and Value matrices $Q, K, V \in \mathbb{R}^{n \times d}$, the polynomial method approximates the attention output $T \in \mathbb{R}^{n \times d}$.
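To make the idea concrete, the following is a minimal sketch of one way a polynomial approximation of attention can be set up. It is not taken from the paper. It assumes the attention output is $T = D^{-1} \exp(QK^\top / d) V$ with $D$ the diagonal matrix of row sums (the $1/d$ scaling is an assumption), and it replaces $\exp$ with its truncated Taylor series so that the $n \times n$ attention matrix factors as $U_1 U_2^\top$ and never has to be formed. The names `exact_attention`, `poly_features`, `poly_attention`, and the `degree` parameter are hypothetical and chosen only for illustration.

```python
import math
import random
from itertools import product


def exact_attention(Q, K, V):
    """Reference softmax attention; forms all n^2 weights explicitly."""
    d = len(Q[0])
    out = []
    for q in Q:
        w = [math.exp(sum(a * b for a, b in zip(q, k)) / d) for k in K]
        z = sum(w)
        out.append([sum(wi * v[c] for wi, v in zip(w, V)) / z
                    for c in range(len(V[0]))])
    return out


def poly_features(x, degree):
    """Features phi(x) with phi(q) . phi(k) = sum_{j <= degree} (q.k / d)^j / j!."""
    d = len(x)
    s = [xi / math.sqrt(d) for xi in x]
    feats = []
    for j in range(degree + 1):
        c = 1.0 / math.sqrt(math.factorial(j))
        # All degree-j monomials of the scaled entries (tensor power of s).
        for idx in product(range(d), repeat=j):
            feats.append(c * math.prod(s[i] for i in idx))
    return feats


def poly_attention(Q, K, V, degree=4):
    """Approximate attention via the factorization U1 @ U2^T (assumed sketch).

    An even degree keeps the truncated Taylor polynomial of exp positive,
    so the row normalizers below cannot vanish.
    """
    U1 = [poly_features(q, degree) for q in Q]
    U2 = [poly_features(k, degree) for k in K]
    n, r, dv = len(K), len(U1[0]), len(V[0])
    # S = U2^T V (r x dv) and z = U2^T 1_n (length r), computed once.
    S = [[sum(U2[i][f] * V[i][c] for i in range(n)) for c in range(dv)]
         for f in range(r)]
    z = [sum(u[f] for u in U2) for f in range(r)]
    out = []
    for u in U1:
        norm = sum(uf * zf for uf, zf in zip(u, z))  # approximate row sum of exp(QK^T/d)
        out.append([sum(u[f] * S[f][c] for f in range(r)) / norm
                    for c in range(dv)])
    return out


# Small demonstration with bounded entries (n = 6, d = 3).
random.seed(0)
n, d = 6, 3
Q = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
K = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
V = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(n)]

T_exact = exact_attention(Q, K, V)
T_poly = poly_attention(Q, K, V, degree=4)
max_err = max(abs(a - b) for ra, rb in zip(T_exact, T_poly) for a, b in zip(ra, rb))
```

The explicit feature map here has $\sum_{j \le g} d^j$ coordinates for degree $g$, so it is practical only for small $d$ and low degree. The approximation quality also depends on the entries of $Q$ and $K$ being bounded, since the Taylor truncation error grows with $|q \cdot k| / d$.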