no code implementations • 7 Dec 2024 • Juechu Dong, Boyuan Feng, Driss Guessous, Yanbo Liang, Horace He
We introduce FlexAttention, a novel compiler-driven programming model that allows implementing the majority of attention variants in a few lines of idiomatic PyTorch code.