3 code implementations • 22 May 2023 • Joshua Ainslie, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón, Sumit Sanghai
Multi-query attention (MQA), which only uses a single key-value head, drastically speeds up decoder inference.
Decoder • Language Modelling
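
The summary above says that MQA keeps a single key-value head. One commonly cited reason this helps decoder inference is that the cached keys and values that must be read at every decoding step shrink by a factor equal to the number of query heads. The sketch below only illustrates that arithmetic. The helper `kv_cache_bytes` and the model shape (24 layers, 16 query heads, head dimension 64, 2048 tokens, 2 bytes per value) are hypothetical values chosen for the example, not figures from the paper.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Bytes needed to cache keys and values for every layer and position.

    Hypothetical helper for illustration; it ignores any implementation
    overhead such as padding or alignment.
    """
    # Factor 2: one cached tensor for keys and one for values.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


# Assumed model shape, chosen only to make the comparison concrete.
NUM_LAYERS, NUM_QUERY_HEADS, HEAD_DIM, SEQ_LEN = 24, 16, 64, 2048

# Multi-head attention: one key-value head per query head.
mha_bytes = kv_cache_bytes(NUM_LAYERS, NUM_QUERY_HEADS, HEAD_DIM, SEQ_LEN)

# Multi-query attention: a single key-value head shared by all query heads.
mqa_bytes = kv_cache_bytes(NUM_LAYERS, 1, HEAD_DIM, SEQ_LEN)

print(f"MHA cache: {mha_bytes / 2**20:.0f} MiB")
print(f"MQA cache: {mqa_bytes / 2**20:.0f} MiB")
print(f"Reduction: {mha_bytes // mqa_bytes}x")
```

Under these assumptions the cache is 16 times smaller with a single key-value head, which is the number of query heads. The actual speedup on real hardware depends on factors this sketch does not model.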