no code implementations • ICLR Workshop EBM 2021 • Nick Bhattacharya, Neil Thomas, Roshan Rao, Justas Dauparas, Peter K Koo, David Baker, Yun S. Song, Sergey Ovchinnikov
Factored attention is a direct simplification of multihead scaled dot-product attention in the Transformer.
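To make the simplification concrete, below is a minimal PyTorch sketch of a single factored attention layer in which the attention logits come from learned positional factors rather than from the input tokens. All names here (`FactoredAttention`, `Q`, `K`, `V`) and the one-hot interface are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn as nn

class FactoredAttention(nn.Module):
    """Sketch of factored attention: attention weights depend only on
    positions (learned factors Q, K), not on the input sequence, which
    is the simplification relative to standard scaled dot-product
    attention where queries and keys are computed from the tokens."""

    def __init__(self, seq_len: int, d_head: int, n_heads: int, vocab_size: int):
        super().__init__()
        # Positional query/key factors: the logits Q K^T are input-independent.
        self.Q = nn.Parameter(torch.randn(n_heads, seq_len, d_head))
        self.K = nn.Parameter(torch.randn(n_heads, seq_len, d_head))
        # Per-head value map acting on one-hot token embeddings.
        self.V = nn.Parameter(torch.randn(n_heads, vocab_size, vocab_size))
        self.d_head = d_head

    def forward(self, x_onehot: torch.Tensor) -> torch.Tensor:
        # x_onehot: (batch, seq_len, vocab_size)
        # Attention logits per head, shape (heads, L, L), no dependence on x.
        logits = torch.einsum('hid,hjd->hij', self.Q, self.K) / self.d_head ** 0.5
        attn = torch.softmax(logits, dim=-1)
        # Project tokens through each head's value matrix: (batch, heads, L, vocab).
        values = torch.einsum('hvw,blv->bhlw', self.V, x_onehot)
        # Mix positions with the fixed attention pattern and sum over heads.
        return torch.einsum('hij,bhjw->biw', attn, values)

# Hypothetical usage on a batch of one-hot-encoded sequences:
layer = FactoredAttention(seq_len=128, d_head=32, n_heads=4, vocab_size=21)
tokens = torch.randint(0, 21, (8, 128))
x = torch.nn.functional.one_hot(tokens, num_classes=21).float()
out = layer(x)  # (8, 128, 21)
```

Because the logits are input-independent, a single such layer amounts to a learned pairwise interaction pattern over positions, which is roughly what makes it comparable to energy-based pairwise models of the kind this workshop paper studies.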