1 code implementation • 1 Feb 2024 • Quentin Anthony, Yury Tokpanov, Paolo Glorioso, Beren Millidge
In this paper, we present BlackMamba, a novel architecture that combines the Mamba state-space model (SSM) with mixture-of-experts (MoE) layers to obtain the benefits of both.
Language Modelling
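
As a rough illustration of what pairing an SSM with MoE can look like, the sketch below stacks a toy sequence mixer and a routed expert MLP in one residual block. It is not the BlackMamba implementation: `ssm_mixer` is a plain diagonal linear recurrence standing in for Mamba, the router uses top-1 selection as an assumed choice, and all function names, parameter names, and shapes are hypothetical.

```python
import numpy as np

def ssm_mixer(x, a, b, c):
    """Toy diagonal linear state-space recurrence (a stand-in, not Mamba itself).

    x: (seq_len, d_model) floats; a, b, c: (d_model,) per-channel parameters.
    Recurrence: h_t = a * h_{t-1} + b * x_t, output y_t = c * h_t.
    """
    h = np.zeros(x.shape[1])
    y = np.empty_like(x, dtype=float)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]
        y[t] = c * h
    return y

def moe_mlp(x, w_router, w_in, w_out):
    """Top-1 routed mixture of expert MLPs (assumed routing scheme).

    x: (seq_len, d_model); w_router: (d_model, n_experts);
    w_in: (n_experts, d_model, d_hidden); w_out: (n_experts, d_hidden, d_model).
    """
    logits = x @ w_router
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    expert = probs.argmax(axis=1)  # one expert per token
    y = np.zeros_like(x, dtype=float)
    for e in range(w_router.shape[1]):
        idx = np.where(expert == e)[0]
        if idx.size == 0:
            continue  # no token was routed to this expert
        hidden = np.maximum(x[idx] @ w_in[e], 0.0)  # ReLU expert MLP
        # Scale each token's expert output by its router probability.
        y[idx] = (hidden @ w_out[e]) * probs[idx, e][:, None]
    return y

def hybrid_block(x, params):
    """One residual block: sequence mixing first, then the routed MLP."""
    x = x + ssm_mixer(x, params["a"], params["b"], params["c"])
    x = x + moe_mlp(x, params["w_router"], params["w_in"], params["w_out"])
    return x
```

Only one expert MLP runs per token here, so per-token compute stays close to that of a single MLP while the parameter count grows with the number of experts; the recurrence processes the sequence in time linear in its length.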