1 code implementation • 10 Oct 2024 • Maxwell Horton, Qingqing Cao, Chenfan Sun, Yanzi Jin, Sachin Mehta, Mohammad Rastegari, Moin Nabi
Our method uses a small auxiliary model to process the prompt and produce an approximation of the base model's KV cache.
4 code implementations • 22 Apr 2024 • Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari
We release OpenELM, a state-of-the-art open language model.