1 code implementation • 13 Dec 2024 • Aditya Vavre, Ethan He, Dennis Liu, Zijie Yan, June Yang, Nima Tajbakhsh, Ashwath Aithal
Scaling large language models (LLMs) significantly improves performance but comes with prohibitive computational costs.
Ranked #2 on Multi-task Language Understanding on MMLU (using extra training data)
no code implementations • 10 Oct 2024 • Ethan He, Abhinav Khattar, Ryan Prenger, Vijay Korthikanti, Zijie Yan, Tong Liu, Shiqing Fan, Ashwath Aithal, Mohammad Shoeybi, Bryan Catanzaro
Upcycling pre-trained dense language models into sparse mixture-of-experts (MoE) models is an efficient approach to increase the model capacity of already trained models.
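The core idea of upcycling, copying a pre-trained dense feed-forward block into each expert and adding a router, can be sketched roughly as below. This is a toy illustration under assumed module names and sizes, not the authors' Megatron implementation.

```python
# A minimal sketch of dense-to-MoE upcycling; all class and parameter names
# here are illustrative assumptions, not the paper's actual architecture.
import copy
import torch
import torch.nn as nn


class DenseFFN(nn.Module):
    """A standard transformer feed-forward block (the dense 'donor')."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model)

    def forward(self, x):
        return self.down(torch.relu(self.up(x)))


class UpcycledMoE(nn.Module):
    """Sparse MoE layer whose experts start as copies of a trained dense FFN."""

    def __init__(self, dense_ffn: DenseFFN, num_experts: int, top_k: int = 2):
        super().__init__()
        d_model = dense_ffn.up.in_features
        # Each expert is initialized as an exact copy of the pre-trained dense
        # FFN, so the upcycled layer initially behaves like the dense model.
        self.experts = nn.ModuleList(
            [copy.deepcopy(dense_ffn) for _ in range(num_experts)]
        )
        self.router = nn.Linear(d_model, num_experts)
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, d_model)
        gates = torch.softmax(self.router(x), dim=-1)
        topk_vals, topk_idx = gates.topk(self.top_k, dim=-1)
        topk_vals = topk_vals / topk_vals.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        # Route each token to its top-k experts and mix their outputs.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    out[mask] += topk_vals[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```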
no code implementations • 24 Oct 2019 • Zijie Yan
We let workers download the model difference, rather than the full global model, from the server, and this difference is also sparsified, so that communication overhead is reduced in both directions between the server and workers.
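A minimal sketch of this dual-way sparsified exchange is given below: workers upload sparsified model differences and the server broadcasts a sparsified global difference instead of the full model. The top-k sparsifier and all function names are assumptions for illustration, not the paper's exact algorithm.

```python
# Rough sketch of dual-way sparsified model-difference communication;
# the top-k rule and function names are illustrative assumptions.
import numpy as np


def sparsify_topk(vec: np.ndarray, k: int) -> np.ndarray:
    """Keep only the k largest-magnitude entries; zero out the rest."""
    out = np.zeros_like(vec)
    if k > 0:
        idx = np.argpartition(np.abs(vec), -k)[-k:]
        out[idx] = vec[idx]
    return out


def worker_step(local_model: np.ndarray, grad: np.ndarray, lr: float, k: int):
    """Worker updates locally and uploads only a sparsified model difference."""
    new_model = local_model - lr * grad
    diff = sparsify_topk(new_model - local_model, k)  # upload top-k entries only
    return new_model, diff


def server_step(global_model: np.ndarray, worker_diffs: list, k: int):
    """Server aggregates worker differences and broadcasts a sparsified
    global difference rather than the full global model."""
    old_model = global_model.copy()
    global_model = global_model + np.mean(worker_diffs, axis=0)
    broadcast_diff = sparsify_topk(global_model - old_model, k)
    return global_model, broadcast_diff
```

Each worker then applies the broadcast difference to its local copy (`local_model += broadcast_diff`), so neither direction of the exchange ever carries the dense model.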