Search Results for author: Zijie Yan

Found 3 papers, 1 paper with code

Llama 3 Meets MoE: Efficient Upcycling

1 code implementation • 13 Dec 2024 • Aditya Vavre, Ethan He, Dennis Liu, Zijie Yan, June Yang, Nima Tajbakhsh, Ashwath Aithal

Scaling large language models (LLMs) significantly improves performance but comes with prohibitive computational costs.

Ranked #2 on Multi-task Language Understanding on MMLU (using extra training data)

MMLU • Multi-task Language Understanding

Upcycling Large Language Models into Mixture of Experts

no code implementations • 10 Oct 2024 • Ethan He, Abhinav Khattar, Ryan Prenger, Vijay Korthikanti, Zijie Yan, Tong Liu, Shiqing Fan, Ashwath Aithal, Mohammad Shoeybi, Bryan Catanzaro

Upcycling pre-trained dense language models into sparse mixture-of-experts (MoE) models is an efficient approach to increase the model capacity of already trained models.

MMLU
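The upcycling idea summarized in the abstract above can be illustrated with a minimal, hypothetical PyTorch sketch: each expert in the MoE layer is initialized as a copy of the trained dense MLP, and a router is added on top, so the upcycled model starts out computing roughly the same function as the dense one. The module names, top-1 routing, and layer sizes below are illustrative assumptions, not the configuration used in the paper.

```python
# Sketch of dense-to-MoE "upcycling" on a toy MLP block (assumed layout,
# not the paper's or Megatron-LM's actual implementation).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseMLP(nn.Module):
    def __init__(self, d_model=256, d_ff=1024):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_ff)
        self.fc2 = nn.Linear(d_ff, d_model)

    def forward(self, x):
        return self.fc2(F.gelu(self.fc1(x)))

class MoELayer(nn.Module):
    """Top-1 routed MoE whose experts start as copies of a trained dense MLP."""
    def __init__(self, dense_mlp, num_experts=4):
        super().__init__()
        d_model = dense_mlp.fc1.in_features
        # Upcycling step: every expert is initialized from the dense weights.
        self.experts = nn.ModuleList(
            copy.deepcopy(dense_mlp) for _ in range(num_experts)
        )
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x):                      # x: [tokens, d_model]
        gates = F.softmax(self.router(x), dim=-1)
        top_gate, top_idx = gates.max(dim=-1)  # top-1 routing
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e
            if mask.any():
                out[mask] = top_gate[mask, None] * expert(x[mask])
        return out

dense = DenseMLP()
moe = MoELayer(dense, num_experts=4)
tokens = torch.randn(8, 256)
print(moe(tokens).shape)  # torch.Size([8, 256])
```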

Gradient Sparsification for Asynchronous Distributed Training

no code implementations • 24 Oct 2019 • Zijie Yan

We let workers download the model difference, rather than the full global model, from the server; this difference is also sparsified, so communication overhead is reduced in both directions between the server and workers.

Federated Learning • Stochastic Optimization
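The dual-way sparsified exchange described in the abstract above can be sketched roughly as follows: both the worker's local update and the server's reply (a model difference rather than the full global model) are compressed to their largest-magnitude entries before transmission. The top-k selection and the function names below are illustrative assumptions, not the paper's exact scheme.

```python
# Sketch of sparsified two-way communication of model differences,
# assuming simple top-k magnitude selection (hypothetical helper names).
import torch

def sparsify_topk(delta: torch.Tensor, k: int):
    """Keep only the k largest-magnitude entries of a model difference."""
    flat = delta.flatten()
    _, idx = flat.abs().topk(k)
    return idx, flat[idx]

def densify(idx, values, shape):
    """Rebuild a dense tensor from sparse (index, value) pairs."""
    flat = torch.zeros(shape).flatten()
    flat[idx] = values
    return flat.view(shape)

# Worker side: send a sparsified local update to the server.
local_update = torch.randn(1000, 10)
idx, vals = sparsify_topk(local_update, k=100)

# Server side: apply the sparse update, then reply with a sparsified
# model difference (global model minus the worker's stale copy)
# instead of sending the full global model back.
global_model = torch.randn(1000, 10)
worker_copy = global_model + 0.01 * torch.randn(1000, 10)
global_model += densify(idx, vals, global_model.shape)
diff_idx, diff_vals = sparsify_topk(global_model - worker_copy, k=100)

# Worker side: reconstruct and apply the sparse difference locally.
worker_copy += densify(diff_idx, diff_vals, worker_copy.shape)
```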
