1 code implementation • 10 Feb 2024 • Keisuke Kamahori, Yile Gu, Kan Zhu, Baris Kasikci
Large Language Models (LLMs) based on the Mixture-of-Experts (MoE) architecture show promising performance on various tasks.
1 code implementation • 29 Oct 2023 • Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci
To maximize LLMs' serving throughput, we introduce Atom, a low-bit quantization method that achieves high throughput improvements with negligible accuracy loss.
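The snippet above only names the idea; the abstract gives no implementation details, but low-bit weight quantization generally works by scaling each row of a weight matrix into a small integer range and storing the per-row scale. A minimal symmetric int4-style sketch (function names and the per-row granularity are my assumptions, not Atom's actual scheme, which uses more sophisticated mixed-precision techniques):

```python
import numpy as np

def quantize_int4(w):
    """Symmetric per-row quantization sketch (NOT Atom's actual method):
    scale each row so its max magnitude maps to 7, then round to
    integers in [-7, 7]."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero rows
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, s = quantize_int4(w)
w_hat = dequantize(q, s)
# Rounding error is bounded by half a quantization step per element.
err = np.abs(w - w_hat).max()
```

The throughput benefit comes from storing and moving 4-bit integers instead of 16- or 32-bit floats; the accuracy cost is the bounded rounding error shown above.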
no code implementations • 1 Jan 2016 • Jerry Luo, Kayla Shapiro, Hao-Jun Michael Shi, Qi Yang, Kan Zhu
Motivated by non-negative matrix factorization, we reformulate our problem as a Frobenius norm minimization problem, solve it with the Alternating Direction Method of Multipliers (ADMM), and develop an algorithm, FroMax.
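The snippet does not spell out FroMax's formulation, but the generic ADMM template it invokes is standard: split the variable, alternate a least-squares update with a projection, and update the dual. A minimal sketch on a related non-negative least-squares subproblem (the function name and problem instance are illustrative assumptions, not the paper's algorithm):

```python
import numpy as np

def admm_nnls(A, b, rho=1.0, iters=200):
    """ADMM sketch for min_x 0.5*||A x - b||^2 s.t. x >= 0,
    using the splitting x = z with the non-negativity constraint on z."""
    m, n = A.shape
    AtA = A.T @ A
    Atb = A.T @ b
    # Factor (A^T A + rho*I) once; it is reused every iteration.
    L = np.linalg.cholesky(AtA + rho * np.eye(n))
    x = np.zeros(n)
    z = np.zeros(n)
    u = np.zeros(n)
    for _ in range(iters):
        rhs = Atb + rho * (z - u)
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))  # least-squares step
        z = np.maximum(0.0, x + u)                          # project onto x >= 0
        u = u + x - z                                       # dual update
    return z
```

On `A = I` the solution reduces to clipping `b` at zero, which makes the alternating structure easy to verify by hand.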