no code implementations • 3 May 2024 • Shaoyuan Chen, Wencong Xiao, Yutong Lin, Mingxing Zhang, Yingdi Shan, Jinlei Jiang, Kang Chen, Yongwei Wu
To enhance the efficiency of LLM decoding, we introduce model-attention disaggregation.
Language Modeling
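The listing gives only the one-line summary above, so the sketch below is a rough, hypothetical illustration of what model-attention disaggregation refers to: a single decoding step is split into an attention part (which streams the growing KV cache and is typically memory-bandwidth-bound) and the dense projections plus MLP (typically compute-bound), so that the two could in principle be placed on different hardware. All class and function names here are assumptions for illustration and are not taken from the paper.

```python
# Hypothetical sketch only: AttentionPart, DensePart, and decode_step are
# illustrative names, not the paper's API. The point is the split between
# attention over the KV cache and the dense (projection/MLP) computation.
import torch
import torch.nn as nn


class AttentionPart(nn.Module):
    """Attention over the KV cache; a candidate for memory-optimized hardware."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads

    def forward(self, q, k_cache, v_cache):
        # q: (batch, heads, 1, d_head); caches: (batch, heads, seq, d_head)
        scores = q @ k_cache.transpose(-1, -2) / self.d_head ** 0.5
        return torch.softmax(scores, dim=-1) @ v_cache


class DensePart(nn.Module):
    """QKV/output projections and MLP; a candidate for compute-optimized hardware."""
    def __init__(self, d_model: int):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))


def decode_step(x, dense, attn, k_cache, v_cache):
    """One single-token decoding step for one simplified transformer block."""
    b = x.size(0)
    shape = (b, attn.n_heads, 1, attn.d_head)
    # Dense projections would run on the compute-oriented device...
    q, k, v = (t.reshape(shape) for t in dense.qkv(x).chunk(3, dim=-1))
    # ...while attention over the growing cache could run on separate hardware.
    k_cache = torch.cat([k_cache, k], dim=2)
    v_cache = torch.cat([v_cache, v], dim=2)
    ctx = attn(q, k_cache, v_cache).reshape(b, 1, -1)
    return x + dense.mlp(x + dense.out(ctx)), k_cache, v_cache


# Example: one decoding step given a 16-token prefix already in the cache.
d_model, n_heads = 64, 4
dense, attn = DensePart(d_model), AttentionPart(d_model, n_heads)
x = torch.randn(2, 1, d_model)                       # current token embeddings
kc = torch.randn(2, n_heads, 16, d_model // n_heads)
vc = torch.randn(2, n_heads, 16, d_model // n_heads)
y, kc, vc = decode_step(x, dense, attn, kc, vc)      # y: (2, 1, 64)
```

In an actual disaggregated setup, the attention part would hold the KV cache on its own device and only activations would cross the link; the sketch omits normalization, batching, and any real device placement.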