no code implementations • 22 Apr 2024 • Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, Shiyao Li, Yuming Lou, Luning Wang, Zhihang Yuan, Xiuhong Li, Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang
This paper presents a comprehensive survey of the existing literature on efficient LLM inference.
1 code implementation • 28 Jul 2023 • Xuefei Ning, Zinan Lin, Zixuan Zhou, Zifu Wang, Huazhong Yang, Yu Wang
This work aims to decrease the end-to-end generation latency of large language models (LLMs).
1 code implementation • 2 Feb 2023 • Junbo Zhao, Xuefei Ning, Enshu Liu, Binxin Ru, Zixuan Zhou, Tianchen Zhao, Chen Chen, Jiajin Zhang, Qingmin Liao, Yu Wang
In the first step, we train separate sub-predictors on different types of available low-fidelity information, so that each one extracts beneficial knowledge from its signal and serves as a low-fidelity expert.
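A minimal sketch of this first step, assuming PyTorch, an MLP-based sub-predictor, and hypothetical low-fidelity signal names such as "one_shot_acc" and "zero_cost_proxy"; it is not the paper's implementation, and how the resulting experts are later combined is not covered here.

```python
# Sketch (assumptions, not the paper's code): fit one sub-predictor per type of
# low-fidelity information so that each becomes a "low-fidelity expert".
import torch
import torch.nn as nn


class SubPredictor(nn.Module):
    """Maps a fixed-length architecture encoding to a scalar score."""

    def __init__(self, enc_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(enc_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)


def train_low_fidelity_experts(arch_encodings, low_fidelity_signals,
                               epochs: int = 100, lr: float = 1e-3):
    """Train one sub-predictor per low-fidelity information type.

    arch_encodings: (N, enc_dim) tensor of architecture encodings.
    low_fidelity_signals: dict mapping a signal name to an (N,) tensor of values.
    Returns a dict of trained experts, one per signal.
    """
    experts = {}
    for name, targets in low_fidelity_signals.items():
        expert = SubPredictor(arch_encodings.shape[1])
        opt = torch.optim.Adam(expert.parameters(), lr=lr)
        loss_fn = nn.MSELoss()  # simple regression loss; stands in for the paper's objective
        for _ in range(epochs):
            opt.zero_grad()
            loss = loss_fn(expert(arch_encodings), targets)
            loss.backward()
            opt.step()
        experts[name] = expert
    return experts


if __name__ == "__main__":
    torch.manual_seed(0)
    encs = torch.rand(32, 16)  # 32 random architecture encodings (illustrative)
    signals = {"one_shot_acc": torch.rand(32), "zero_cost_proxy": torch.rand(32)}
    experts = train_low_fidelity_experts(encs, signals)
    print({name: float(e(encs).mean()) for name, e in experts.items()})
```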
1 code implementation • 16 Jul 2022 • Zixuan Zhou, Xuefei Ning, Yi Cai, Jiashu Han, Yiping Deng, Yuhan Dong, Huazhong Yang, Yu Wang
Specifically, we train the supernet with a large sharing extent (an easier curriculum) at the beginning and gradually decrease the sharing extent of the supernet (a harder curriculum).
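A minimal sketch of the curriculum described above, assuming a linear annealing schedule and a toy supernet in which a larger sharing extent means more candidate operations share the same parameters; the schedule, the supernet class, and the grouping rule are illustrative assumptions rather than the paper's actual scheme.

```python
# Sketch (assumptions, not the paper's code): start training with a large
# sharing extent (easier curriculum) and gradually decrease it (harder curriculum).

def sharing_extent_schedule(epoch: int, total_epochs: int,
                            max_extent: int, min_extent: int) -> int:
    """Linearly anneal the sharing extent from max_extent down to min_extent."""
    frac = epoch / max(total_epochs - 1, 1)
    return round(max_extent - frac * (max_extent - min_extent))


class ToySupernet:
    """Stand-in supernet whose candidate ops are split into parameter-sharing groups."""

    def __init__(self, num_candidate_ops: int):
        self.num_candidate_ops = num_candidate_ops
        self.num_groups = 1  # start with everything shared

    def set_sharing_extent(self, extent: int):
        # Larger extent -> fewer groups -> more parameter sharing (easier);
        # smaller extent -> more groups -> less sharing (harder).
        self.num_groups = max(1, self.num_candidate_ops // max(extent, 1))


total_epochs, max_extent, min_extent = 10, 8, 1
supernet = ToySupernet(num_candidate_ops=8)
for epoch in range(total_epochs):
    extent = sharing_extent_schedule(epoch, total_epochs, max_extent, min_extent)
    supernet.set_sharing_extent(extent)
    # ... sample sub-architectures and run one epoch of supernet training here ...
    print(f"epoch {epoch}: sharing extent = {extent}, "
          f"parameter groups = {supernet.num_groups}")
```

A linear schedule is used here only because it is the simplest choice; any monotonically decreasing schedule would realize the easy-to-hard curriculum described in the sentence above.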
no code implementations • 28 Sep 2020 • Xuefei Ning, Wenshuo Li, Zixuan Zhou, Tianchen Zhao, Shuang Liang, Yin Zheng, Huazhong Yang, Yu Wang
A major challenge in neural architecture search (NAS) is conducting fast and accurate evaluations of neural architectures.
1 code implementation • NeurIPS 2021 • Xuefei Ning, Changcheng Tang, Wenshuo Li, Zixuan Zhou, Shuang Liang, Huazhong Yang, Yu Wang
Conducting efficient performance estimations of neural architectures is a major challenge in neural architecture search (NAS).