no code implementations • ICML 2020 • Liu Liu, Lei Deng, Zhaodong Chen, Yuke Wang, Shuangchen Li, Jingwei Zhang, Yihua Yang, Zhenyu Gu, Yufei Ding, Yuan Xie
Using Deep Neural Networks (DNNs) in machine learning tasks is promising for delivering high-quality results, but meeting stringent latency requirements and energy constraints is challenging because of the memory-bound and compute-bound execution patterns of DNNs.
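A back-of-the-envelope roofline sketch of what "memory-bound" versus "compute-bound" means here; the hardware numbers below are illustrative assumptions, not measurements from the paper:

```python
# An operator is memory-bound when its arithmetic intensity (FLOPs per byte
# moved) falls below the hardware's compute/bandwidth ratio, and compute-bound
# above it. Peak numbers here are made-up stand-ins for a real accelerator.

PEAK_FLOPS = 100e12           # assumed peak compute: 100 TFLOP/s
PEAK_BW = 1e12                # assumed memory bandwidth: 1 TB/s
RIDGE = PEAK_FLOPS / PEAK_BW  # intensity threshold (FLOPs/byte)

def gemm_intensity(m, n, k, bytes_per_elem=2):
    """Arithmetic intensity of an (m x k) @ (k x n) matmul with FP16 operands."""
    flops = 2 * m * n * k
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved

# Batch-1 fully connected layer: effectively a matrix-vector product,
# intensity ~1 FLOP/byte, far below the ridge -> memory-bound.
print(gemm_intensity(1, 4096, 4096), "vs ridge", RIDGE)
# Large square matmul: intensity well above the ridge -> compute-bound.
print(gemm_intensity(4096, 4096, 4096), "vs ridge", RIDGE)
```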
1 code implementation • 7 Mar 2024 • 01.AI: Alex Young, Bei Chen, Chao Li, Chengen Huang, Ge Zhang, Guanwei Zhang, Guoyin Wang, Heng Li, Jiangcheng Zhu, Jianqun Chen, Jing Chang, Kaidong Yu, Peng Liu, Qiang Liu, Shawn Yue, Senbin Yang, Shiming Yang, Wen Xie, Wenhao Huang, Xiaohui Hu, Xiaoyi Ren, Xinyao Niu, Pengcheng Nie, Yanpeng Li, Yuchi Xu, Yudong Liu, Yue Wang, Yuxuan Cai, Zhenyu Gu, Zhiyuan Liu, Zonghong Dai
The Yi model family is based on 6B and 34B pretrained language models, which we then extend to chat models, 200K long-context models, depth-upscaled models, and vision-language models.
Ranked #1 on Chatbot on AlpacaEval
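The depth-upscaled models mentioned in the abstract grow a pretrained stack by duplicating a span of its layers and continuing training. A toy sketch of that general idea; the layer count and the duplicated span are made up, and this is not 01.AI's exact recipe:

```python
# Hypothetical illustration of depth upscaling: insert deep copies of the
# middle layers back into the stack, producing a deeper model to fine-tune.
import copy
import torch.nn as nn

def depth_upscale(layers: nn.ModuleList, start: int, end: int) -> nn.ModuleList:
    """Return a deeper stack: original layers plus copies of layers[start:end]."""
    grown = list(layers)
    grown[end:end] = [copy.deepcopy(layer) for layer in layers[start:end]]
    return nn.ModuleList(grown)

base = nn.ModuleList(nn.TransformerEncoderLayer(d_model=64, nhead=4) for _ in range(8))
deeper = depth_upscale(base, start=2, end=6)  # duplicate 4 middle layers
print(len(base), "->", len(deeper))           # 8 -> 12
```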
no code implementations • 18 Oct 2021 • Zhe Zhou, Junlin Liu, Zhenyu Gu, Guangyu Sun
To run this algorithm with lower latency and better energy efficiency, we also propose the Energon co-processor architecture.
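The algorithm referenced here dynamically restricts each query's attention to its most relevant keys. A toy software emulation of that pruning idea using plain top-k score filtering; the real design does multi-round filtering at mixed precision in hardware, which this sketch does not model:

```python
import torch

def topk_sparse_attention(q, k, v, keep: int):
    """q, k, v: (seq, dim) tensors. Attend only to the top-`keep` keys per query."""
    scores = q @ k.T / (q.shape[-1] ** 0.5)   # full score matrix (seq, seq)
    topk = scores.topk(keep, dim=-1).indices  # indices of the kept keys
    mask = torch.full_like(scores, float("-inf"))
    mask.scatter_(-1, topk, 0.0)              # 0 where kept, -inf elsewhere
    probs = torch.softmax(scores + mask, dim=-1)
    return probs @ v

q = k = v = torch.randn(128, 64)
out = topk_sparse_attention(q, k, v, keep=16)
print(out.shape)  # torch.Size([128, 64])
```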
no code implementations • 9 Feb 2021 • Kang Zhao, Sida Huang, Pan Pan, Yinghan Li, Yingya Zhang, Zhenyu Gu, Yinghui Xu
Research has demonstrated that low bit-width (e.g., INT8) quantization can be employed to accelerate the inference process.
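To make the low bit-width idea concrete, here is a minimal sketch of generic symmetric per-tensor INT8 post-training quantization; this illustrates the general technique, not this paper's specific method:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Map a float tensor to int8 using a single symmetric scale."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from the int8 representation."""
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, s = quantize_int8(w)
print("max abs error:", np.abs(dequantize(q, s) - w).max())
```

Integer matmuls on the int8 tensors are what deliver the speedup; the dequantize step shows the approximation error the scheme trades for it.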
no code implementations • 25 Sep 2019 • Liu Liu, Lei Deng, Shuangchen Li, Jingwei Zhang, Yihua Yang, Zhenyu Gu, Yufei Ding, Yuan Xie
Using Recurrent Neural Networks (RNNs) in sequence modeling tasks is promising for delivering high-quality results, but meeting stringent latency requirements is challenging because of the memory-bound execution pattern of RNNs.
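A one-step arithmetic check of why batch-1 RNN inference is memory-bound: each timestep is a matrix-vector product, so roughly one multiply-accumulate is performed per weight byte fetched. The sizes below are illustrative, not taken from the paper:

```python
# Arithmetic intensity of one recurrent step, W @ h, with FP32 weights.
hidden = 1024
bytes_per_elem = 4
flops = 2 * hidden * hidden                                  # one GEMV
bytes_moved = bytes_per_elem * (hidden * hidden + 2 * hidden)  # W, h, output
print("FLOPs/byte:", flops / bytes_moved)  # ~0.5, far below typical
# accelerator compute/bandwidth ratios, so runtime is dominated by
# streaming the weight matrix from memory at every timestep.
```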