1 code implementation • 2 Apr 2024 • Enshu Liu, Junyi Zhu, Zinan Lin, Xuefei Ning, Matthew B. Blaschko, Sergey Yekhanin, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang
For example, LCSC achieves better performance with a single function evaluation (NFE) than the base consistency-distillation model with two NFEs, and reduces the NFE of diffusion models (DM) from 15 to 9 while maintaining generation quality on CIFAR-10.
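For readers unfamiliar with the metric, NFE simply counts the number of network forward passes a sampler performs; fewer NFEs mean cheaper generation. A minimal sketch (`toy_model` and `sample` are hypothetical illustrations, not from the paper):

```python
def toy_model(x, t):
    # Stand-in for a diffusion/consistency network forward pass (hypothetical).
    return -0.1 * x

def sample(x0, steps):
    """Toy iterative sampler: NFE equals the number of model calls."""
    x, nfe = x0, 0
    for t in range(steps):
        x = x + toy_model(x, t)  # one forward pass per step
        nfe += 1
    return x, nfe

x, nfe = sample(1.0, 9)  # a 9-step sampler costs 9 NFEs
```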
1 code implementation • 28 Feb 2024 • Shiyao Li, Xuefei Ning, Luning Wang, Tengxuan Liu, Xiangsheng Shi, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang
Post-training quantization (PTQ) has emerged as a promising technique to reduce the cost of large language models (LLMs).
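The core idea of PTQ is to map trained floating-point weights to low-bit integers without retraining. A minimal sketch of symmetric per-tensor int8 quantization (illustrative only; the paper's method is more sophisticated):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 PTQ: scale so the max weight maps to 127."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original weights."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Rounding error is bounded by half a quantization step.
max_err = np.abs(w - w_hat).max()
```

Storing `q` (1 byte/weight) plus one scale instead of float32 weights gives a 4x memory reduction, which is the cost saving PTQ targets for LLMs.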
1 code implementation • 6 Feb 2024 • Tao Yuan, Xuefei Ning, Dong Zhou, Zhijie Yang, Shiyao Li, Minghui Zhuang, Zheyue Tan, Zhuyu Yao, Dahua Lin, Boxun Li, Guohao Dai, Shengen Yan, Yu Wang
In contrast, the average context lengths of mainstream benchmarks are insufficient (5k-21k), and they suffer from potential knowledge leakage and inaccurate metrics, resulting in biased evaluation.
no code implementations • 10 Jan 2022 • Ruofan Liang, Bingsheng He, Shengen Yan, Peng Sun
Multi-tenant machine learning services have emerged as data-intensive workloads in data centers, with heavy usage of GPU resources.
1 code implementation • 3 Sep 2021 • Qinghao Hu, Peng Sun, Shengen Yan, Yonggang Wen, Tianwei Zhang
Modern GPU datacenters are critical for delivering Deep Learning (DL) models and services in both the research community and industry.
1 code implementation • 19 Feb 2019 • Peng Sun, Wansen Feng, Ruobing Han, Shengen Yan, Yonggang Wen
To address this problem, we propose a communication backend named GradientFlow for distributed DNN training, and employ a set of network optimization techniques.
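In data-parallel DNN training, the communication backend's job is to synchronize gradients across workers, typically by averaging them (all-reduce). A minimal simulated sketch of that aggregation step (the function name is illustrative; GradientFlow itself applies further network optimizations on top of this primitive):

```python
import numpy as np

def allreduce_mean(worker_grads):
    """Simulated all-reduce: every worker ends up with the mean gradient."""
    mean = np.stack(worker_grads).mean(axis=0)
    return [mean.copy() for _ in worker_grads]

# Four workers with different local gradients for the same parameter.
grads = [np.full(3, float(i)) for i in range(4)]
synced = allreduce_mean(grads)  # each worker now holds the average, 1.5
```

A real backend (e.g. ring all-reduce) computes the same result but in bandwidth-optimal chunks over the network rather than by gathering everything on one node.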
Distributed, Parallel, and Cluster Computing
no code implementations • 13 Jan 2015 • Ren Wu, Shengen Yan, Yi Shan, Qingqing Dang, Gang Sun
We present a state-of-the-art image recognition system, Deep Image, developed using end-to-end deep learning.