Search Results for author: Lin Qu

Found 6 papers, 1 paper with code

DDK: Distilling Domain Knowledge for Efficient Large Language Models

no code implementations · 23 Jul 2024 · Jiaheng Liu, Chenchen Zhang, Jinyang Guo, Yuanxing Zhang, Haoran Que, Ken Deng, Zhiqi Bai, Jie Liu, Ge Zhang, Jiakai Wang, Yanan Wu, Congnan Liu, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng

Despite the advanced capabilities of large language models (LLMs) in various applications, they still face significant computational and storage demands.

Knowledge Distillation

SwiftDiffusion: Efficient Diffusion Model Serving with Add-on Modules

no code implementations · 2 Jul 2024 · Suyi Li, Lingyun Yang, Xiaoxiao Jiang, Hanfeng Lu, Dakai An, Zhipeng Di, Weiyi Lu, Jiawei Chen, Kan Liu, YingHao Yu, Tao Lan, Guodong Yang, Lin Qu, Liping Zhang, Wei Wang

To mitigate the high loading overhead of LoRA serving, SwiftDiffusion employs a bounded asynchronous LoRA loading (BAL) technique, allowing LoRA loading to overlap with the initial base model execution by up to k steps without compromising image quality.
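The overlap idea behind BAL can be illustrated with a minimal threaded sketch. Everything below is an assumption for illustration (the loader function, `K_BOUND`, and the simulated timings are hypothetical stand-ins), not SwiftDiffusion's actual implementation:

```python
import threading
import time

K_BOUND = 4          # max steps the base model may run before the LoRA must be in place
TOTAL_STEPS = 20     # total denoising steps

loaded_lora = {}              # shared slot filled by the background loader
lora_ready = threading.Event()

def load_lora_async(path):
    """Simulate pulling LoRA weights from storage while denoising already runs."""
    time.sleep(0.05)                              # stand-in for disk / network latency
    loaded_lora["weights"] = f"weights from {path}"
    lora_ready.set()

def denoise(step, with_lora):
    """Stand-in for one denoising step of the base model."""
    time.sleep(0.01)
    print(f"step {step:2d}: {'base+LoRA' if with_lora else 'base only'}")

# Kick off LoRA loading in the background, then start denoising immediately.
threading.Thread(target=load_lora_async, args=("style_lora.safetensors",)).start()

for step in range(TOTAL_STEPS):
    if not lora_ready.is_set() and step >= K_BOUND:
        # Bound reached: block until the adapter is available instead of
        # running further steps without it.
        lora_ready.wait()
    denoise(step, with_lora=lora_ready.is_set())
```

In this toy run the first few steps execute with the base model alone while the adapter loads concurrently; once the bound of k steps is hit, the loop waits for the loader rather than continuing without the LoRA.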

Image Generation

D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models

no code implementations · 3 Jun 2024 · Haoran Que, Jiaheng Liu, Ge Zhang, Chenchen Zhang, Xingwei Qu, Yinghao Ma, Feiyu Duan, Zhiqi Bai, Jiakai Wang, Yuanxing Zhang, Xu Tan, Jie Fu, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng

To address the limitations of existing methods, inspired by the Scaling Law for performance prediction, we propose to investigate the Scaling Law of the Domain-specific Continual Pre-Training (D-CPT Law) to decide the optimal mixture ratio with acceptable training costs for LLMs of different sizes.
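As a rough illustration of using a fitted curve to choose a mixture ratio, the sketch below fits a placeholder quadratic-in-log form to synthetic pilot-run losses and reads off the minimum; the functional form and all numbers are assumptions for illustration, not the D-CPT Law itself:

```python
import numpy as np

# Toy pilot-run observations: validation loss on the target domain at a few
# general/domain mixture ratios r (fraction of domain-specific tokens).
# The numbers are synthetic placeholders, not results from the paper.
ratios = np.array([0.05, 0.1, 0.2, 0.4, 0.6, 0.8])
losses = np.array([2.31, 2.18, 2.07, 2.01, 2.03, 2.10])

# Placeholder parametric form: a quadratic in log(r). This is NOT the D-CPT Law;
# it only illustrates fitting a cheap curve to pilot runs and picking its minimum.
coeffs = np.polyfit(np.log(ratios), losses, deg=2)

grid = np.linspace(0.01, 0.99, 500)
pred = np.polyval(coeffs, np.log(grid))
best_ratio = grid[np.argmin(pred)]
print(f"predicted loss-minimizing domain mixture ratio ~ {best_ratio:.2f}")
```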

Math

GBA: A Tuning-free Approach to Switch between Synchronous and Asynchronous Training for Recommendation Model

no code implementations · 23 May 2022 · Wenbo Su, Yuanxing Zhang, Yufeng Cai, Kaixu Ren, Pengjie Wang, Huimin Yi, Yue Song, Jing Chen, Hongbo Deng, Jian Xu, Lin Qu, Bo Zheng

High-concurrency asynchronous training upon parameter server (PS) architecture and high-performance synchronous training upon all-reduce (AR) architecture are the most commonly deployed distributed training modes for recommendation models.
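The difference between the two modes can be sketched with a toy simulation; the gradient function and worker loop below are purely illustrative stand-ins, not GBA or a real parameter-server / all-reduce implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_WORKERS, DIM, LR = 4, 8, 0.1

def worker_gradient(params):
    """Stand-in for a gradient computed on one worker's mini-batch."""
    return params + rng.normal(scale=0.01, size=params.shape)

# Synchronous (all-reduce style): every worker's gradient is averaged before
# a single, consistent update is applied.
params_sync = np.ones(DIM)
grads = [worker_gradient(params_sync) for _ in range(NUM_WORKERS)]
params_sync -= LR * np.mean(grads, axis=0)

# Asynchronous (parameter-server style): each worker pushes its gradient as it
# arrives, so later workers compute against already-updated (stale) weights.
params_async = np.ones(DIM)
for _ in range(NUM_WORKERS):
    params_async -= LR * worker_gradient(params_async)

print("after one sync round :", params_sync[:3])
print("after one async round:", params_async[:3])
```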

Recommendation Systems

PICASSO: Unleashing the Potential of GPU-centric Training for Wide-and-deep Recommender Systems

1 code implementation · 11 Apr 2022 · Yuanxing Zhang, Langshi Chen, Siran Yang, Man Yuan, Huimin Yi, Jie Zhang, Jiamang Wang, Jianbo Dong, Yunlong Xu, Yue Song, Yong Li, Di Zhang, Wei Lin, Lin Qu, Bo Zheng

However, we observe that GPU devices are underutilized when training recommender systems, and they cannot attain the throughput improvement that GPU-centric training has achieved in the CV and NLP areas.

Marketing, Recommendation Systems

M6-T: Exploring Sparse Expert Models and Beyond

no code implementations · 31 May 2021 · An Yang, Junyang Lin, Rui Men, Chang Zhou, Le Jiang, Xianyan Jia, Ang Wang, Jie Zhang, Jiamang Wang, Yong Li, Di Zhang, Wei Lin, Lin Qu, Jingren Zhou, Hongxia Yang

Mixture-of-Experts (MoE) models can achieve promising results with an extremely large number of parameters but roughly constant computation cost, and they have thus become a trend in model scaling.
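A minimal sketch of top-k expert routing shows why per-token compute stays roughly constant as the number of experts (and hence parameters) grows; the layer sizes, router, and gating below are generic assumptions, not M6-T's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_EXPERTS, TOP_K, D_MODEL, D_HIDDEN = 8, 2, 16, 32

# Each expert is a small two-layer MLP; only TOP_K of them run per token,
# so compute per token depends on TOP_K rather than NUM_EXPERTS.
experts_w1 = rng.normal(size=(NUM_EXPERTS, D_MODEL, D_HIDDEN)) * 0.1
experts_w2 = rng.normal(size=(NUM_EXPERTS, D_HIDDEN, D_MODEL)) * 0.1
router_w = rng.normal(size=(D_MODEL, NUM_EXPERTS)) * 0.1

def moe_layer(token):
    logits = token @ router_w
    gates = np.exp(logits - logits.max())
    gates /= gates.sum()                          # softmax routing weights
    top = np.argsort(gates)[-TOP_K:]              # indices of the selected experts
    out = np.zeros_like(token)
    for e in top:                                 # only the selected experts are evaluated
        hidden = np.maximum(token @ experts_w1[e], 0.0)
        out += gates[e] * (hidden @ experts_w2[e])
    return out

token = rng.normal(size=D_MODEL)
print(moe_layer(token).shape)   # (16,)
```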

Mixture-of-Experts, Playing the Game of 2048
