no code implementations • 7 Mar 2025 • Ling Team, Binwei Zeng, Chao Huang, Chao Zhang, Changxin Tian, Cong Chen, dingnan jin, Feng Yu, Feng Zhu, Feng Yuan, Fakang Wang, Gangshan Wang, Guangyao Zhai, HaiTao Zhang, Huizhong Li, Jun Zhou, Jia Liu, Junpeng Fang, Junjie Ou, Jun Hu, Ji Luo, Ji Zhang, Jian Liu, Jian Sha, Jianxue Qian, Jiewei Wu, Junping Zhao, Jianguo Li, Jubao Feng, Jingchao Di, Junming Xu, Jinghua Yao, Kuan Xu, Kewei Du, Longfei Li, Lei Liang, Lu Yu, Li Tang, Lin Ju, Peng Xu, Qing Cui, Song Liu, Shicheng Li, Shun Song, Song Yan, Tengwei Cai, Tianyi Chen, Ting Guo, Ting Huang, Tao Feng, Tao Wu, Wei Wu, Xiaolu Zhang, Xueming Yang, Xin Zhao, Xiaobo Hu, Xin Lin, Yao Zhao, Yilong Wang, Yongzhen Guo, Yuanyuan Wang, Yue Yang, Yang Cao, Yuhao Fu, Yi Xiong, Yanzhe Li, Zhe Li, Zhiqiang Zhang, Ziqi Liu, ZhaoXin Huan, Zujie Wen, Zhenhang Sun, Zhuoxuan Du, Zhengyu He
Ultimately, our experimental findings demonstrate that a 300B MoE LLM can be effectively trained on lower-performance devices while achieving performance comparable to similarly sized models, both dense and MoE.
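For readers unfamiliar with the Mixture-of-Experts (MoE) architecture this result refers to, the sketch below shows the core routing idea in miniature: a gating network picks the top-k experts per token, so only a fraction of the parameters are active in any forward pass. This is a generic illustration with made-up dimensions, not the Ling team's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Minimal top-k Mixture-of-Experts layer (illustrative only)."""

    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.gate = nn.Linear(d_model, n_experts)
        self.k = k

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.gate(x)                  # (tokens, n_experts)
        topk, idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk, dim=-1)      # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):             # route each token to its k experts
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

x = torch.randn(10, 64)
print(TinyMoE()(x).shape)   # torch.Size([10, 64])
```

Only k of the n_experts expert MLPs run per token, which is why a 300B-parameter MoE can have the per-token compute of a much smaller dense model.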
no code implementations • 2 Mar 2025 • Hongzhi Luan, Changxin Tian, ZhaoXin Huan, Xiaolu Zhang, Kunlong Chen, Zhiqiang Zhang, Jun Zhou
To address these issues, we propose Base model Oriented Systematic Evaluation (BOSE), a method specifically designed to optimize the evaluation of base models.
no code implementations • 15 Apr 2024 • Youshao Xiao, Lin Ju, Zhenglei Zhou, Siyuan Li, ZhaoXin Huan, Dalong Zhang, Rujie Jiang, Lin Wang, Xiaolu Zhang, Lei Liang, Jun Zhou
Previous works address only some types of stragglers and cannot adaptively handle the various stragglers that arise in practice.
no code implementations • 28 Mar 2024 • Binzong Geng, ZhaoXin Huan, Xiaolu Zhang, Yong He, Liang Zhang, Fajie Yuan, Jun Zhou, Linjian Mo
However, we argue that a critical obstacle to deploying LLMs in practice remains: their efficiency when processing long textual user behaviors.
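The cost the authors point to is easy to make concrete: self-attention is quadratic in sequence length, so re-running an LLM over a user's full behavior history at every request is expensive. One common mitigation, sketched below under generic assumptions (the function names and caching scheme are illustrative, not necessarily this paper's method), is to encode each atomic behavior once, cache its embedding, and aggregate cached embeddings per request.

```python
from functools import lru_cache
import hashlib

import torch

EMB_DIM = 32

@lru_cache(maxsize=100_000)
def encode_behavior(text: str) -> torch.Tensor:
    """Stand-in for an expensive LLM encoder call; cached so each
    distinct behavior string is encoded only once."""
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % (2 ** 31)
    g = torch.Generator().manual_seed(seed)
    return torch.randn(EMB_DIM, generator=g)   # pretend embedding

def user_representation(behaviors: list[str]) -> torch.Tensor:
    # Aggregate cached per-behavior embeddings instead of re-encoding
    # the concatenated (and very long) history text on every request.
    return torch.stack([encode_behavior(b) for b in behaviors]).mean(dim=0)

history = ["clicked: running shoes", "bought: yoga mat", "clicked: running shoes"]
print(user_representation(history).shape)   # torch.Size([32])
```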
no code implementations • 9 Jan 2024 • Youshao Xiao, Shangchun Zhao, Zhenglei Zhou, ZhaoXin Huan, Lin Ju, Xiaolu Zhang, Lin Wang, Jun Zhou
However, existing systems are not tailored to meta-learning-based DLRM models and suffer from critical efficiency problems in distributed training on GPU clusters.
no code implementations • 22 Oct 2023 • Zuoli Tang, ZhaoXin Huan, Zihao Li, Xiaolu Zhang, Jun Hu, Chilin Fu, Jun Zhou, Chenliang Li
We expect that by mixing a user's behaviors across different domains, we can exploit the common knowledge encoded in the pre-trained language model to alleviate data sparsity and cold-start problems.
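One way to picture this "mixing across domains" idea: because every interaction can be rendered as text, behaviors from different domains can be flattened into a single sequence that a pre-trained language model embeds and scores. The sketch below is a hedged illustration of that framing only; the prompt template, backbone choice, and similarity scoring are assumptions, not the paper's actual setup.

```python
import torch
from transformers import AutoModel, AutoTokenizer  # backbone choice is assumed

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def score(history_by_domain: dict[str, list[str]], candidate: str) -> float:
    # Flatten cross-domain behaviors into one text sequence so the
    # language model's general knowledge can bridge sparse domains.
    parts = [f"[{d}] " + "; ".join(items) for d, items in history_by_domain.items()]
    text = " ".join(parts) + f" [candidate] {candidate}"
    with torch.no_grad():
        h = model(**tokenizer(text, return_tensors="pt",
                              truncation=True)).last_hidden_state[:, 0]
        c = model(**tokenizer(candidate,
                              return_tensors="pt")).last_hidden_state[:, 0]
    return torch.cosine_similarity(h, c).item()  # [CLS]-embedding similarity

user = {"books": ["sci-fi novels"], "movies": ["Interstellar", "Arrival"]}
print(score(user, "The Three-Body Problem"))
```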
no code implementations • 9 Oct 2023 • Chan Wu, Hanxiao Zhang, Lin Ju, Jinjing Huang, Youshao Xiao, ZhaoXin Huan, Siyuan Li, Fanzhuang Meng, Lei Liang, Xiaolu Zhang, Jun Zhou
In this paper, we rethink the impact of memory consumption and communication costs on the training speed of large language models, and propose Partial Redundancy Optimizer (PaRO), a memory-communication balanced set of sharding strategies.
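The trade-off such a strategy navigates can be made concrete with back-of-the-envelope arithmetic: sharding optimizer state across more ranks saves memory per GPU, but pushes collectives across more (and slower) network links. The toy calculator below is not PaRO's cost model; every constant is an illustrative assumption, and it only shows why "partial" redundancy is a tunable middle ground.

```python
# Toy cost model: shard optimizer state within groups of size g and
# replicate it across groups. Larger g -> less memory per GPU, but
# collectives span more (and slower) links. Numbers are illustrative.

PARAMS = 7e9            # model size (assumed)
BYTES_OPT = 12          # Adam: fp32 master weights + 2 moments (approx.)
INTRA_BW, INTER_BW = 300e9, 25e9   # bytes/s, assumed NVLink vs. Ethernet
GPUS_PER_NODE = 8

def per_gpu_memory_gb(group_size: int) -> float:
    return PARAMS * BYTES_OPT / group_size / 1e9

def allgather_seconds(group_size: int) -> float:
    volume = PARAMS * BYTES_OPT * (group_size - 1) / group_size
    bw = INTRA_BW if group_size <= GPUS_PER_NODE else INTER_BW
    return volume / bw

for g in (1, 8, 64):
    print(f"group={g:>3}: mem={per_gpu_memory_gb(g):7.1f} GB, "
          f"all-gather={allgather_seconds(g):6.2f} s")
```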
no code implementations • 31 Aug 2023 • ZhaoXin Huan, Ke Ding, Ang Li, Xiaolu Zhang, Xu Min, Yong He, Liang Zhang, Jun Zhou, Linjian Mo, Jinjie Gu, Zhongyi Liu, Wenliang Zhong, Guannan Zhang
3) AntM$^{2}$C provides 1 billion CTR samples with 200 features, covering 200 million users and 6 million items.
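To give a sense of what such a record looks like in practice: a CTR sample pairs a click label with sparse user/item features. The snippet below sketches a minimal hashed-feature logistic-regression baseline over records of that shape; the field names are invented for illustration, and AntM$^{2}$C's actual schema is documented with the dataset.

```python
import torch
import torch.nn as nn

HASH_BUCKETS = 1_000_000   # hashing trick for ~200 high-cardinality features

class HashedLR(nn.Module):
    """Minimal CTR baseline: hashed sparse features -> logistic regression."""

    def __init__(self):
        super().__init__()
        self.emb = nn.EmbeddingBag(HASH_BUCKETS, 1, mode="sum")

    def forward(self, feature_strings: list[list[str]]) -> torch.Tensor:
        ids, offsets = [], []
        for feats in feature_strings:
            offsets.append(len(ids))
            ids.extend(hash(f) % HASH_BUCKETS for f in feats)
        return torch.sigmoid(
            self.emb(torch.tensor(ids), torch.tensor(offsets))
        ).squeeze(-1)

# Two records with invented field names, not the dataset's real schema.
batch = [["user_id=42", "item_id=9000", "city=hangzhou"],
         ["user_id=7",  "item_id=123",  "city=beijing"]]
print(HashedLR()(batch))   # predicted click probabilities
```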
no code implementations • 3 Mar 2020 • ZhaoXin Huan, Yulong Wang, Xiaolu Zhang, Lin Shang, Chilin Fu, Jun Zhou
Adversarial examples often exhibit black-box attack transferability, meaning that adversarial examples crafted for one model can also fool another model.
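Transferability is easy to demonstrate: craft a perturbation with gradients from a surrogate model you control, then apply it unchanged to a different target model. The sketch below shows single-step FGSM transfer between two toy networks; FGSM is the standard recipe for this demonstration, not necessarily the attack studied in the paper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
surrogate = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))   # white-box model
target    = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))   # black-box model

x = torch.rand(1, 1, 28, 28, requires_grad=True)
y = torch.tensor([3])

# FGSM: one signed-gradient step on the surrogate's loss.
loss = nn.functional.cross_entropy(surrogate(x), y)
loss.backward()
x_adv = (x + 0.1 * x.grad.sign()).clamp(0, 1).detach()

# The perturbation was crafted on `surrogate`, yet it also shifts the
# logits of the independently initialized `target` model.
with torch.no_grad():
    print("target logits shift:",
          (target(x_adv) - target(x)).abs().max().item())
```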