no code implementations • 20 Sep 2024 • Qiaozhi He, Xiaomin Zhuang, Zhihua Wu
This paper investigates scaling laws for local SGD, a distributed optimization algorithm that facilitates training on loosely connected devices, in LLM training.
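Local SGD itself follows a simple pattern: each worker takes several gradient steps on its own data shard, and parameters are averaged only at periodic synchronization rounds. The sketch below illustrates that pattern on a synthetic least-squares problem; the data, loss, and hyperparameters are illustrative placeholders, not the paper's setup.

```python
# Minimal sketch of local SGD: K workers each take H local gradient steps on their own
# shard, then parameters are averaged once per communication round.
import numpy as np

rng = np.random.default_rng(0)
K, H, rounds, lr, dim = 4, 8, 20, 0.1, 16
w_true = rng.normal(size=dim)
# Each worker holds its own shard (X_k, y_k) of a synthetic least-squares problem.
shards = []
for _ in range(K):
    X = rng.normal(size=(64, dim))
    y = X @ w_true + 0.01 * rng.normal(size=64)
    shards.append((X, y))

w_global = np.zeros(dim)
for r in range(rounds):
    local_ws = []
    for X, y in shards:
        w = w_global.copy()
        for _ in range(H):                      # H local steps, no communication
            grad = 2 * X.T @ (X @ w - y) / len(y)
            w -= lr * grad
        local_ws.append(w)
    w_global = np.mean(local_ws, axis=0)        # one synchronization per round
print("distance to optimum:", np.linalg.norm(w_global - w_true))
```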
no code implementations • 30 Aug 2024 • Lihang Liu, Shanzhuo Zhang, Yang Xue, Xianbin Ye, Kunrui Zhu, Yuxin Li, Yang Liu, Jie Gao, Wenlai Zhao, Hongkun Yu, Zhihua Wu, Xiaonan Zhang, Xiaomin Fang
The AlphaFold series has transformed protein structure prediction with remarkable accuracy, often matching experimental methods.
no code implementations • 8 May 2024 • Xiaomin Zhuang, Yufan Jiang, Qiaozhi He, Zhihua Wu
In this report, we present ChuXin, an entirely open-source language model with a size of 1.6 billion parameters.
no code implementations • 28 Apr 2024 • Qiaozhi He, Zhihua Wu
Large Language Models (LLMs) have had a profound impact on AI applications, particularly in the domains of long-text comprehension and generation.
no code implementations • 28 Mar 2024 • Yufan Jiang, Qiaozhi He, Xiaomin Zhuang, Zhihua Wu
We present Code Comparison Tuning (CCT), a simple and effective tuning method for code large language models (Code LLMs) to better handle subtle code errors.
no code implementations • IEEE Journal of Selected Topics in Signal Processing 2023 • Li Wang, Xin Wu, Yi Zhang, Xinyun Zhang, Lianming Xu, Zhihua Wu, Aiguo Fei
Specifically, DeepAdaIn-Net encompasses a partition point selection (PPS) module, a high feature compression learning (HFCL) module, a bandwidth-aware feature configuration (BaFC) module, and a feature consistency compensation (FCC) module.
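The abstract names the four modules but not their internals, so the following is only a schematic sketch of how such a device-edge pipeline might chain them; every class, method, and constant is a placeholder, and none of it is the paper's implementation.

```python
# Schematic sketch only: one way the four modules named above might be chained for
# device-edge collaborative inference. All logic here is an illustrative placeholder.
class PPS:    # partition point selection: choose where to split the backbone network
    def select(self, bandwidth_mbps):
        return 0 if bandwidth_mbps < 1.0 else 2          # placeholder policy

class HFCL:   # high feature compression learning: shrink the split-point feature
    def compress(self, feature):
        return feature[::4]                              # placeholder 4x downsampling

class BaFC:   # bandwidth-aware feature configuration: fit the payload to the link
    def configure(self, feature, bandwidth_mbps):
        keep = max(1, int(len(feature) * min(1.0, bandwidth_mbps)))
        return feature[:keep]

class FCC:    # feature consistency compensation: restore what compression discarded
    def compensate(self, feature, target_len):
        return feature + [0.0] * (target_len - len(feature))

feature = [float(i) for i in range(64)]                  # stand-in intermediate feature
bw = 0.5                                                 # link bandwidth in Mb/s (made up)
pps, hfcl, bafc, fcc = PPS(), HFCL(), BaFC(), FCC()
split_point = pps.select(bw)
sent = bafc.configure(hfcl.compress(feature), bw)        # what the device transmits
restored = fcc.compensate(sent, target_len=16)           # what the edge server recovers
print(split_point, len(sent), len(restored))
```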
no code implementations • 7 Aug 2023 • Yufan Jiang, Qiaozhi He, Xiaomin Zhuang, Zhihua Wu, Kunpeng Wang, Wenlai Zhao, Guangwen Yang
Existing large language models have to run K times to generate a sequence of K tokens.
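This is the standard autoregressive decoding loop: each new token requires another full forward pass over the model. A toy sketch of that loop, where `tiny_lm` is a stand-in forward pass rather than a real LLM:

```python
# Sketch of why standard autoregressive decoding needs K forward passes for K tokens.
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 50, 8
emb = rng.normal(size=(vocab, dim))
proj = rng.normal(size=(dim, vocab))

def tiny_lm(token_ids):
    """One full forward pass: mean-pool embeddings, project to vocabulary logits."""
    h = emb[token_ids].mean(axis=0)
    return h @ proj

prompt = [1, 2, 3]
K = 5
tokens = list(prompt)
for _ in range(K):                 # K separate forward passes, one per generated token
    logits = tiny_lm(tokens)
    tokens.append(int(np.argmax(logits)))
print(tokens[len(prompt):])        # the K generated tokens
```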
1 code implementation • 20 Feb 2023 • Chang Chen, Min Li, Zhihua Wu, dianhai yu, Chao Yang
In this paper, we propose TA-MoE, a topology-aware routing strategy for large-scale MoE training designed from a model-system co-design perspective, which dynamically adjusts the MoE dispatch pattern according to the network topology.
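As a rough illustration of what adjusting the dispatch pattern to the topology can mean in general (not TA-MoE's actual algorithm), the sketch below biases a top-1 MoE gate toward experts that sit on the same node as the token, so fewer tokens cross slow inter-node links; the penalty scheme and sizes are made up for the example.

```python
# Hedged sketch of topology-aware dispatch: penalize experts on a different node than
# the token so that, when gate scores are close, traffic stays on fast intra-node links.
import numpy as np

rng = np.random.default_rng(0)
num_tokens, num_experts, experts_per_node = 6, 8, 4
token_node = rng.integers(0, 2, size=num_tokens)         # which node each token lives on
expert_node = np.arange(num_experts) // experts_per_node  # which node hosts each expert

gate_logits = rng.normal(size=(num_tokens, num_experts))
cross_node_penalty = 0.5                                  # illustrative constant
penalty = cross_node_penalty * (expert_node[None, :] != token_node[:, None])
assignment = (gate_logits - penalty).argmax(axis=1)       # top-1 dispatch
print(list(zip(token_node.tolist(), expert_node[assignment].tolist())))
```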
1 code implementation • 12 Jul 2022 • Guoxia Wang, Xiaomin Fang, Zhihua Wu, Yiqun Liu, Yang Xue, Yingfei Xiang, dianhai yu, Fan Wang, Yanjun Ma
Due to the complex model architecture and large memory consumption, training and inference of AlphaFold2 from scratch require substantial computational resources and time.
1 code implementation • 19 May 2022 • Yang Xiang, Zhihua Wu, Weibao Gong, Siyu Ding, Xianjie Mo, Yuang Liu, Shuohuan Wang, Peng Liu, Yongshuai Hou, Long Li, Bin Wang, Shaohuai Shi, Yaqian Han, Yue Yu, Ge Li, Yu Sun, Yanjun Ma, dianhai yu
We took natural language processing (NLP) as an example to show how Nebula-I works in different training phases, including: a) pre-training a multilingual language model using two remote clusters; and b) fine-tuning a machine translation model using knowledge distilled from pre-trained models; together, these phases follow the most popular paradigm of recent deep learning.
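The fine-tuning phase relies on knowledge distillation from pre-trained models. Below is a minimal sketch of that generic step, training a student to match a teacher's temperature-softened distribution; the logits, temperature, and shapes are illustrative, and this is not Nebula-I's code.

```python
# Generic knowledge-distillation loss: KL(teacher || student) on softened distributions.
import numpy as np

def softmax(x, T=1.0):
    z = x / T
    z -= z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=2.0):
    """Temperature-softened KL divergence, scaled by T^2 as is conventional."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return float(np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))) * T * T)

teacher = np.array([2.0, 0.5, -1.0, 0.1])   # frozen pre-trained model's logits (made up)
student = np.array([0.3, 0.2, -0.2, 0.0])   # model being fine-tuned (made up)
print(distill_loss(student, teacher))
```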
2 code implementations • 31 Dec 2021 • Han Zhang, Weichong Yin, Yewei Fang, Lanxin Li, Boqiang Duan, Zhihua Wu, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang
To explore the landscape of large-scale pre-training for bidirectional text-image generation, we train a 10-billion-parameter ERNIE-ViLG model on a large-scale dataset of 145 million (Chinese) image-text pairs. The model achieves state-of-the-art performance on both text-to-image and image-to-text tasks, obtaining an FID of 7.9 on MS-COCO for text-to-image synthesis and the best results on COCO-CN and AIC-ICC for image captioning.
Ranked #41 on Text-to-Image Generation on MS COCO
3 code implementations • 23 Dec 2021 • Shuohuan Wang, Yu Sun, Yang Xiang, Zhihua Wu, Siyu Ding, Weibao Gong, Shikun Feng, Junyuan Shang, Yanbin Zhao, Chao Pang, Jiaxiang Liu, Xuyi Chen, Yuxiang Lu, Weixin Liu, Xi Wang, Yangfan Bai, Qiuliang Chen, Li Zhao, Shiyong Li, Peng Sun, dianhai yu, Yanjun Ma, Hao Tian, Hua Wu, Tian Wu, Wei Zeng, Ge Li, Wen Gao, Haifeng Wang
A unified framework named ERNIE 3.0 was recently proposed for pre-training large-scale knowledge-enhanced models, and it trained a model with 10 billion parameters.
1 code implementation • 6 Dec 2021 • Yulong Ao, Zhihua Wu, dianhai yu, Weibao Gong, Zhiqing Kui, Minxu Zhang, Zilingfeng Ye, Liang Shen, Yanjun Ma, Tian Wu, Haifeng Wang, Wei Zeng, Chao Yang
The experiments demonstrate that our framework can satisfy the diverse requirements of applications and the heterogeneity of resources with highly competitive performance.
1 code implementation • 20 Nov 2021 • Ji Liu, Zhihua Wu, dianhai yu, Yanjun Ma, Danlei Feng, Minxu Zhang, Xinxuan Wu, Xuefeng Yao, Dejing Dou
The training process generally exploits distributed computing resources to reduce training time.
1 code implementation • WSDM 2021 • Wenhui Zhang, Zhihua Wu, Haofeng Yin
A quick-start tool for search and recommendation algorithms based on PaddlePaddle, and a complete recommendation-system solution for beginners, developers, and researchers.
3 code implementations • 20 Sep 2021 • Siqi Bao, Huang He, Fan Wang, Hua Wu, Haifeng Wang, Wenquan Wu, Zhihua Wu, Zhen Guo, Hua Lu, Xinxian Huang, Xin Tian, Xinchao Xu, Yingzhan Lin, Zheng-Yu Niu
To explore the limit of dialogue generation pre-training, we present the models of PLATO-XL with up to 11 billion parameters, trained on both Chinese and English social media conversations.
2 code implementations • 5 Jul 2021 • Yu Sun, Shuohuan Wang, Shikun Feng, Siyu Ding, Chao Pang, Junyuan Shang, Jiaxiang Liu, Xuyi Chen, Yanbin Zhao, Yuxiang Lu, Weixin Liu, Zhihua Wu, Weibao Gong, Jianzhong Liang, Zhizhou Shang, Peng Sun, Wei Liu, Xuan Ouyang, dianhai yu, Hao Tian, Hua Wu, Haifeng Wang
We trained the model with 10 billion parameters on a 4TB corpus consisting of plain texts and a large-scale knowledge graph.
no code implementations • ICCV 2021 • Deng Huang, Wenhao Wu, Weiwen Hu, Xu Liu, Dongliang He, Zhihua Wu, Xiangmiao Wu, Mingkui Tan, Errui Ding
Specifically, we propose two tasks to learn the appearance and speed consistency, respectively.
1 code implementation • 27 Feb 2020 • Xiaotang Jiang, Huan Wang, Yiliu Chen, Ziqi Wu, Lichuan Wang, Bin Zou, Yafeng Yang, Zongyang Cui, Yu Cai, Tianhang Yu, Chengfei Lv, Zhihua Wu
Deploying deep learning models on mobile devices has drawn increasing attention recently.
1 code implementation • 18 Feb 2020 • Yikai Yan, Chaoyue Niu, Yucheng Ding, Zhenzhe Zheng, Fan Wu, Guihai Chen, Shaojie Tang, Zhihua Wu
In this work, we consider a practical and ubiquitous issue when deploying federated learning in mobile environments: intermittent client availability, where the set of eligible clients may change during the training process.
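A minimal sketch of the setting: in each federated round, only the clients that happen to be online contribute an update, and the server aggregates over that changing subset. The availability model, local objective, and scalar model below are illustrative placeholders, not the paper's algorithm.

```python
# Federated round under intermittent client availability: aggregate only online clients.
import numpy as np

rng = np.random.default_rng(0)
num_clients, rounds, lr = 10, 5, 0.5
client_target = rng.normal(size=num_clients)     # each client's local optimum (toy data)
avail_prob = rng.uniform(0.3, 0.9, size=num_clients)

w = 0.0                                          # global (scalar) model
for r in range(rounds):
    online = np.where(rng.random(num_clients) < avail_prob)[0]
    if online.size == 0:                         # nobody showed up this round
        continue
    updates = []
    for c in online:
        grad = w - client_target[c]              # gradient of 0.5 * (w - target)^2
        updates.append(w - lr * grad)            # one local step
    w = float(np.mean(updates))                  # aggregate only the online clients
    print(f"round {r}: {online.size} clients online, w = {w:.3f}")
```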
1 code implementation • 6 Nov 2019 • Chaoyue Niu, Fan Wu, Shaojie Tang, Lifeng Hua, Rongfei Jia, Chengfei Lv, Zhihua Wu, Guihai Chen
Nevertheless, the "position" of a client's truly required submodel corresponds to her private data, and its disclosure to the cloud server during interactions inevitably breaks the tenet of federated learning.
no code implementations • 18 Sep 2019 • Renjie Gu, Chaoyue Niu, Fan Wu, Guihai Chen, Chun Hu, Chengfei Lyu, Zhihua Wu
Another benefit is reduced bandwidth consumption, since various kinds of local data can be involved in the training process without being uploaded.