no code implementations • 24 Oct 2024 • Liangdong Wang, Bo-Wen Zhang, ChengWei Wu, Hanyu Zhao, Xiaofeng Shi, Shuhao Gu, Jijie Li, Quanyue Ma, Tengfei Pan, Guang Liu
We present CCI3.0-HQ (https://huggingface.co/datasets/BAAI/CCI3-HQ), a high-quality 500GB subset of the Chinese Corpora Internet 3.0 (CCI3.0) (https://huggingface.co/datasets/BAAI/CCI3-Data), developed using a novel two-stage hybrid filtering pipeline that significantly enhances data quality.
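As a rough illustration of how a two-stage hybrid filter can be structured, the sketch below runs cheap rule-based checks before a learned quality scorer; the specific heuristics, threshold, and the `quality_model` interface are hypothetical placeholders, not the pipeline used to build CCI3.0-HQ.

```python
# Illustrative two-stage hybrid filter: cheap rule-based checks first,
# then a learned quality scorer on the survivors. All heuristics and
# thresholds here are hypothetical, not the paper's actual pipeline.

def rule_stage(doc: str) -> bool:
    """Stage 1: fast heuristic checks (length, line repetition)."""
    if len(doc) < 200:  # too short to be a useful document
        return False
    lines = doc.splitlines()
    if lines and len(set(lines)) / len(lines) < 0.3:  # heavy duplication
        return False
    return True

def model_stage(doc: str, quality_model, threshold: float = 0.5) -> bool:
    """Stage 2: keep documents a learned classifier scores as high quality.
    `quality_model` is an assumed object exposing a score() method."""
    return quality_model.score(doc) >= threshold

def filter_corpus(docs, quality_model):
    survivors = (d for d in docs if rule_stage(d))
    return [d for d in survivors if model_stage(d, quality_model)]
```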
no code implementations • 1 Oct 2024 • Yiming Ju, Ziyi Ni, Xingrun Xing, Zhixiong Zeng, Hanyu Zhao, Siqi Fan, Zheng Zhang
Supervised fine-tuning (SFT) is crucial for adapting Large Language Models (LLMs) to specific tasks.
no code implementations • 11 Sep 2024 • Hanyu Zhao, Li Du, Yiming Ju, ChengWei Wu, Tengfei Pan
With the availability of various instruction datasets, a pivotal challenge is how to effectively select and integrate these instructions to fine-tune large language models (LLMs).
1 code implementation • 13 Aug 2024 • Bo-Wen Zhang, Liangdong Wang, Ye Yuan, Jijie Li, Shuhao Gu, Mengdi Zhao, Xinya Wu, Guang Liu, ChengWei Wu, Hanyu Zhao, Li Du, Yiming Ju, Quanyue Ma, Yulong Ao, Yingli Zhao, Songhe Zhu, Zhou Cao, Dong Liang, Yonghua Lin, Ming Zhang, Shunfei Wang, Yanxin Zhou, Min Ye, Xuekai Chen, Xinyang Yu, Xiangjun Huang, Jian Yang
In this paper, we present AquilaMoE, a cutting-edge bilingual 8*16B Mixture of Experts (MoE) language model that has 8 experts with 16 billion parameters each and is developed using an innovative training methodology called EfficientScale.
no code implementations • 7 Jun 2024 • Jianbo Dong, Bin Luo, Jun Zhang, Pengcheng Zhang, Fei Feng, Yikai Zhu, Ang Liu, Zian Chen, Yi Shi, Hairong Jiao, Gang Lu, Yu Guan, Ennan Zhai, Wencong Xiao, Hanyu Zhao, Man Yuan, Siran Yang, Xiang Li, Jiamang Wang, Rui Men, Jianwei Zhang, Huang Zhong, Dennis Cai, Yuan Xie, Binzhang Fu
By leveraging this feature, C4 can rapidly identify the faulty components, swiftly isolate the anomaly, and restart the task, thereby avoiding resource wastage caused by delays in anomaly detection.
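The detect-isolate-restart pattern can be sketched with a simple heartbeat supervisor, shown below; the worker interface, timeout, and function names are illustrative assumptions, not C4's actual mechanism.

```python
import time

# Hypothetical detect -> isolate -> restart loop, sketching the pattern
# described in the abstract; none of these names come from C4 itself.
HEARTBEAT_TIMEOUT = 30.0  # seconds of silence before a worker is suspect

def supervise(workers, restart_task):
    """workers: objects assumed to expose id, heartbeat_received(), isolate()."""
    last_seen = {w.id: time.time() for w in workers}
    while True:
        for w in workers:
            if w.heartbeat_received():
                last_seen[w.id] = time.time()
        faulty = [w for w in workers
                  if time.time() - last_seen[w.id] > HEARTBEAT_TIMEOUT]
        if faulty:
            for w in faulty:
                w.isolate()               # remove the faulty component
            restart_task(exclude=faulty)  # resume on the healthy workers
            return
        time.sleep(1.0)
```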
no code implementations • 13 Feb 2024 • Fan Lyu, Kaile Du, Yuyang Li, Hanyu Zhao, Zhang Zhang, Guangcan Liu, Liang Wang
At the source stage, we transform a pre-trained deterministic model into a Bayesian Neural Network (BNN) via a variational warm-up strategy, injecting uncertainties into the model.
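A minimal PyTorch sketch of this kind of warm-up, assuming a mean-field Gaussian posterior whose means are copied from the pretrained weights and whose variances start small; the initialization constant and layer design are illustrative, not the paper's exact variational warm-up strategy.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesianLinear(nn.Module):
    """Mean-field Gaussian linear layer warm-started from a deterministic one."""

    def __init__(self, det_layer: nn.Linear, init_log_sigma: float = -5.0):
        super().__init__()
        # Posterior means are copied from the pretrained deterministic weights...
        self.w_mu = nn.Parameter(det_layer.weight.detach().clone())
        self.b_mu = nn.Parameter(det_layer.bias.detach().clone())
        # ...and variances start small, so the BNN initially mimics the
        # deterministic model and uncertainty is injected gradually.
        self.w_log_sigma = nn.Parameter(torch.full_like(self.w_mu, init_log_sigma))
        self.b_log_sigma = nn.Parameter(torch.full_like(self.b_mu, init_log_sigma))

    def forward(self, x):
        # Reparameterization trick: sample weights as mu + sigma * eps.
        w = self.w_mu + self.w_log_sigma.exp() * torch.randn_like(self.w_mu)
        b = self.b_mu + self.b_log_sigma.exp() * torch.randn_like(self.b_mu)
        return F.linear(x, w, b)
```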
no code implementations • 30 Oct 2023 • Huiyao Shu, Ang Wang, Ziji Shi, Hanyu Zhao, Yong Li, Lu Lu
However, a memory-efficient execution plan, combining a reasonable operator execution order with a suitable tensor memory layout, can significantly improve a model's memory efficiency and reduce the overheads introduced by higher-level techniques.
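To make the idea concrete, here is a toy greedy scheduler that emits ready operators in an order chosen to keep live memory small; this simple proxy heuristic is only an illustration, not the paper's planning algorithm.

```python
# Toy memory-aware scheduler: among operators whose dependencies are done,
# greedily run the one with the smallest output allocation, a crude proxy
# for limiting growth of live tensor memory. Illustrative only.

def schedule(ops):
    """ops: mapping name -> {'deps': set of op names, 'out_bytes': int}.
    Returns an execution order for the (acyclic) operator graph."""
    done, order = set(), []
    remaining = dict(ops)
    while remaining:
        ready = [n for n, op in remaining.items() if op["deps"] <= done]
        nxt = min(ready, key=lambda n: remaining[n]["out_bytes"])
        order.append(nxt)
        done.add(nxt)
        del remaining[nxt]
    return order

ops = {
    "a": {"deps": set(), "out_bytes": 4096},
    "b": {"deps": set(), "out_bytes": 1024},
    "c": {"deps": {"a", "b"}, "out_bytes": 2048},
}
print(schedule(ops))  # ['b', 'a', 'c']
```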
no code implementations • 7 Dec 2022 • Yinpeng Dong, Peng Chen, Senyou Deng, Lianji L, Yi Sun, Hanyu Zhao, Jiaxing Li, Yunteng Tan, Xinyu Liu, Yangyi Dong, Enhui Xu, Jincai Xu, Shu Xu, Xuelin Fu, Changfeng Sun, Haoliang Han, Xuchong Zhang, Shen Chen, Zhimin Sun, Junyi Cao, Taiping Yao, Shouhong Ding, Yu Wu, Jian Lin, Tianpeng Wu, Ye Wang, Yu Fu, Lin Feng, Kangkang Gao, Zeyu Liu, Yuanzhe Pang, Chengqi Duan, Huipeng Zhou, Yajie Wang, Yuhang Zhao, Shangbo Wu, Haoran Lyu, Zhiyu Lin, YiFei Gao, Shuang Li, Haonan Wang, Jitao Sang, Chen Ma, Junhao Zheng, Yijia Li, Chao Shen, Chenhao Lin, Zhichao Cui, Guoshuai Liu, Huafeng Shi, Kun Hu, Mengxin Zhang
The security of artificial intelligence (AI) is an important research area towards safe, reliable, and trustworthy AI systems.
no code implementations • 4 Jun 2022 • Yuezihan Jiang, Hao Yang, Junyang Lin, Hanyu Zhao, An Yang, Chang Zhou, Hongxia Yang, Zhi Yang, Bin Cui
Prompt Learning has recently gained great popularity in bridging the gap between pretraining tasks and various downstream tasks.
no code implementations • 26 Mar 2022 • Sha Yuan, Hanyu Zhao, Shuai Zhao, Jiahong Leng, Yangxiao Liang, Xiaozhi Wang, Jifan Yu, Xin Lv, Zhou Shao, Jiaao He, Yankai Lin, Xu Han, Zhenghao Liu, Ning Ding, Yongming Rao, Yizhao Gao, Liang Zhang, Ming Ding, Cong Fang, Yisen Wang, Mingsheng Long, Jing Zhang, Yinpeng Dong, Tianyu Pang, Peng Cui, Lingxiao Huang, Zheng Liang, HuaWei Shen, HUI ZHANG, Quanshi Zhang, Qingxiu Dong, Zhixing Tan, Mingxuan Wang, Shuo Wang, Long Zhou, Haoran Li, Junwei Bao, Yingwei Pan, Weinan Zhang, Zhou Yu, Rui Yan, Chence Shi, Minghao Xu, Zuobai Zhang, Guoqiang Wang, Xiang Pan, Mengjie Li, Xiaoyu Chu, Zijun Yao, Fangwei Zhu, Shulin Cao, Weicheng Xue, Zixuan Ma, Zhengyan Zhang, Shengding Hu, Yujia Qin, Chaojun Xiao, Zheni Zeng, Ganqu Cui, Weize Chen, Weilin Zhao, Yuan YAO, Peng Li, Wenzhao Zheng, Wenliang Zhao, Ziyi Wang, Borui Zhang, Nanyi Fei, Anwen Hu, Zenan Ling, Haoyang Li, Boxi Cao, Xianpei Han, Weidong Zhan, Baobao Chang, Hao Sun, Jiawen Deng, Chujie Zheng, Juanzi Li, Lei Hou, Xigang Cao, Jidong Zhai, Zhiyuan Liu, Maosong Sun, Jiwen Lu, Zhiwu Lu, Qin Jin, Ruihua Song, Ji-Rong Wen, Zhouchen Lin, LiWei Wang, Hang Su, Jun Zhu, Zhifang Sui, Jiajun Zhang, Yang Liu, Xiaodong He, Minlie Huang, Jian Tang, Jie Tang
With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks becomes a popular paradigm.
no code implementations • 22 Mar 2022 • Sha Yuan, Shuai Zhao, Jiahong Leng, Zhao Xue, Hanyu Zhao, Peiyu Liu, Zheng Gong, Wayne Xin Zhao, Junyi Li, Jie Tang
The results show that WuDaoMM can serve as an efficient dataset for VLPMs, especially for models on the text-to-image generation task.
1 code implementation • 20 Mar 2022 • Yuezihan Jiang, Yu Cheng, Hanyu Zhao, Wentao Zhang, Xupeng Miao, Yu He, Liang Wang, Zhi Yang, Bin Cui
We introduce ZOOMER, a system deployed at Taobao, the largest e-commerce platform in China, for training and serving GNN-based recommendations over web-scale graphs.
no code implementations • 15 Nov 2021 • Hanyu Zhao, Sha Yuan, Jiahong Leng, Xiang Pan, Guoqiang Wang, Ledell Wu, Jie Tang
Knowledge Base Question Answering (KBQA) aims to answer natural language questions with the help of an external knowledge base.
no code implementations • 21 Nov 2019 • Yunteng Luan, Hanyu Zhao, Zhi Yang, Yafei Dai
In this paper, we propose a general training framework named multi-self-distillation learning (MSD), which mines the knowledge of different classifiers within the same network and increases the accuracy of every classifier.
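A rough sketch of such a loss, where each shallower classifier learns from both the labels and the softened outputs of the deepest classifier; the temperature and weighting below are illustrative choices, not the paper's exact formulation.

```python
import torch.nn.functional as F

def msd_loss(logits_list, labels, T: float = 3.0, alpha: float = 0.5):
    """logits_list: outputs of classifiers attached at increasing depths,
    deepest classifier last. Every classifier gets a cross-entropy term;
    shallower ones additionally distill from the deepest one."""
    teacher = logits_list[-1].detach()  # deepest classifier acts as teacher
    loss = F.cross_entropy(logits_list[-1], labels)
    for student in logits_list[:-1]:
        ce = F.cross_entropy(student, labels)
        kd = F.kl_div(F.log_softmax(student / T, dim=-1),
                      F.softmax(teacher / T, dim=-1),
                      reduction="batchmean") * (T * T)
        loss = loss + (1 - alpha) * ce + alpha * kd
    return loss
```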