Search Results for author: Hanyu Zhao

Found 14 papers, 2 papers with code

CCI3.0-HQ: a large-scale Chinese dataset of high quality designed for pre-training large language models

no code implementations24 Oct 2024 Liangdong Wang, Bo-Wen Zhang, ChengWei Wu, Hanyu Zhao, Xiaofeng Shi, Shuhao Gu, Jijie Li, Quanyue Ma, Tengfei Pan, Guang Liu

We present CCI3.0-HQ (https://huggingface.co/datasets/BAAI/CCI3-HQ), a high-quality 500GB subset of the Chinese Corpora Internet 3.0 (CCI3.0) (https://huggingface.co/datasets/BAAI/CCI3-Data), developed using a novel two-stage hybrid filtering pipeline that significantly enhances data quality.
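
The excerpt does not detail the two filtering stages, so the following is a minimal, hypothetical sketch of a two-stage hybrid filter: a cheap rule-based pass followed by a model-based quality score with a keep threshold. All heuristics, function names, and thresholds here are illustrative assumptions, not the actual CCI3.0-HQ pipeline.

```python
def rule_based_filter(doc: str) -> bool:
    """Stage 1: cheap heuristics to discard obviously low-quality text."""
    words = doc.split()
    if len(doc) < 200:                                   # too short to be useful
        return False
    if doc.count("http") / max(len(words), 1) > 0.2:     # link spam
        return False
    return True

def quality_score(doc: str) -> float:
    """Stage 2 stand-in: a real pipeline would call a trained quality classifier."""
    words = doc.split()
    return len(set(words)) / max(len(words), 1)          # toy proxy for "quality"

def two_stage_filter(corpus, threshold=0.5):
    """Keep only documents that pass the rules AND clear the score threshold."""
    stage1 = [d for d in corpus if rule_based_filter(d)]
    return [d for d in stage1 if quality_score(d) >= threshold]
```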

Mitigating Training Imbalance in LLM Fine-Tuning via Selective Parameter Merging

no code implementations1 Oct 2024 Yiming Ju, Ziyi Ni, Xingrun Xing, Zhixiong Zeng, Hanyu Zhao, Siqi Fan, Zheng Zhang

Supervised fine-tuning (SFT) is crucial for adapting Large Language Models (LLMs) to specific tasks.
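
The listing only gives the title's mechanism, selective parameter merging across fine-tuning runs. Below is a hedged sketch under the assumption that "selective" means merging only the parameter elements with the largest fine-tuning deltas while keeping base weights elsewhere; the selection rule and `keep_ratio` are illustrative, not the paper's.

```python
import torch

def selective_merge(base, finetuned, keep_ratio=0.5):
    """Merge several fine-tuned checkpoints (state dicts) into one.

    For each tensor, only the elements whose mean update magnitude
    (relative to the base model) lies in the top `keep_ratio` fraction
    are merged; the rest fall back to the base weights. This is a
    generic selective-merging heuristic assumed for illustration.
    """
    merged = {}
    for name, w0 in base.items():
        deltas = torch.stack([sd[name] - w0 for sd in finetuned])
        mean_delta = deltas.mean(dim=0)
        k = max(1, int(keep_ratio * mean_delta.numel()))
        # k-th largest magnitude serves as the selection threshold
        threshold = mean_delta.abs().flatten().kthvalue(
            mean_delta.numel() - k + 1).values
        mask = mean_delta.abs() >= threshold
        merged[name] = torch.where(mask, w0 + mean_delta, w0)
    return merged
```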

Beyond IID: Optimizing Instruction Learning from the Perspective of Instruction Interaction and Dependency

no code implementations11 Sep 2024 Hanyu Zhao, Li Du, Yiming Ju, ChengWei Wu, Tengfei Pan

With the availability of various instruction datasets, a pivotal challenge is how to effectively select and integrate these instructions to fine-tune large language models (LLMs).
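
As a rough illustration of moving beyond IID instruction sampling, the sketch below greedily picks instructions that are far apart in embedding space (farthest-point selection), so the chosen subset covers diverse instruction types. This is a generic diversity heuristic assumed for illustration; the paper's interaction- and dependency-aware criteria are not given in the excerpt.

```python
import numpy as np

def select_instructions(embeddings, k):
    """Farthest-point selection over instruction embeddings (n, d)."""
    n = embeddings.shape[0]
    k = min(k, n)
    first = int(np.argmax(np.linalg.norm(embeddings, axis=1)))  # start from an extreme point
    chosen = [first]
    dists = np.linalg.norm(embeddings - embeddings[first], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(dists))          # instruction farthest from everything chosen
        chosen.append(nxt)
        dists = np.minimum(dists,
                           np.linalg.norm(embeddings - embeddings[nxt], axis=1))
    return chosen
```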

AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies

1 code implementation13 Aug 2024 Bo-Wen Zhang, Liangdong Wang, Ye Yuan, Jijie Li, Shuhao Gu, Mengdi Zhao, Xinya Wu, Guang Liu, ChengWei Wu, Hanyu Zhao, Li Du, Yiming Ju, Quanyue Ma, Yulong Ao, Yingli Zhao, Songhe Zhu, Zhou Cao, Dong Liang, Yonghua Lin, Ming Zhang, Shunfei Wang, Yanxin Zhou, Min Ye, Xuekai Chen, Xinyang Yu, Xiangjun Huang, Jian Yang

In this paper, we present AquilaMoE, a cutting-edge bilingual 8*16B Mixture of Experts (MoE) language model that has 8 experts with 16 billion parameters each and is developed using an innovative training methodology called EfficientScale.

Language Modelling · Transfer Learning
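
For readers unfamiliar with the 8*16B layout, the sketch below shows a minimal top-2 gated Mixture-of-Experts feed-forward layer in PyTorch. Dimensions are toy values and the routing is the standard top-k softmax gate; AquilaMoE's actual experts are ~16B parameters each, and its EfficientScale training recipe is not reproduced here.

```python
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    """Minimal top-2 gated MoE feed-forward layer (toy dimensions)."""
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)])
        self.gate = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):                         # x: (tokens, d_model)
        weights, idx = self.gate(x).topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)         # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e          # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
```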

Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach

no code implementations7 Jun 2024 Jianbo Dong, Bin Luo, Jun Zhang, Pengcheng Zhang, Fei Feng, Yikai Zhu, Ang Liu, Zian Chen, Yi Shi, Hairong Jiao, Gang Lu, Yu Guan, Ennan Zhai, Wencong Xiao, Hanyu Zhao, Man Yuan, Siran Yang, Xiang Li, Jiamang Wang, Rui Men, Jianwei Zhang, Huang Zhong, Dennis Cai, Yuan Xie, Binzhang Fu

By leveraging this feature, C4 can rapidly identify the faulty components, swiftly isolate the anomaly, and restart the task, thereby avoiding resource wastage caused by delays in anomaly detection.

Anomaly Detection
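
The excerpt describes C4's recovery loop: identify the faulty component, isolate it, and restart the task. The sketch below is a deliberately simplified heartbeat-based version of that control flow; the `Worker` class and timeout are hypothetical stand-ins, since the real system works from communication-level telemetry rather than heartbeats.

```python
import time

class Worker:
    """Hypothetical stand-in for a training worker that reports heartbeats."""
    def __init__(self, wid):
        self.wid = wid
        self.last_beat = time.time()
        self.healthy = True

def detect_and_recover(workers, timeout=30.0):
    """Find workers whose heartbeat has stalled, isolate them, and
    signal a restart of the job on the remaining healthy pool."""
    now = time.time()
    faulty = [w for w in workers if w.healthy and now - w.last_beat > timeout]
    for w in faulty:
        w.healthy = False                         # isolate the anomalous worker
    if faulty:
        pool = [w.wid for w in workers if w.healthy]
        return {"restart": True,                  # relaunch on the healthy pool
                "faulty": [w.wid for w in faulty],
                "pool": pool}
    return {"restart": False}
```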

Variational Continual Test-Time Adaptation

no code implementations13 Feb 2024 Fan Lyu, Kaile Du, Yuyang Li, Hanyu Zhao, Zhang Zhang, Guangcan Liu, Liang Wang

At the source stage, we transform a pre-trained deterministic model into a Bayesian Neural Network (BNN) via a variational warm-up strategy, injecting uncertainties into the model.

Test-time Adaptation · Variational Inference
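
A minimal sketch of the warm-up idea described above: take a pretrained deterministic linear layer, use its weights as the mean of a Gaussian posterior, and learn an initially tiny variance via the reparameterization trick. The parameterization and `init_logvar` value are assumptions, not the paper's exact scheme.

```python
import torch
import torch.nn as nn

class BayesianLinear(nn.Module):
    """Linear layer with a Gaussian weight posterior initialized from a
    pretrained deterministic layer (assumed to have a bias)."""
    def __init__(self, pretrained: nn.Linear, init_logvar=-10.0):
        super().__init__()
        self.mu = nn.Parameter(pretrained.weight.detach().clone())
        self.logvar = nn.Parameter(torch.full_like(self.mu, init_logvar))
        self.bias = nn.Parameter(pretrained.bias.detach().clone())

    def forward(self, x):
        # Reparameterization trick: sample weights around the pretrained mean.
        std = (0.5 * self.logvar).exp()
        w = self.mu + std * torch.randn_like(std)
        return nn.functional.linear(x, w, self.bias)
```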

ROAM: memory-efficient large DNN training via optimized operator ordering and memory layout

no code implementations30 Oct 2023 Huiyao Shu, Ang Wang, Ziji Shi, Hanyu Zhao, Yong Li, Lu Lu

However, a memory-efficient execution plan that includes a reasonable operator execution order and tensor memory layout can significantly improve a model's memory efficiency and reduce overheads from high-level techniques.
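
As a toy illustration of memory-aware operator ordering, the greedy scheduler below always runs the ready operator with the smallest output tensor, a crude proxy for keeping peak live memory low. ROAM jointly optimizes ordering and memory layout far more carefully; `ops`, `deps`, and `mem` are hypothetical inputs.

```python
def memory_aware_order(ops, deps, mem):
    """Greedy sketch: among all ops whose dependencies are satisfied,
    schedule the one with the smallest output tensor first.

    ops:  list of op names
    deps: dict op -> set of ops it depends on
    mem:  dict op -> size of the op's output tensor (e.g. bytes)
    """
    done, order = set(), []
    while len(order) < len(ops):
        ready = [o for o in ops if o not in done and deps[o] <= done]
        nxt = min(ready, key=mem.get)
        order.append(nxt)
        done.add(nxt)
    return order

# e.g. memory_aware_order(["a", "b", "c"],
#                         {"a": set(), "b": {"a"}, "c": {"a"}},
#                         {"a": 4, "b": 16, "c": 2})  ->  ["a", "c", "b"]
```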

Instance-wise Prompt Tuning for Pretrained Language Models

no code implementations4 Jun 2022 Yuezihan Jiang, Hao Yang, Junyang Lin, Hanyu Zhao, An Yang, Chang Zhou, Hongxia Yang, Zhi Yang, Bin Cui

Prompt Learning has recently gained great popularity in bridging the gap between pretraining tasks and various downstream tasks.
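
The title's mechanism, instance-wise prompts, can be sketched as a small generator that maps each input's pooled representation to its own soft prompt, which is prepended to the token embeddings. The generator architecture below (a single linear map over the mean-pooled embedding) is an assumption for illustration, not the paper's design.

```python
import torch
import torch.nn as nn

class InstancePrompt(nn.Module):
    """Generate a per-instance soft prompt instead of one shared prompt."""
    def __init__(self, d_model=64, prompt_len=4):
        super().__init__()
        self.prompt_len = prompt_len
        self.gen = nn.Linear(d_model, prompt_len * d_model)

    def forward(self, token_embeds):              # (batch, seq, d_model)
        pooled = token_embeds.mean(dim=1)         # instance representation
        prompts = self.gen(pooled).view(
            -1, self.prompt_len, token_embeds.size(-1))
        # Prepend the instance-specific prompt to the token embeddings.
        return torch.cat([prompts, token_embeds], dim=1)
```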

WuDaoMM: A large-scale Multi-Modal Dataset for Pre-training models

no code implementations22 Mar 2022 Sha Yuan, Shuai Zhao, Jiahong Leng, Zhao Xue, Hanyu Zhao, Peiyu Liu, Zheng Gong, Wayne Xin Zhao, Junyi Li, Jie Tang

The results show that WuDaoMM can serve as an efficient dataset for VLPMs, especially for models on the text-to-image generation task.

Image Captioning · Question Answering +2

ZOOMER: Boosting Retrieval on Web-scale Graphs by Regions of Interest

1 code implementation20 Mar 2022 Yuezihan Jiang, Yu Cheng, Hanyu Zhao, Wentao Zhang, Xupeng Miao, Yu He, Liang Wang, Zhi Yang, Bin Cui

We introduce ZOOMER, a system deployed at Taobao, the largest e-commerce platform in China, for training and serving GNN-based recommendations over web-scale graphs.

Retrieval
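
The sketch below illustrates the region-of-interest idea in isolation: starting from query-relevant seed nodes, greedily expand to the highest-scoring neighbors until a node budget is reached, so GNN inference touches only a small relevant region instead of the full web-scale graph. The `score` function and `budget` are placeholders for ZOOMER's actual ROI logic.

```python
import heapq

def roi_subgraph(adj, seeds, score, budget=64):
    """Expand a region of interest around the seeds.

    adj:   dict node -> list of neighbor nodes
    score: callable node -> relevance to the current query (placeholder)
    """
    region = set(seeds)
    frontier = [(-score(nb), nb) for s in seeds for nb in adj.get(s, [])]
    heapq.heapify(frontier)                       # max-score first
    while frontier and len(region) < budget:
        _, node = heapq.heappop(frontier)
        if node in region:
            continue
        region.add(node)
        for nb in adj.get(node, []):
            if nb not in region:
                heapq.heappush(frontier, (-score(nb), nb))
    return region
```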

Calculating Question Similarity is Enough: A New Method for KBQA Tasks

no code implementations15 Nov 2021 Hanyu Zhao, Sha Yuan, Jiahong Leng, Xiang Pan, Guoqiang Wang, Ledell Wu, Jie Tang

Knowledge Base Question Answering (KBQA) aims to answer natural language questions with the help of an external knowledge base.

Entity Linking · Knowledge Base Question Answering +3
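
Under the paper's premise that question similarity is enough, a KBQA system can pair each KB fact with a template question and answer a user query by returning the answer attached to the most similar template. The sketch below uses a bag-of-words cosine similarity purely for self-containment; a real system would use a trained sentence encoder.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def answer_by_similarity(question, template_qa):
    """template_qa: list of (template_question, answer) pairs, one per KB fact."""
    q = Counter(question.lower().split())
    best_q, best_a = max(
        template_qa,
        key=lambda pair: cosine(q, Counter(pair[0].lower().split())))
    return best_a
```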

MSD: Multi-Self-Distillation Learning via Multi-classifiers within Deep Neural Networks

no code implementations21 Nov 2019 Yunteng Luan, Hanyu Zhao, Zhi Yang, Yafei Dai

In this paper, we propose a general training framework named multi-self-distillation learning (MSD), which mines knowledge from the different classifiers within the same network and increases the accuracy of every classifier.

Image Classification
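
A minimal sketch of a multi-self-distillation objective consistent with the description above: every internal classifier is supervised by the labels, and each shallower classifier is additionally distilled toward the deepest classifier's softened predictions. The temperature, weighting, and teacher choice are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def msd_loss(logits_list, labels, T=3.0, alpha=0.5):
    """logits_list: per-classifier logits, ordered shallow -> deep."""
    teacher = logits_list[-1]                     # deepest classifier as teacher
    loss = F.cross_entropy(teacher, labels)
    for logits in logits_list[:-1]:
        ce = F.cross_entropy(logits, labels)      # hard-label supervision
        kd = F.kl_div(F.log_softmax(logits / T, dim=-1),
                      F.softmax(teacher.detach() / T, dim=-1),
                      reduction="batchmean") * T * T   # soft-label distillation
        loss = loss + (1 - alpha) * ce + alpha * kd
    return loss
```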
