Search Results for author: Shuhao Gu

Found 20 papers, 11 papers with code

CCI3.0-HQ: a large-scale Chinese dataset of high quality designed for pre-training large language models

no code implementations24 Oct 2024 Liangdong Wang, Bo-Wen Zhang, ChengWei Wu, Hanyu Zhao, Xiaofeng Shi, Shuhao Gu, Jijie Li, Quanyue Ma, Tengfei Pan, Guang Liu

We present CCI3.0-HQ (https://huggingface.co/datasets/BAAI/CCI3-HQ), a high-quality 500GB subset of the Chinese Corpora Internet 3.0 (CCI3.0) (https://huggingface.co/datasets/BAAI/CCI3-Data), developed using a novel two-stage hybrid filtering pipeline that significantly enhances data quality.
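The abstract only names a "two-stage hybrid filtering pipeline" without detail, so the following is a purely hypothetical sketch of what such a two-stage filter can look like in general (a cheap rule-based pass followed by a model-based quality score); the rules, scorer, and threshold are assumptions, not the paper's pipeline.

import re

def rule_based_pass(text: str) -> bool:
    """Stage 1: cheap heuristics that discard obviously bad documents."""
    if len(text) < 200:                            # too short to be useful
        return False
    if len(re.findall(r"https?://", text)) > 20:   # likely link spam
        return False
    return True

def quality_score(text: str) -> float:
    """Stage 2 placeholder: a learned classifier would return P(high quality)."""
    return 0.5  # stand-in value; a fastText/BERT-style scorer would go here

def keep(text: str, threshold: float = 0.5) -> bool:
    """A document survives only if it passes both stages."""
    return rule_based_pass(text) and quality_score(text) >= threshold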

Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data

4 code implementations24 Oct 2024 Shuhao Gu, Jialing Zhang, Siyuan Zhou, Kevin Yu, Zhaohu Xing, Liangdong Wang, Zhou Cao, Jintao Jia, Zhuoyi Zhang, YiXuan Wang, Zhenchong Hu, Bo-Wen Zhang, Jijie Li, Dong Liang, Yingli Zhao, Yulong Ao, Yaoqi Liu, Fangxiang Feng, Guang Liu

Vision-Language Models (VLMs) have recently made significant progress, but the limited scale and quality of open-source instruction data hinder their performance compared to closed-source models.

Question Generation Question-Generation +1

ReTok: Replacing Tokenizer to Enhance Representation Efficiency in Large Language Model

no code implementations6 Oct 2024 Shuhao Gu, Mengdi Zhao, BoWen Zhang, Liangdong Wang, Jijie Li, Guang Liu

In this work, we propose a method to improve model representation and processing efficiency by replacing the tokenizers of LLMs.

Language Modelling Large Language Model
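As a rough illustration of what replacing a tokenizer involves mechanically, the sketch below swaps tokenizers with the Hugging Face transformers API and resizes the embedding matrices; the model and tokenizer names are placeholders, and ReTok's actual re-initialization and continued-training recipe is not reproduced here.

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("old-model")        # placeholder name
new_tokenizer = AutoTokenizer.from_pretrained("new-tokenizer")   # placeholder name

# Resize the input/output embeddings to the new vocabulary size; the newly
# added rows are freshly initialized and must be trained before use.
model.resize_token_embeddings(len(new_tokenizer))

model.save_pretrained("retokenized-model")
new_tokenizer.save_pretrained("retokenized-model")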

Aquila2 Technical Report

2 code implementations14 Aug 2024 Bo-Wen Zhang, Liangdong Wang, Jijie Li, Shuhao Gu, Xinya Wu, Zhengduo Zhang, Boyan Gao, Yulong Ao, Guang Liu

This paper introduces the Aquila2 series, which comprises a wide range of bilingual models with parameter sizes of 7, 34, and 70 billion.

Management

AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies

1 code implementation13 Aug 2024 Bo-Wen Zhang, Liangdong Wang, Ye Yuan, Jijie Li, Shuhao Gu, Mengdi Zhao, Xinya Wu, Guang Liu, ChengWei Wu, Hanyu Zhao, Li Du, Yiming Ju, Quanyue Ma, Yulong Ao, Yingli Zhao, Songhe Zhu, Zhou Cao, Dong Liang, Yonghua Lin, Ming Zhang, Shunfei Wang, Yanxin Zhou, Min Ye, Xuekai Chen, Xinyang Yu, Xiangjun Huang, Jian Yang

In this paper, we present AquilaMoE, a cutting-edge bilingual 8*16B Mixture of Experts (MoE) language model that has 8 experts with 16 billion parameters each and is developed using an innovative training methodology called EfficientScale.

Language Modelling Transfer Learning
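For readers unfamiliar with the Mixture-of-Experts pattern the abstract refers to, the toy layer below routes each token to its top-2 of 8 expert feed-forward networks; it is a generic sketch of MoE routing, not the AquilaMoE architecture or the EfficientScale method.

import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                          # x: (tokens, d_model)
        weights, idx = self.router(x).softmax(-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):                # weighted sum of the chosen experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

moe = TinyMoE()
print(moe(torch.randn(10, 64)).shape)              # torch.Size([10, 64])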

Improving Multilingual Neural Machine Translation by Utilizing Semantic and Linguistic Features

1 code implementation2 Aug 2024 Mengyu Bu, Shuhao Gu, Yang Feng

To this end, we propose to exploit both the semantic and linguistic features of multiple languages to enhance multilingual translation.

Decoder Machine Translation +3

Addressing the Length Bias Problem in Document-Level Neural Machine Translation

no code implementations20 Nov 2023 Zhuocheng Zhang, Shuhao Gu, Min Zhang, Yang Feng

To solve the length bias problem, we propose to improve the DNMT model in terms of its training method, attention mechanism, and decoding strategy.

Machine Translation Translation

Enhancing Neural Machine Translation with Semantic Units

1 code implementation17 Oct 2023 Langlin Huang, Shuhao Gu, Zhuocheng Zhang, Yang Feng

Conventional neural machine translation (NMT) models typically use subwords and words as the basic units for model input and comprehension.

Machine Translation NMT +2

Continual Learning of Neural Machine Translation within Low Forgetting Risk Regions

1 code implementation3 Nov 2022 Shuhao Gu, Bojie Hu, Yang Feng

Specifically, we propose two methods to search for low forgetting risk regions, based respectively on the curvature of the loss and on the impact of the parameters on the model output.

Continual Learning Domain Adaptation +2
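The snippet below shows one common way to estimate parameter importance: accumulate squared gradients (a diagonal Fisher approximation) on general-domain data and treat the least-important parameters as the region that is safe to update. It is an illustration of the general idea only, with a Fisher-style criterion standing in for the paper's two search methods.

import torch
import torch.nn as nn

model = nn.Linear(16, 4)                                   # stand-in for an NMT model
data = [(torch.randn(8, 16), torch.randint(0, 4, (8,))) for _ in range(10)]
loss_fn = nn.CrossEntropyLoss()

importance = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
for x, y in data:
    model.zero_grad()
    loss_fn(model(x), y).backward()
    for n, p in model.named_parameters():
        importance[n] += p.grad.detach() ** 2              # diagonal Fisher estimate

# Treat the 70% least-important parameters as the low-forgetting-risk region
# that remains trainable during continual training.
masks = {n: imp <= imp.flatten().kthvalue(int(0.7 * imp.numel())).values
         for n, imp in importance.items()}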

Improving Zero-Shot Multilingual Translation with Universal Representations and Cross-Mappings

1 code implementation28 Oct 2022 Shuhao Gu, Yang Feng

Many-to-many multilingual neural machine translation can translate between language pairs unseen during training, i.e., zero-shot translation.

Machine Translation Translation

Importance-based Neuron Allocation for Multilingual Neural Machine Translation

1 code implementation ACL 2021 Wanying Xie, Yang Feng, Shuhao Gu, Dong Yu

Multilingual neural machine translation with a single model has drawn much attention due to its capability to deal with multiple languages.

General Knowledge Machine Translation +1

Pruning-then-Expanding Model for Domain Adaptation of Neural Machine Translation

1 code implementation NAACL 2021 Shuhao Gu, Yang Feng, Wanying Xie

Domain adaptation is widely used in practical applications of neural machine translation and aims to achieve good performance on both general-domain and in-domain data.

Domain Adaptation Knowledge Distillation +2
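As a rough sketch of the pruning-then-expanding idea, the code below ranks weights by magnitude, protects the larger ones (taken as carrying general-domain knowledge), and leaves only the pruned slots trainable on in-domain data; the layer, pruning ratio, and masking scheme are placeholder assumptions rather than the paper's actual procedure.

import torch
import torch.nn as nn

layer = nn.Linear(512, 512)                  # stand-in for one NMT weight matrix
prune_ratio = 0.3                            # fraction of weights released for the new domain

with torch.no_grad():
    magnitudes = layer.weight.abs().flatten()
    threshold = magnitudes.kthvalue(int(prune_ratio * magnitudes.numel())).values
    trainable_mask = layer.weight.abs() <= threshold   # small weights -> "expanded" slots

def mask_protected_grads():
    # Call after loss.backward(): zero the gradients of the protected weights
    # so only the pruned positions are updated on in-domain data.
    layer.weight.grad *= trainable_mask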

Investigating Catastrophic Forgetting During Continual Training for Neural Machine Translation

no code implementations COLING 2020 Shuhao Gu, Yang Feng

The investigation of the modules of the NMT model shows that some modules are tightly related to general-domain knowledge, while other modules are more essential for domain adaptation.

Domain Adaptation Machine Translation +2

Token-level Adaptive Training for Neural Machine Translation

1 code implementation EMNLP 2020 Shuhao Gu, Jinchao Zhang, Fandong Meng, Yang Feng, Wanying Xie, Jie Zhou, Dong Yu

The vanilla NMT model usually adopts trivial equal-weighted objectives for target tokens with different frequencies and thus tends to generate more high-frequency tokens and fewer low-frequency tokens than the golden token distribution.

Diversity Machine Translation +2
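Below is a minimal sketch of token-level adaptive weighting under the assumption of a simple inverse-frequency weight (the paper's exact weighting function may differ): each target token's cross-entropy term is scaled by a factor that grows as the token becomes rarer.

import torch
import torch.nn.functional as F

vocab_size = 1000
counts = torch.randint(1, 10_000, (vocab_size,)).float()    # placeholder corpus counts
weights = (counts.mean() / counts).clamp(max=5.0)            # rarer token -> larger weight

logits = torch.randn(32, vocab_size)                         # one batch of target positions
targets = torch.randint(0, vocab_size, (32,))
per_token = F.cross_entropy(logits, targets, reduction="none")
loss = (weights[targets] * per_token).mean()                  # token-level adaptive objective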

Modeling Fluency and Faithfulness for Diverse Neural Machine Translation

1 code implementation30 Nov 2019 Yang Feng, Wanying Xie, Shuhao Gu, Chenze Shao, Wen Zhang, Zhengxin Yang, Dong Yu

Neural machine translation models usually adopt the teacher forcing strategy for training, which requires that the predicted sequence match the ground truth word by word and forces the probability of each prediction to approach a 0-1 distribution.

Machine Translation Translation

Improving Multi-Head Attention with Capsule Networks

no code implementations31 Aug 2019 Shuhao Gu, Yang Feng

Multi-head attention advances neural machine translation by computing multiple versions of attention in different subspaces, but neglecting the semantic overlap between subspaces increases the difficulty of translation and consequently hinders further improvement of translation performance.

Clustering Machine Translation +1

Improving Domain Adaptation Translation with Domain Invariant and Specific Information

no code implementations NAACL 2019 Shuhao Gu, Yang Feng, Qun Liu

In addition, we add a discriminator to the shared encoder and employ adversarial training for the whole model to reinforce the separation of information and machine translation performance simultaneously.

Decoder Domain Adaptation +2
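One standard way to realize the adversarial setup described above is a gradient reversal layer between a shared encoder and a domain discriminator, as in the sketch below; the tiny modules and loss here are placeholders, not the paper's actual architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient sign in the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -grad

encoder = nn.Linear(32, 64)                                   # stand-in for the shared encoder
discriminator = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))

src = torch.randn(16, 32)                                     # a batch of source representations
domain_labels = torch.randint(0, 2, (16,))                    # which domain each sentence is from
shared = encoder(src)
domain_logits = discriminator(GradReverse.apply(shared))
adv_loss = F.cross_entropy(domain_logits, domain_labels)
adv_loss.backward()   # discriminator learns to tell domains apart; encoder is pushed to hide them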
