no code implementations • 24 Oct 2024 • Liangdong Wang, Bo-Wen Zhang, ChengWei Wu, Hanyu Zhao, Xiaofeng Shi, Shuhao Gu, Jijie Li, Quanyue Ma, Tengfei Pan, Guang Liu
We present CCI3.0-HQ (https://huggingface.co/datasets/BAAI/CCI3-HQ), a high-quality 500GB subset of the Chinese Corpora Internet 3.0 (CCI3.0) (https://huggingface.co/datasets/BAAI/CCI3-Data), developed using a novel two-stage hybrid filtering pipeline that significantly enhances data quality.
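As an illustration of what such a two-stage hybrid filter can look like (function names, heuristics, and thresholds below are hypothetical, not the CCI3.0-HQ implementation), a cheap rule-based pass can feed a model-based quality scorer:

```python
# Illustrative only: a two-stage hybrid filter in the spirit described above,
# with a cheap rule-based pass followed by a stand-in for a learned quality
# classifier. Not the CCI3.0-HQ implementation.
def rule_based_keep(doc: str) -> bool:
    """Stage 1: fast heuristics that drop obviously low-quality documents."""
    if len(doc) < 200:                                        # too short to be useful
        return False
    digit_ratio = sum(c.isdigit() for c in doc) / len(doc)
    return digit_ratio < 0.3                                  # mostly-numeric pages dropped

def quality_score(doc: str) -> float:
    """Stage 2: stand-in for a learned quality classifier scoring in [0, 1]."""
    words = doc.split()
    return len(set(words)) / max(len(words), 1)               # toy lexical-diversity proxy

def filter_corpus(docs, threshold=0.3):
    survivors = (d for d in docs if rule_based_keep(d))
    return [d for d in survivors if quality_score(d) >= threshold]
```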
4 code implementations • 24 Oct 2024 • Shuhao Gu, Jialing Zhang, Siyuan Zhou, Kevin Yu, Zhaohu Xing, Liangdong Wang, Zhou Cao, Jintao Jia, Zhuoyi Zhang, YiXuan Wang, Zhenchong Hu, Bo-Wen Zhang, Jijie Li, Dong Liang, Yingli Zhao, Yulong Ao, Yaoqi Liu, Fangxiang Feng, Guang Liu
Vision-Language Models (VLMs) have recently made significant progress, but the limited scale and quality of open-source instruction data hinder their performance compared to closed-source models.
no code implementations • 6 Oct 2024 • Shuhao Gu, Mengdi Zhao, BoWen Zhang, Liangdong Wang, Jijie Li, Guang Liu
In this work, we propose a method to improve model representation and processing efficiency by replacing the tokenizers of LLMs.
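For context, a minimal sketch of one common way to swap an LLM's tokenizer with the Hugging Face transformers API is shown below; the model and tokenizer identifiers are placeholders, and the paper's actual procedure for initialising and re-training the new embeddings is not reproduced here.

```python
# Minimal sketch, not the paper's method: replace a model's tokenizer and
# resize its embedding matrices to the new vocabulary size.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("path/to/base-model")        # placeholder id
new_tokenizer = AutoTokenizer.from_pretrained("path/to/new-tokenizer")    # placeholder id

# Resize input/output embeddings to the new vocabulary; the newly created
# rows would normally be re-initialised and further trained before use.
model.resize_token_embeddings(len(new_tokenizer))
```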
2 code implementations • 14 Aug 2024 • Bo-Wen Zhang, Liangdong Wang, Jijie Li, Shuhao Gu, Xinya Wu, Zhengduo Zhang, Boyan Gao, Yulong Ao, Guang Liu
This paper introduces the Aquila2 series, which comprises a wide range of bilingual models with parameter sizes of 7, 34, and 70 billion.
1 code implementation • 13 Aug 2024 • Bo-Wen Zhang, Liangdong Wang, Ye Yuan, Jijie Li, Shuhao Gu, Mengdi Zhao, Xinya Wu, Guang Liu, ChengWei Wu, Hanyu Zhao, Li Du, Yiming Ju, Quanyue Ma, Yulong Ao, Yingli Zhao, Songhe Zhu, Zhou Cao, Dong Liang, Yonghua Lin, Ming Zhang, Shunfei Wang, Yanxin Zhou, Min Ye, Xuekai Chen, Xinyang Yu, Xiangjun Huang, Jian Yang
In this paper, we present AquilaMoE, a cutting-edge bilingual 8×16B Mixture of Experts (MoE) language model that has 8 experts with 16 billion parameters each and is developed using an innovative training methodology called EfficientScale.
1 code implementation • 2 Aug 2024 • Mengyu Bu, Shuhao Gu, Yang Feng
To this end, we propose to exploit both semantic and linguistic features across multiple languages to enhance multilingual translation.
no code implementations • 20 Nov 2023 • Zhuocheng Zhang, Shuhao Gu, Min Zhang, Yang Feng
To solve the length bias problem, we propose to improve the DNMT model in terms of its training method, attention mechanism, and decoding strategy.
1 code implementation • 17 Oct 2023 • Langlin Huang, Shuhao Gu, Zhuocheng Zhang, Yang Feng
Conventional neural machine translation (NMT) models typically use subwords and words as the basic units for model input and comprehension.
1 code implementation • 3 Nov 2022 • Shuhao Gu, Bojie Hu, Yang Feng
Specifically, we propose two methods to search the low forgetting risk regions, which are based on the curvature of loss and the impacts of the parameters on the model output, respectively.
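A hedged sketch of the general idea, not the authors' exact criteria: estimate per-parameter importance on general-domain data (here via squared gradients, as a stand-in for their curvature- and output-impact-based measures) and restrict fine-tuning updates to the least important, i.e. low-forgetting-risk, parameters.

```python
# Illustrative sketch only: squared gradients as a proxy for parameter
# importance; fine-tuning gradients are then masked so only the least
# important (low-forgetting-risk) parameters get updated.
import torch

def importance_scores(model, loss_fn, general_batches):
    scores = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for batch in general_batches:
        model.zero_grad()
        loss_fn(model, batch).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                scores[n] += p.grad.detach() ** 2
    return scores

def restrict_to_low_risk(model, scores, keep_fraction=0.3):
    # Mask gradients so only the lowest-importance fraction of each tensor is trainable.
    for n, p in model.named_parameters():
        threshold = torch.quantile(scores[n].flatten().float(), keep_fraction)
        mask = (scores[n] <= threshold).to(p.dtype)
        p.register_hook(lambda grad, m=mask: grad * m)
```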
1 code implementation • 28 Oct 2022 • Shuhao Gu, Yang Feng
Many-to-many multilingual neural machine translation can translate between language pairs unseen during training, i.e., zero-shot translation.
1 code implementation • ACL 2021 • Wanying Xie, Yang Feng, Shuhao Gu, Dong Yu
Multilingual neural machine translation with a single model has drawn much attention due to its capability to deal with multiple languages.
no code implementations • ACL 2021 • Yang Feng, Shuhao Gu, Dengji Guo, Zhengxin Yang, Chenze Shao
Meanwhile, we force the conventional decoder to simulate the behaviors of the seer decoder via knowledge distillation.
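A minimal sketch of the distillation step, assuming the seer decoder's logits are available at training time; variable names are illustrative.

```python
# Minimal sketch: the conventional decoder (student) matches the seer
# decoder's (teacher's) output distribution via a KL-divergence loss.
import torch.nn.functional as F

def seer_distillation_loss(student_logits, seer_logits, temperature=1.0):
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    seer_probs = F.softmax(seer_logits / temperature, dim=-1).detach()  # teacher not updated
    return F.kl_div(student_log_probs, seer_probs, reduction="batchmean") * temperature ** 2
```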
1 code implementation • NAACL 2021 • Shuhao Gu, Yang Feng, Wanying Xie
Domain adaptation is widely used in practical applications of neural machine translation, where it aims to achieve good performance on both general-domain and in-domain data.
no code implementations • COLING 2020 • Shuhao Gu, Yang Feng
An investigation of the modules of the NMT model shows that some modules are tightly related to general-domain knowledge, while other modules are more essential for domain adaptation.
1 code implementation • EMNLP 2020 • Shuhao Gu, Jinchao Zhang, Fandong Meng, Yang Feng, Wanying Xie, Jie Zhou, Dong Yu
The vanilla NMT model usually adopts trivial equal-weighted objectives for target tokens with different frequencies and tends to generate more high-frequency tokens and fewer low-frequency tokens compared with the golden token distribution.
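As a sketch of the direction this points in (not the paper's actual adaptive weighting scheme), one simple remedy is to re-weight the cross-entropy by inverse token frequency; `token_counts` is assumed to be a vocabulary-sized tensor of training-set counts.

```python
# Illustrative re-weighting only, not the paper's adaptive scheme: rarer
# target tokens receive larger loss weights via inverse-frequency scaling.
import torch
import torch.nn.functional as F

def frequency_weighted_ce(logits, targets, token_counts, alpha=0.5):
    weights = (1.0 / token_counts.float().clamp(min=1.0)) ** alpha
    weights = weights / weights.mean()                        # keep the loss scale comparable
    ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                         targets.reshape(-1), reduction="none")
    return (ce * weights[targets.reshape(-1)]).mean()
```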
no code implementations • WS 2020 • Haiyang Xue, Yang Feng, Shuhao Gu, Wei Chen
In this paper, we propose a method to handle the two problems so as to generate robust translation to ASR errors.
1 code implementation • 30 Nov 2019 • Yang Feng, Wanying Xie, Shuhao Gu, Chenze Shao, Wen Zhang, Zhengxin Yang, Dong Yu
Neural machine translation models usually adopt the teacher forcing strategy for training, which requires that the predicted sequence match the ground truth word by word and forces the probability of each prediction to approach a 0-1 distribution.
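For reference, a compact sketch of the standard teacher-forcing objective described above, with an assumed decoder signature: every position is fed the gold prefix and its cross-entropy target is a one-hot (0-1) distribution.

```python
# Sketch of the standard teacher-forcing objective, assuming a
# decoder(src_states, gold_prefix) -> logits signature.
import torch.nn.functional as F

def teacher_forcing_loss(decoder, src_states, tgt_tokens):
    logits = decoder(src_states, tgt_tokens[:, :-1])          # predict y_t from gold y_<t
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           tgt_tokens[:, 1:].reshape(-1))
```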
no code implementations • IJCNLP 2019 • Zhengxin Yang, Jinchao Zhang, Fandong Meng, Shuhao Gu, Yang Feng, Jie Zhou
Context modeling is essential for generating coherent and consistent translations in document-level Neural Machine Translation.
no code implementations • 31 Aug 2019 • Shuhao Gu, Yang Feng
Multi-head attention advances neural machine translation by computing multiple versions of attention in different subspaces, but neglecting the semantic overlap between subspaces increases the difficulty of translation and consequently hinders further improvement of translation performance.
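For reference, the multi-head mechanism being discussed can be exercised directly through PyTorch's built-in module; the shapes below are just example values.

```python
# Multi-head attention projects queries, keys, and values into several
# lower-dimensional subspaces ("heads"), attends in each independently,
# and concatenates the results.
import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)
x = torch.randn(2, 10, 512)            # (batch, sequence length, model dim)
out, attn_weights = mha(x, x, x)       # self-attention over the sequence
print(out.shape, attn_weights.shape)   # (2, 10, 512) and (2, 10, 10)
```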
no code implementations • NAACL 2019 • Shuhao Gu, Yang Feng, Qun Liu
Besides, we add a discriminator to the shared encoder and employ adversarial training for the whole model to reinforce the performance of information separation and machine translation simultaneously.
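A hedged sketch of one common way to realise such an adversarial setup, with a gradient-reversal layer between the shared encoder and a discriminator; this is illustrative and not necessarily the paper's exact construction.

```python
# Hedged sketch: a gradient-reversal layer pushes the shared encoder toward
# representations the discriminator cannot separate. Not necessarily the
# paper's exact setup.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd=1.0):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None   # reverse the gradient into the encoder

class Discriminator(nn.Module):
    def __init__(self, hidden_size, num_classes=2):
        super().__init__()
        self.clf = nn.Sequential(nn.Linear(hidden_size, hidden_size),
                                 nn.ReLU(),
                                 nn.Linear(hidden_size, num_classes))

    def forward(self, encoder_states):
        pooled = encoder_states.mean(dim=1)          # mean-pool over source positions
        return self.clf(GradReverse.apply(pooled))
```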