no code implementations • 20 Nov 2023 • Zhuocheng Zhang, Shuhao Gu, Min Zhang, Yang Feng
To solve the length bias problem, we propose improvements to the DNMT model in its training method, attention mechanism, and decoding strategy.
1 code implementation • 17 Oct 2023 • Langlin Huang, Shuhao Gu, Zhuocheng Zhang, Yang Feng
Conventional neural machine translation (NMT) models typically use subwords and words as the basic units for model input and comprehension.
1 code implementation • 3 Nov 2022 • Shuhao Gu, Bojie Hu, Yang Feng
Specifically, we propose two methods to search for low-forgetting-risk regions, based respectively on the curvature of the loss and on the impact of the parameters on the model output.
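A rough sketch of the curvature-based idea: approximate each parameter's curvature with its mean squared gradient over general-domain samples (a diagonal Fisher estimate), then treat the least important parameters as low-forgetting-risk and keep only those trainable during adaptation. All names and the keep ratio below are illustrative, not the paper's exact procedure.

```python
# Hypothetical sketch: rank parameters by a squared-gradient curvature
# proxy, then mark the lowest-curvature fraction as safe to update
# (low forgetting risk). Illustrative only, not the paper's code.

def squared_grad_importance(per_sample_grads):
    """per_sample_grads: list of gradient vectors (lists of floats).
    Returns a mean-squared-gradient importance score per parameter."""
    n_params = len(per_sample_grads[0])
    importance = [0.0] * n_params
    for g in per_sample_grads:
        for i, gi in enumerate(g):
            importance[i] += gi * gi
    return [v / len(per_sample_grads) for v in importance]

def low_risk_mask(importance, keep_ratio=0.5):
    """Mark the keep_ratio fraction of parameters with the lowest
    estimated curvature as trainable (low forgetting risk)."""
    k = max(1, int(len(importance) * keep_ratio))
    order = sorted(range(len(importance)), key=lambda i: importance[i])
    safe = set(order[:k])
    return [i in safe for i in range(len(importance))]

grads = [[0.9, 0.01, -0.5, 0.02], [1.1, -0.02, 0.4, 0.03]]
imp = squared_grad_importance(grads)
mask = low_risk_mask(imp, keep_ratio=0.5)  # parameters 1 and 3 are low-risk
```

The intuition is that parameters sitting in flat regions of the general-domain loss can move without disturbing what the model already knows.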
1 code implementation • 28 Oct 2022 • Shuhao Gu, Yang Feng
Many-to-many multilingual neural machine translation can translate between language pairs unseen during training, i.e., perform zero-shot translation.
1 code implementation • ACL 2021 • Wanying Xie, Yang Feng, Shuhao Gu, Dong Yu
Multilingual neural machine translation with a single model has drawn much attention due to its capability to deal with multiple languages.
no code implementations • ACL 2021 • Yang Feng, Shuhao Gu, Dengji Guo, Zhengxin Yang, Chenze Shao
Meanwhile, we force the conventional decoder to simulate the behaviors of the seer decoder via knowledge distillation.
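The distillation step can be sketched as minimizing the divergence between the two decoders' output distributions at each target position; the conventional decoder (student) is pulled toward the seer decoder (teacher). The function names and toy distributions below are illustrative assumptions, not the paper's implementation.

```python
import math

# Hedged sketch of knowledge distillation: the conventional decoder is
# trained to match the seer decoder's per-position output distribution
# by minimising KL(teacher || student). Illustrative names only.

def kl_divergence(teacher, student, eps=1e-9):
    """KL(teacher || student) for two probability vectors."""
    return sum(t * math.log((t + eps) / (s + eps))
               for t, s in zip(teacher, student))

def distillation_loss(teacher_probs, student_probs):
    """Average per-position KL between seer and conventional decoder."""
    losses = [kl_divergence(t, s)
              for t, s in zip(teacher_probs, student_probs)]
    return sum(losses) / len(losses)

teacher = [[0.7, 0.2, 0.1]]
student = [[0.7, 0.2, 0.1]]
# identical distributions give (near-)zero distillation loss
```

In practice this term is added to the usual cross-entropy objective, so the student benefits from the seer's access to future target context without needing it at inference time.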
1 code implementation • NAACL 2021 • Shuhao Gu, Yang Feng, Wanying Xie
Domain adaptation is widely used in practical applications of neural machine translation, aiming to achieve good performance on both general-domain and in-domain data.
no code implementations • COLING 2020 • Shuhao Gu, Yang Feng
Our investigation of the NMT model's modules shows that some modules are tightly tied to general-domain knowledge, while others are more essential for domain adaptation.
1 code implementation • EMNLP 2020 • Shuhao Gu, Jinchao Zhang, Fandong Meng, Yang Feng, Wanying Xie, Jie Zhou, Dong Yu
The vanilla NMT model usually adopts trivial equal-weighted objectives for target tokens of different frequencies and tends to generate more high-frequency tokens and fewer low-frequency tokens than the gold token distribution.
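One simple way to move away from equal weighting is to scale each token's loss by an inverse-log-frequency weight, so rare tokens contribute more. This generic sketch illustrates the idea only; the paper's adaptive weighting scheme differs, and all names here are assumptions.

```python
import math

# Illustrative token-level weighting: upweight low-frequency target
# tokens with inverse-log-frequency weights instead of the vanilla
# equal-weighted objective. A generic sketch, not the paper's scheme.

def frequency_weights(token_counts):
    """Map token -> weight; rarer tokens receive larger weights."""
    return {tok: 1.0 / math.log(1.0 + c) for tok, c in token_counts.items()}

def weighted_nll(log_probs, targets, weights):
    """Weighted negative log-likelihood over one target sequence."""
    total = sum(weights[t] * -lp for lp, t in zip(log_probs, targets))
    return total / len(targets)

counts = {"the": 1000, "cat": 50, "zymurgy": 2}
w = frequency_weights(counts)  # w["zymurgy"] > w["cat"] > w["the"]
```

Upweighting rare tokens pushes the model's output distribution closer to the gold token distribution at the cost of slightly noisier gradients on infrequent events.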
no code implementations • WS 2020 • Haiyang Xue, Yang Feng, Shuhao Gu, Wei Chen
In this paper, we propose a method to handle these two problems so as to generate translations that are robust to ASR errors.
1 code implementation • 30 Nov 2019 • Yang Feng, Wanying Xie, Shuhao Gu, Chenze Shao, Wen Zhang, Zhengxin Yang, Dong Yu
Neural machine translation models usually adopt the teacher forcing strategy for training, which requires the predicted sequence to match the ground truth word by word and forces the probability of each prediction toward a 0-1 distribution.
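The teacher forcing objective described here can be sketched as cross-entropy against a one-hot (0-1) target at each step, with the decoder fed the ground-truth prefix. The helper names and toy probabilities below are illustrative assumptions.

```python
import math

# Minimal sketch of teacher forcing: at each step the loss pushes the
# predicted distribution toward a one-hot (0-1) target on the gold
# token. Illustrative names and values only.

def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

def cross_entropy(target, predicted, eps=1e-9):
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))

def teacher_forcing_loss(gold_ids, step_probs, vocab_size):
    """Average cross-entropy against one-hot gold tokens."""
    losses = [cross_entropy(one_hot(g, vocab_size), p)
              for g, p in zip(gold_ids, step_probs)]
    return sum(losses) / len(losses)

probs = [[0.8, 0.1, 0.1], [0.1, 0.85, 0.05]]
loss = teacher_forcing_loss([0, 1], probs, 3)
```

Because only the gold token receives probability mass in the target, any plausible alternative word is penalized as hard as a nonsensical one, which is the rigidity the paper targets.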
no code implementations • IJCNLP 2019 • Zhengxin Yang, Jinchao Zhang, Fandong Meng, Shuhao Gu, Yang Feng, Jie Zhou
Context modeling is essential for generating coherent and consistent translations in document-level neural machine translation.
no code implementations • 31 Aug 2019 • Shuhao Gu, Yang Feng
Multi-head attention advances neural machine translation by computing multiple versions of attention in different subspaces, but neglecting the semantic overlap between these subspaces increases the difficulty of translation and hinders further improvement of translation performance.
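For reference, the baseline mechanism the paper builds on splits queries, keys, and values into per-head subspaces and runs scaled dot-product attention in each. The sketch below shows that standard mechanism only (with random illustrative inputs), not the paper's proposed remedy for subspace overlap.

```python
import numpy as np

# Sketch of standard multi-head attention: project into n_heads
# lower-dimensional subspaces (here by slicing), attend in each,
# then concatenate. Illustrative baseline, not the paper's method.

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(q, k, v, n_heads):
    """q, k, v: (seq_len, d_model); d_model must divide by n_heads."""
    seq_len, d_model = q.shape
    d_head = d_model // n_heads
    outputs = []
    for h in range(n_heads):
        sl = slice(h * d_head, (h + 1) * d_head)  # one subspace per head
        qh, kh, vh = q[:, sl], k[:, sl], v[:, sl]
        scores = qh @ kh.T / np.sqrt(d_head)      # scaled dot product
        outputs.append(softmax(scores, axis=-1) @ vh)
    return np.concatenate(outputs, axis=-1)       # back to (seq_len, d_model)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
out = multi_head_attention(x, x, x, n_heads=2)    # shape (4, 8)
```

Because each head sees only its own slice, nothing in the baseline prevents two heads from learning redundant, semantically overlapping attention patterns, which motivates the paper's approach.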
no code implementations • NAACL 2019 • Shuhao Gu, Yang Feng, Qun Liu
In addition, we add a discriminator to the shared encoder and apply adversarial training to the whole model, reinforcing information separation and machine translation performance simultaneously.
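A common way to implement this kind of encoder-discriminator adversarial training is a gradient reversal layer (GRL): identity in the forward pass, sign-flipped gradient in the backward pass, so the encoder learns to fool the discriminator. The sketch below shows that generic GRL pattern under the assumption it matches the paper's setup; the class and names are illustrative.

```python
# Hedged sketch of a gradient reversal layer (GRL), a standard device
# for adversarial training of a shared encoder: forward pass is the
# identity, backward pass flips the gradient's sign (scaled by lam),
# pushing the encoder toward discriminator-confusing representations.

class GradReverse:
    def __init__(self, lam=1.0):
        self.lam = lam  # strength of the reversed adversarial signal

    def forward(self, x):
        return x  # identity on the forward pass

    def backward(self, grad):
        # flip and scale the gradient flowing back into the encoder
        return [-self.lam * g for g in grad]

grl = GradReverse(lam=0.5)
features = [0.3, -1.2]
assert grl.forward(features) == features  # forward is a no-op
```

With the GRL in place, the discriminator minimizes its classification loss as usual while the encoder, receiving the reversed gradient, maximizes it.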