1 code implementation • 23 Oct 2023 • Zhengrui Ma, Shaolei Zhang, Shoutao Guo, Chenze Shao, Min Zhang, Yang Feng
Simultaneous machine translation (SiMT) models are trained to strike a balance between latency and translation quality.
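As a concrete illustration of how that balance is operationalized, below is a minimal Python sketch of a wait-k read/write schedule, a common fixed policy from the SiMT literature (not necessarily the policy used in this paper): the decoder first reads k source tokens and then alternates writing and reading, so a smaller k lowers latency at the cost of available source context.

# A wait-k read/write schedule (illustrative only; hypothetical, not this paper's method).
def wait_k_schedule(src_len, tgt_len, k=3):
    actions = []
    read, written = 0, 0
    while written < tgt_len:
        if read < min(k + written, src_len):
            actions.append("READ")    # consume one more source token
            read += 1
        else:
            actions.append("WRITE")   # emit one target token
            written += 1
    return actions

print(wait_k_schedule(src_len=6, tgt_len=5, k=3))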
1 code implementation • 12 Mar 2023 • Zhengrui Ma, Chenze Shao, Shangtong Gui, Min Zhang, Yang Feng
Non-autoregressive translation (NAT) reduces the decoding latency but suffers from performance degradation due to the multi-modality problem.
no code implementations • 30 Nov 2022 • Chenze Shao, Jinchao Zhang, Jie Zhou, Yang Feng
In response to this problem, we introduce a rephraser to provide a better training target for NAT by rephrasing the reference sentence according to the NAT output.
1 code implementation • 11 Oct 2022 • Chenze Shao, Zhengrui Ma, Yang Feng
Non-autoregressive models achieve significant decoding speedup in neural machine translation but lack the ability to capture sequential dependency.
1 code implementation • 8 Oct 2022 • Chenze Shao, Yang Feng
We extend the alignment space to non-monotonic alignments to allow for global word reordering, and we further consider all alignments that overlap with the target sentence.
1 code implementation • NAACL 2022 • Chenze Shao, Xuanfu Wu, Yang Feng
Non-autoregressive neural machine translation (NAT) suffers from the multi-modality problem: the source sentence may have multiple correct translations, but the loss function is calculated only according to the reference sentence.
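A minimal, self-contained sketch of the issue described above: the toy vocabulary, sentences, and probabilities are hypothetical, but they show how a cross-entropy loss computed against a single reference penalizes a prediction that matches another correct translation.

import math

# Two equally valid translations of the same (hypothetical) source sentence.
reference   = ["thank", "you", "very", "much"]
alternative = ["thanks", "a", "lot", "<pad>"]

# Suppose the model puts most of its probability mass on the alternative;
# model_probs[t][word] is the predicted probability of `word` at position t.
model_probs = [
    {"thank": 0.1, "thanks": 0.9},
    {"you": 0.1, "a": 0.9},
    {"very": 0.1, "lot": 0.9},
    {"much": 0.1, "<pad>": 0.9},
]

# The standard loss is computed only against the single reference, so a fluent
# alternative translation is still heavily penalized.
loss = -sum(math.log(model_probs[t][reference[t]]) for t in range(len(reference)))
print(f"cross-entropy vs. the one reference: {loss:.2f}")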
1 code implementation • ACL 2022 • Chenze Shao, Yang Feng
The underlying cause is that training samples do not receive balanced training in each model update, so we name this problem "imbalanced training".
1 code implementation • CL (ACL) 2021 • Chenze Shao, Yang Feng, Jinchao Zhang, Fandong Meng, Jie Zhou
Non-Autoregressive Neural Machine Translation (NAT) removes the autoregressive mechanism and achieves a significant decoding speedup by generating target words independently and simultaneously.
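For illustration, here is a minimal PyTorch sketch contrasting autoregressive decoding with the non-autoregressive decoding described above; the tiny stand-in model and its toy dependence on history are hypothetical and only meant to show that NAT predicts every position in parallel.

import torch

# Hypothetical stand-ins: `encoder_states` for the encoder output of a 6-token
# source sentence, `out_proj` for the decoder plus output projection.
seq_len, d_model, vocab_size = 6, 32, 100
encoder_states = torch.randn(seq_len, d_model)
out_proj = torch.nn.Linear(d_model, vocab_size)

# Autoregressive decoding: one position at a time, each step conditioned on the
# previously generated tokens (simulated here by a running decoder state).
def autoregressive_decode():
    tokens, state = [], encoder_states.mean(dim=0)
    for _ in range(seq_len):
        token = int(out_proj(state).argmax())
        tokens.append(token)
        state = state + 0.01 * token          # toy dependence on the history
    return tokens

# Non-autoregressive decoding: all target positions are predicted independently
# and simultaneously, with no dependence on previously generated words.
def non_autoregressive_decode():
    logits = out_proj(encoder_states)         # (seq_len, vocab_size) in one shot
    return logits.argmax(dim=-1).tolist()

print(autoregressive_decode())
print(non_autoregressive_decode())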
no code implementations • ACL 2021 • Yang Feng, Shuhao Gu, Dengji Guo, Zhengxin Yang, Chenze Shao
Meanwhile, we force the conventional decoder to simulate the behaviors of the seer decoder via knowledge distillation.
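A minimal PyTorch sketch of the distillation term suggested by the sentence above, with hypothetical logits standing in for the two decoders' outputs (the seer decoder as teacher, the conventional decoder as student); the actual architecture and loss weighting in the paper may differ.

import torch
import torch.nn.functional as F

# Hypothetical per-position output logits over the vocabulary: `seer_logits`
# from a seer decoder that also sees future target context, `conv_logits`
# from the conventional left-to-right decoder.
tgt_len, vocab_size = 5, 100
seer_logits = torch.randn(tgt_len, vocab_size)
conv_logits = torch.randn(tgt_len, vocab_size, requires_grad=True)

# Knowledge-distillation term: make the conventional decoder's distribution
# match the seer decoder's distribution (the teacher is detached, not updated).
kd_loss = F.kl_div(
    F.log_softmax(conv_logits, dim=-1),
    F.softmax(seer_logits, dim=-1).detach(),
    reduction="batchmean",
)
kd_loss.backward()   # gradients flow only into the conventional decoder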
no code implementations • 24 Apr 2021 • Yong Shan, Yang Feng, Chenze Shao
Non-Autoregressive Neural Machine Translation (NAT) has achieved significant inference speedup by generating all tokens simultaneously.
no code implementations • 1 Jan 2021 • Chenze Shao, Meng Sun, Yang Feng, Zhongjun He, Hua Wu, Haifeng Wang
Under this framework, we introduce word-level and sequence-level ensemble learning for neural machine translation, where sequence-level ensemble learning can aggregate translation models with different decoding strategies.
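A minimal PyTorch sketch of word-level ensembling in the sense above, with hypothetical distributions standing in for the member models' per-step predictions; simple averaging is just one common way to aggregate them and is not necessarily the combination used in this paper.

import torch

# Hypothetical per-step distributions from three translation models over the
# same vocabulary; in practice these would come from different NMT systems.
vocab_size = 100
step_probs = [torch.softmax(torch.randn(vocab_size), dim=-1) for _ in range(3)]

# Word-level ensembling: aggregate the models' distributions at each decoding
# step (here a simple average) and pick the next token from the mixture.
ensemble_dist = torch.stack(step_probs).mean(dim=0)
next_token = int(ensemble_dist.argmax())
print(next_token)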
no code implementations • EMNLP 2020 • Xuanfu Wu, Yang Feng, Chenze Shao
Despite improvements in translation quality, neural machine translation (NMT) often suffers from a lack of diversity in its generation.
1 code implementation • 30 Nov 2019 • Yang Feng, Wanying Xie, Shuhao Gu, Chenze Shao, Wen Zhang, Zhengxin Yang, Dong Yu
Neural machine translation models are usually trained with the teacher forcing strategy, which requires that the predicted sequence match the ground truth word by word and forces the probability of each prediction to approach a 0-1 distribution.
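A minimal PyTorch sketch of that word-level objective, with hypothetical logits and reference token ids: under teacher forcing, each position's target is a one-hot (0-1) distribution over the vocabulary, so the loss pushes all probability mass onto the single reference word.

import torch
import torch.nn.functional as F

# Hypothetical decoder outputs for a 4-token target sentence over a toy vocabulary,
# as produced when the decoder is fed the gold (reference) prefix at each step.
tgt_len, vocab_size = 4, 50
logits = torch.randn(tgt_len, vocab_size, requires_grad=True)
gold = torch.tensor([7, 3, 42, 11])   # reference token ids (hypothetical)

# Word-level cross-entropy: each position is compared with the single reference
# token (a one-hot target), so probability mass on any other word is penalized.
loss = F.cross_entropy(logits, gold)
loss.backward()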
1 code implementation • 21 Nov 2019 • Chenze Shao, Jinchao Zhang, Yang Feng, Fandong Meng, Jie Zhou
Non-Autoregressive Neural Machine Translation (NAT) achieves a significant decoding speedup by generating target words independently and simultaneously.
3 code implementations • ACL 2019 • Chenze Shao, Yang Feng, Jinchao Zhang, Fandong Meng, Xilin Chen, Jie Zhou
Non-Autoregressive Transformer (NAT) aims to accelerate the Transformer model by discarding the autoregressive mechanism and generating target words independently, but this fails to exploit target-side sequential information.
1 code implementation • EMNLP 2018 • Chenze Shao, Yang Feng, Xilin Chen
Neural machine translation (NMT) models are usually trained with the word-level loss using the teacher forcing algorithm, which not only evaluates the translation improperly but also suffers from exposure bias.
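A minimal PyTorch sketch of the exposure-bias part of that observation, using a hypothetical one-step toy decoder: training conditions every step on the gold prefix, while inference conditions on the model's own (possibly erroneous) previous predictions, a distribution of prefixes never seen during training.

import torch

# A toy "decoder step": next-token logits given the previous token id. This is
# a hypothetical stand-in; a real NMT decoder also attends to the source.
vocab_size = 50
embed = torch.nn.Embedding(vocab_size, 16)
head = torch.nn.Linear(16, vocab_size)
step = lambda prev_tok: head(embed(torch.tensor([prev_tok])))[0]

gold = [7, 3, 42, 11]                  # hypothetical reference token ids

# Training with teacher forcing: every step is conditioned on the *gold* prefix.
train_inputs = [0] + gold[:-1]         # 0 = hypothetical BOS id

# Inference: every step is conditioned on the model's *own* previous prediction.
prev, test_inputs = 0, []
for _ in gold:
    test_inputs.append(prev)
    prev = int(step(prev).argmax())

print(train_inputs, test_inputs)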