Learning to Use Future Information in Simultaneous Translation
Simultaneous neural machine translation (simultaneous NMT) has attracted much attention recently. In contrast to standard NMT, where the system can access the full input sentence, simultaneous NMT is a prefix-to-prefix problem: the system can only utilize a prefix of the input sentence, which introduces more uncertainty and difficulty into decoding. Wait-k inference is a simple yet effective strategy for simultaneous NMT, in which the decoder generates the output sequence $k$ words behind the input. For wait-k inference, we observe that wait-m training with $m>k$ (i.e., using more future information in training than in inference) generally outperforms wait-k training. Based on this observation, we propose a method that automatically learns how much future information to use in training for simultaneous NMT. Specifically, we introduce a controller that adaptively selects wait-m training strategies according to the network status of the translation model and the current training sentence pairs; the controller is jointly trained with the translation model through bi-level optimization. Experiments on four datasets show that our method yields improvements of 1 to 3 BLEU points over baselines at the same latency. Our code is available at https://github.com/P2F-research/simulNMT.
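To make the wait-k inference rule concrete, the following is a minimal sketch of greedy wait-k decoding, where the model may only see the first $t + k$ source words when producing the $t$-th target word. The `model.encode` / `model.decode_step` interface and the token ids are hypothetical placeholders, not the API of the released simulNMT code.

```python
# Minimal sketch of greedy wait-k decoding (illustrative only; the
# `model.encode` / `model.decode_step` interface is a hypothetical stand-in).
import torch

def wait_k_decode(model, src_tokens, k, bos_id, eos_id, max_len=200):
    """At target step t, only the first (t + k) source tokens are visible
    to the model (prefix-to-prefix, wait-k inference)."""
    tgt = [bos_id]
    for t in range(max_len):
        visible = min(t + k, len(src_tokens))                  # wait-k source prefix
        enc = model.encode(torch.tensor([src_tokens[:visible]]))
        logits = model.decode_step(enc, torch.tensor([tgt]))   # scores for the next word
        next_tok = int(logits[0, -1].argmax())
        tgt.append(next_tok)
        if next_tok == eos_id:                                  # stop at end-of-sentence
            break
    return tgt[1:]
```

Wait-m training follows the same prefix-visibility rule with $m$ in place of $k$; the proposed controller chooses $m$ per training example rather than fixing it in advance.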