FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow

Most sequence-to-sequence (seq2seq) models are autoregressive; they generate each token by conditioning on previously generated tokens. In contrast, non-autoregressive seq2seq models generate all tokens in one pass, which increases efficiency through parallel processing on hardware such as GPUs. However, directly modeling the joint distribution of all tokens simultaneously is challenging, and even with increasingly complex model structures, accuracy lags significantly behind that of autoregressive models. In this paper, we propose a simple, efficient, and effective model for non-autoregressive sequence generation using latent variable models. Specifically, we turn to generative flow, an elegant technique for modeling complex distributions with neural networks, and design several layers of flow tailored to modeling the conditional density of sequential latent variables. We evaluate this model on three neural machine translation (NMT) benchmark datasets, achieving performance comparable to state-of-the-art non-autoregressive NMT models and almost constant decoding time w.r.t. the sequence length.
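To make the generative-flow idea concrete, the following is a minimal sketch of an affine coupling layer (RealNVP-style), the standard building block of normalizing flows: an invertible transformation whose log-Jacobian-determinant is cheap to compute, so a simple base density can be warped into a complex one. This toy uses fixed random linear maps in place of learned networks and is not FlowSeq's actual architecture, which stacks flow layers tailored to sequences of latent vectors conditioned on the source sentence.

```python
import numpy as np

rng = np.random.default_rng(0)

class AffineCoupling:
    """Split z into halves; transform one half conditioned on the other.

    Toy illustration of a coupling-based generative flow: the forward
    map is exactly invertible and its Jacobian is triangular, so the
    log-determinant is just the sum of the log-scales.
    """

    def __init__(self, dim):
        half = dim // 2
        # Stand-ins for learned networks: fixed random linear maps
        # producing a log-scale and a shift from the untouched half.
        self.w_s = 0.1 * rng.standard_normal((half, half))
        self.w_t = 0.1 * rng.standard_normal((half, half))

    def forward(self, z):
        z1, z2 = np.split(z, 2, axis=-1)
        s = np.tanh(z1 @ self.w_s)        # log-scale, bounded for stability
        t = z1 @ self.w_t                 # shift
        y2 = z2 * np.exp(s) + t
        log_det = s.sum(axis=-1)          # log |det Jacobian| per example
        return np.concatenate([z1, y2], axis=-1), log_det

    def inverse(self, y):
        y1, y2 = np.split(y, 2, axis=-1)
        s = np.tanh(y1 @ self.w_s)
        t = y1 @ self.w_t
        z2 = (y2 - t) * np.exp(-s)
        return np.concatenate([y1, z2], axis=-1)

layer = AffineCoupling(dim=8)
z = rng.standard_normal((4, 8))           # batch of latent vectors
y, log_det = layer.forward(z)
z_rec = layer.inverse(y)
print(np.allclose(z, z_rec))              # flows are exactly invertible
```

Because every token's latent vector can be transformed in parallel and decoding is a single inverse pass through the stacked layers, this construction is what allows non-autoregressive generation with a tractable density.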

IJCNLP 2019
| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Machine Translation | IWSLT2015 German-English | FlowSeq-base | BLEU score | 24.75 | # 12 |
| Machine Translation | WMT2014 English-German | FlowSeq-base | BLEU score | 18.55 | # 64 |
| Machine Translation | WMT2014 English-German | FlowSeq-large (NPD n = 15) | BLEU score | 23.14 | # 54 |
| Machine Translation | WMT2014 English-German | FlowSeq-large | BLEU score | 20.85 | # 58 |
| Machine Translation | WMT2014 English-German | FlowSeq-large (IWD n = 15) | BLEU score | 22.94 | # 55 |
| Machine Translation | WMT2014 English-German | FlowSeq-large (NPD n = 30) | BLEU score | 23.64 | # 53 |
| Machine Translation | WMT2014 German-English | FlowSeq-large (NPD n = 30) | BLEU score | 28.29 | # 4 |
| Machine Translation | WMT2014 German-English | FlowSeq-base | BLEU score | 23.36 | # 9 |
| Machine Translation | WMT2014 German-English | FlowSeq-large | BLEU score | 25.4 | # 8 |
| Machine Translation | WMT2014 German-English | FlowSeq-large (NPD n = 15) | BLEU score | 27.71 | # 5 |
| Machine Translation | WMT2014 German-English | FlowSeq-large (IWD n = 15) | BLEU score | 27.16 | # 6 |
| Machine Translation | WMT2016 English-Romanian | FlowSeq-large (NPD n = 30) | BLEU score | 32.35 | # 3 |
| Machine Translation | WMT2016 English-Romanian | FlowSeq-large (IWD n = 15) | BLEU score | 31.08 | # 5 |
| Machine Translation | WMT2016 English-Romanian | FlowSeq-large | BLEU score | 29.86 | # 8 |
| Machine Translation | WMT2016 English-Romanian | FlowSeq-base | BLEU score | 29.26 | # 11 |
| Machine Translation | WMT2016 English-Romanian | FlowSeq-large (NPD n = 15) | BLEU score | 31.97 | # 4 |
| Machine Translation | WMT2016 Romanian-English | FlowSeq-large | BLEU score | 30.69 | # 12 |
| Machine Translation | WMT2016 Romanian-English | FlowSeq-large (NPD n = 30) | BLEU score | 32.91 | # 7 |
| Machine Translation | WMT2016 Romanian-English | FlowSeq-large (NPD n = 15) | BLEU score | 32.46 | # 8 |
| Machine Translation | WMT2016 Romanian-English | FlowSeq-large (IWD n = 15) | BLEU score | 32.03 | # 9 |
| Machine Translation | WMT2016 Romanian-English | FlowSeq-base | BLEU score | 30.16 | # 14 |