Discovering Autoregressive Orderings with Variational Inference

The predominant approach for language modeling is to encode a sequence of tokens from left to right, but this eliminates a source of information: the order in which the sequence was naturally generated. One strategy to recover this information is to decode both the content and the location of tokens. Prior work supervises content and location with hand-designed loss functions or bootstraps from a predefined ordering, approaches that require domain-specific insight. We address this limitation with an unsupervised learner that discovers high-quality autoregressive orders without domain-specific priors. Our learner is a neural network that performs variational inference with the autoregressive order as a latent variable. The corresponding ELBO is not differentiable, so we develop a practical algorithm for end-to-end optimization using policy gradients. Strong empirical results on image captioning and code generation suggest that our algorithm discovers sequence-specific autoregressive orders that are competitive with, or better than, fixed orders.
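Because the latent ordering is a discrete permutation, the sampling step blocks backpropagation through the ELBO, which is why the abstract points to policy gradients. The sketch below illustrates the general REINFORCE recipe this alludes to, not the authors' actual architecture: `OrderPolicy`, `toy_decoder_loglik`, and all shapes and hyperparameters are hypothetical stand-ins for the real inference network and decoder.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical


class OrderPolicy(nn.Module):
    """Proposal q(z | x): scores positions and samples, step by step, a
    permutation z of sequence positions -- the latent decoding order."""

    def __init__(self, seq_len, hidden=32):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(seq_len, hidden), nn.ReLU(), nn.Linear(hidden, seq_len)
        )

    def sample_order(self, x):
        # x: (batch, seq_len) feature summary of each sequence.
        batch, seq_len = x.shape
        base_logits = self.scorer(x)
        chosen = torch.zeros(batch, seq_len, dtype=torch.bool)
        order, log_q = [], torch.zeros(batch)
        for _ in range(seq_len):
            # Mask already-chosen positions so z is a valid permutation.
            logits = base_logits.masked_fill(chosen, float("-inf"))
            dist = Categorical(logits=logits)
            pos = dist.sample()                  # next position to decode
            log_q = log_q + dist.log_prob(pos)   # accumulate log q(z | x)
            chosen[torch.arange(batch), pos] = True
            order.append(pos)
        return torch.stack(order, dim=1), log_q


def toy_decoder_loglik(x, order):
    # Hypothetical stand-in for log p(x | z); a real model would decode
    # tokens in the sampled order and return their log-likelihood.
    steps = order.argsort(dim=1).float()         # decoding step of each position
    return -(steps * x).sum(dim=1) / x.size(1)


# One REINFORCE update: grad E_q[f(z)] = E_q[(f(z) - b) * grad log q(z | x)].
seq_len, batch = 6, 16
policy = OrderPolicy(seq_len)
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
x = torch.rand(batch, seq_len)

order, log_q = policy.sample_order(x)
reward = toy_decoder_loglik(x, order)            # no gradient flows through z
baseline = reward.mean().detach()                # variance-reduction baseline
loss = -((reward - baseline) * log_q).mean()
opt.zero_grad()
loss.backward()
opt.step()
```

The mean-reward baseline is the simplest variance-reduction choice; the score-function estimator itself is what makes the otherwise non-differentiable expectation over orderings trainable end to end.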
