An Actor-Critic Algorithm for Sequence Prediction

24 Jul 2016
Dzmitry Bahdanau • Philemon Brakel • Kelvin Xu • Anirudh Goyal • Ryan Lowe • Joelle Pineau • Aaron Courville • Yoshua Bengio

We present an approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL). Current log-likelihood training methods are limited by a discrepancy between their training and testing modes: at test time, models must generate tokens conditioned on their own previous guesses rather than on the ground-truth tokens. We address this problem by introducing a critic network that is trained to predict the value of an output token, given the policy of an actor network.
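The abstract describes the critic only at a high level. As a rough illustration of the idea, here is a minimal tabular sketch on a hypothetical toy task. The actor samples sequences conditioned on its own previous guesses. The critic learns Q(prefix, token), the expected future reward of emitting a token after a prefix and then following the actor, from temporal-difference targets that average over the actor's next-step policy. The actor then ascends the expected critic score. Everything here is an assumption made for illustration and not the authors' implementation: the three-token vocabulary, the reference sequence, the per-token match reward (standing in for a task score such as BLEU), the learning rates, and the class name `ToyActorCritic`. The paper trains neural networks rather than lookup tables. For simplicity, the critic is updated for every candidate token at each visited prefix, which is possible only because the toy reward can be evaluated for any token.

```python
import math
import random

VOCAB = ["a", "b", "c"]
REFERENCE = ("a", "b", "c")  # hypothetical target sequence for the toy task
LENGTH = len(REFERENCE)


def reward(t, token):
    """Hypothetical per-step reward: 1 if the token matches the reference at step t.

    It stands in for a task score such as BLEU; it is not the paper's reward.
    """
    return 1.0 if token == REFERENCE[t] else 0.0


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyActorCritic:
    """Tabular stand-in: the actor and critic are lookup tables keyed by prefix."""

    def __init__(self, actor_lr=0.1, critic_lr=0.1, seed=0):
        self.logits = {}  # actor: prefix tuple -> logits over VOCAB
        self.q = {}       # critic: prefix tuple -> Q estimate for each token
        self.actor_lr = actor_lr
        self.critic_lr = critic_lr
        self.rng = random.Random(seed)

    def policy(self, prefix):
        return softmax(self.logits.setdefault(prefix, [0.0] * len(VOCAB)))

    def q_values(self, prefix):
        return self.q.setdefault(prefix, [0.0] * len(VOCAB))

    def value(self, prefix):
        # Expected Q under the actor's policy; zero once the sequence is complete.
        if len(prefix) == LENGTH:
            return 0.0
        return sum(p * q for p, q in zip(self.policy(prefix), self.q_values(prefix)))

    def sample(self):
        # The actor conditions on its own previous guesses, not on the reference.
        prefix = ()
        for _ in range(LENGTH):
            token = self.rng.choices(VOCAB, weights=self.policy(prefix))[0]
            prefix += (token,)
        return prefix

    def train_step(self):
        seq = self.sample()
        for t in range(LENGTH):
            prefix = seq[:t]
            q = self.q_values(prefix)
            # Critic: move Q(prefix, a) toward r_t(a) + E_{a' ~ actor}[Q(prefix + a, a')].
            for i, tok in enumerate(VOCAB):
                target = reward(t, tok) + self.value(prefix + (tok,))
                q[i] += self.critic_lr * (target - q[i])
            # Actor: ascend sum_a p(a) * Q(a); for softmax logits the gradient
            # with respect to logit j is p_j * (Q_j - sum_a p_a * Q_a).
            probs = self.policy(prefix)
            baseline = sum(p * qv for p, qv in zip(probs, q))
            logits = self.logits[prefix]
            for j in range(len(VOCAB)):
                logits[j] += self.actor_lr * probs[j] * (q[j] - baseline)

    def greedy_decode(self):
        prefix = ()
        for _ in range(LENGTH):
            probs = self.policy(prefix)
            prefix += (VOCAB[probs.index(max(probs))],)
        return prefix


if __name__ == "__main__":
    model = ToyActorCritic(seed=0)
    for _ in range(3000):
        model.train_step()
    print(model.greedy_decode())
```

Using the expected gradient over all tokens, rather than a sampled REINFORCE-style estimate, keeps the actor update deterministic given the critic. In this toy setting, the greedy decode should recover the reference sequence after training.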

Task                 Dataset                   Model      Metric  Value  Global rank
Machine Translation  IWSLT2015 English-German  RNNsearch  BLEU    25.04  #7
Machine Translation  IWSLT2015 German-English  RNNsearch  BLEU    29.98  #11