ACL 2018 • Julia Kreutzer, Joshua Uyheng, Stefan Riezler
We present a study on reinforcement learning (RL) from human bandit feedback for sequence-to-sequence learning, exemplified by the task of bandit neural machine translation (NMT).
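In bandit feedback, the learner never sees a reference translation. It samples one output and observes only a single scalar judgment for that output. The toy sketch below illustrates this under strong simplifying assumptions. It is not the authors' model or training procedure. Here the "policy" is just a softmax over a fixed list of candidate translations for one source sentence, and it is updated with a REINFORCE-style score-function gradient using a running-average baseline. The names `softmax`, `bandit_reinforce`, and the `ratings` table are hypothetical. `ratings` stands in for human judgments.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def bandit_reinforce(candidates, reward_fn, steps=2000, lr=0.1, seed=0):
    """Toy policy-gradient learner driven by bandit feedback (hypothetical sketch).

    At each step one candidate is sampled and only its reward is observed;
    the rewards of the unsampled candidates are never revealed.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(candidates)
    baseline = 0.0  # running mean reward, subtracted to reduce gradient variance
    for t in range(1, steps + 1):
        probs = softmax(logits)
        # Sample one output from the current policy.
        a = rng.choices(range(len(candidates)), weights=probs)[0]
        # Scalar feedback for the sampled output only (simulated rater here).
        r = reward_fn(candidates[a])
        advantage = r - baseline
        # d log softmax(logits)[a] / d logits[i] = 1[i == a] - probs[i]
        for i in range(len(logits)):
            grad = (1.0 if i == a else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
        baseline += (r - baseline) / t  # update running average of rewards
    return softmax(logits)


# Hypothetical simulated "human" ratings for three candidate translations.
candidates = ["the cat sits", "cat the sits", "a dog runs"]
ratings = {"the cat sits": 1.0, "cat the sits": 0.2, "a dog runs": 0.0}
probs = bandit_reinforce(candidates, ratings.__getitem__)
```

Over many steps, probability mass should shift toward the highest-rated candidate, even though no single step reveals which candidate is best. In real bandit NMT, the policy would be a sequence-to-sequence model that scores full output sequences rather than a softmax over a fixed list.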