Training Tips for the Transformer Model

1 Apr 2018 · Martin Popel, Ondřej Bojar

This article describes our experiments in neural machine translation using the recent Tensor2Tensor framework and the Transformer sequence-to-sequence model (Vaswani et al., 2017). We examine some of the critical parameters that affect the final translation quality, memory usage, training stability and training time, concluding each experiment with a set of recommendations for fellow researchers.
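For context, Tensor2Tensor experiments like those in the paper are typically launched with the t2t-trainer command-line tool. Below is a minimal sketch of such an invocation; the problem name, hyperparameter values, and directory paths are illustrative assumptions, not the paper's exact setup (older Tensor2Tensor releases also spell the flag --problems rather than --problem).

    # Sketch: train a Transformer "big" model on one GPU with Tensor2Tensor.
    # Hyperparameter values here are placeholders, not the paper's settings.
    t2t-trainer \
      --data_dir=$HOME/t2t_data \
      --problem=translate_encs_wmt32k \
      --model=transformer \
      --hparams_set=transformer_big_single_gpu \
      --hparams="batch_size=2000,learning_rate_warmup_steps=16000" \
      --train_steps=500000 \
      --output_dir=$HOME/t2t_train/encs_transformer_big

The knobs exposed in the --hparams override (batch size, learning-rate schedule and warmup) are among the parameters whose effect on translation quality, memory usage, training stability and training time the paper examines.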

