Sharing Attention Weights for Fast Transformer

26 Jun 2019Tong XiaoYinqiao LiJingbo ZhuZhengtao YuTongran Liu

Recently, the Transformer machine translation system has shown strong results by stacking attention layers on both the source and target-language sides. But the inference of this model is slow due to the heavy use of dot-product attention in auto-regressive decoding... (read more)

PDF Abstract


No code implementations yet. Submit your code now

Results from the Paper

  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods used in the Paper