Multi-branch Attentive Transformer

18 Jun 2020 · Yang Fan, Shufang Xie, Yingce Xia, Lijun Wu, Tao Qin, Xiang-Yang Li, Tie-Yan Liu

While the multi-branch architecture is one of the key ingredients behind the success of computer vision models, it has not been well investigated in natural language processing, especially for sequence learning tasks. In this work, we propose a simple yet effective variant of the Transformer called the multi-branch attentive Transformer (MAT for short), where the attention layer is the average of multiple branches and each branch is an independent multi-head attention layer...
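Based only on the construction stated in the abstract (the attention layer's output is the average of several independent multi-head attention branches), here is a minimal PyTorch sketch. The class name MultiBranchAttention, the branch count, and the use of nn.MultiheadAttention are illustrative assumptions rather than the authors' implementation; any branch-level regularization the full paper may use is omitted.

```python
import torch
import torch.nn as nn


class MultiBranchAttention(nn.Module):
    """Sketch of a multi-branch attention layer: the output is the
    average of several independent multi-head attention branches."""

    def __init__(self, d_model: int, n_heads: int,
                 n_branches: int = 2, dropout: float = 0.1):
        super().__init__()
        # Each branch is an independent multi-head attention layer
        # with its own projection parameters.
        self.branches = nn.ModuleList(
            nn.MultiheadAttention(d_model, n_heads,
                                  dropout=dropout, batch_first=True)
            for _ in range(n_branches)
        )

    def forward(self, query, key, value, key_padding_mask=None):
        # Run every branch on the same inputs; [0] keeps the attention
        # output and discards the attention weights.
        outputs = [
            branch(query, key, value, key_padding_mask=key_padding_mask)[0]
            for branch in self.branches
        ]
        # Average the branch outputs elementwise.
        return torch.stack(outputs).mean(dim=0)


# Usage: self-attention with 3 branches on a toy batch.
layer = MultiBranchAttention(d_model=512, n_heads=8, n_branches=3)
x = torch.randn(2, 10, 512)  # (batch, seq_len, d_model)
out = layer(x, x, x)
print(out.shape)             # torch.Size([2, 10, 512])
```

Since the layer keeps the input dimensionality, it can drop in wherever a standard multi-head attention sublayer sits in a Transformer block.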

TASK | DATASET | MODEL | METRIC | VALUE | GLOBAL RANK
Machine Translation | WMT2014 English-German | MAT | BLEU score | 30.8 | #3
