no code implementations • 18 Mar 2020 • Xinjie Feng, Hongxun Yao, Yuankai Qi, Jun Zhang, Shengping Zhang
Different from previous transformer-based models [56, 34], which use only the transformer decoder to decode convolutional attention, the proposed method feeds the convolutional feature maps into the transformer as word embeddings.
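The idea of treating convolutional feature maps as the transformer's input embeddings can be sketched roughly as follows. This is a minimal illustrative PyTorch sketch, not the authors' implementation: the module names, sizes, and the toy two-layer backbone are all assumptions, and a full transformer (encoder and decoder) consumes the flattened feature sequence instead of a transformer decoder attending over convolutional attention alone.

```python
import torch
import torch.nn as nn

class ConvFeatureTransformer(nn.Module):
    """Hypothetical sketch: CNN feature maps flattened into a token
    sequence and used as the 'word embeddings' of a full transformer."""

    def __init__(self, d_model=64, vocab_size=40):
        super().__init__()
        # Toy CNN backbone producing d_model-channel feature maps
        # (an assumption; the paper's backbone differs).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, d_model, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(d_model, d_model, kernel_size=3, stride=2, padding=1),
        )
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True,
        )
        self.tgt_embed = nn.Embedding(vocab_size, d_model)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, images, tgt_tokens):
        f = self.backbone(images)            # (B, C, H, W)
        # Flatten spatial positions into a sequence of C-dim
        # "embeddings" fed to the transformer encoder.
        src = f.flatten(2).transpose(1, 2)   # (B, H*W, C)
        tgt = self.tgt_embed(tgt_tokens)     # (B, T, C)
        dec = self.transformer(src, tgt)     # (B, T, C)
        return self.out(dec)                 # (B, T, vocab_size)

model = ConvFeatureTransformer()
images = torch.randn(2, 3, 32, 128)          # toy batch of text-line crops
tokens = torch.randint(0, 40, (2, 10))       # toy target character tokens
logits = model(images, tokens)
print(logits.shape)                          # torch.Size([2, 10, 40])
```

The key step is the flatten-and-transpose: each spatial location of the feature map becomes one input token, so no separate embedding layer is needed on the image side.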