no code implementations • 29 Sep 2021 • Qibin Li, Nianmin Yao, Jian Zhao, Yanan Zhang
Building on the traditional attention mechanism, multi-scale fusion self-attention applies convolution kernels of different sizes to capture phrase information at multiple scales, computes a corresponding attention matrix at each scale, and fuses the results, enabling the model to better extract phrase-level information.
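Below is a minimal sketch of this idea in PyTorch. The kernel sizes, the value/output projections, and fusion by averaging are illustrative assumptions, not the paper's exact architecture: each 1D convolution produces phrase features at one scale, a separate attention matrix is computed per scale, and the per-scale outputs are averaged.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleSelfAttention(nn.Module):
    """Sketch of multi-scale fusion self-attention (assumed details)."""

    def __init__(self, d_model, kernel_sizes=(1, 3, 5)):
        super().__init__()
        # One 1D convolution per scale; each kernel size groups a
        # different phrase length into the attention features.
        self.convs = nn.ModuleList(
            nn.Conv1d(d_model, d_model, k, padding=k // 2)
            for k in kernel_sizes
        )
        self.value = nn.Linear(d_model, d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        v = self.value(x)
        scale = x.size(-1) ** 0.5
        fused = 0
        for conv in self.convs:
            # Phrase features at this scale (conv over the time axis).
            h = conv(x.transpose(1, 2)).transpose(1, 2)
            # Scale-specific attention matrix over the sequence.
            attn = F.softmax(h @ h.transpose(1, 2) / scale, dim=-1)
            fused = fused + attn @ v
        # Fuse the per-scale outputs by averaging (an assumption).
        return self.out(fused / len(self.convs))


if __name__ == "__main__":
    layer = MultiScaleSelfAttention(d_model=64)
    x = torch.randn(2, 10, 64)
    print(layer(x).shape)  # torch.Size([2, 10, 64])
```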