1 code implementation • 23 Oct 2023 • Tong Zheng, Bei Li, Huiwen Bao, Weiqiao Shan, Tong Xiao, Jingbo Zhu
The design choices in Transformer feed-forward neural networks have resulted in significant computational and parameter overhead.
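The overhead claim can be made concrete with a back-of-the-envelope parameter count. The sketch below compares a standard Transformer FFN sublayer against the attention projections at the common "base" configuration (d_model=512, d_ff=2048); these dimensions are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch (assumed Transformer-base dimensions, not from the paper):
# the FFN sublayer alone holds roughly twice the parameters of the four
# attention projection matrices, which is the overhead the abstract refers to.

def ffn_params(d_model: int, d_ff: int) -> int:
    """Two linear maps, W1 (d_model x d_ff) and W2 (d_ff x d_model), plus biases."""
    return d_model * d_ff + d_ff + d_ff * d_model + d_model

def attention_params(d_model: int) -> int:
    """Q, K, V, and output projections, each d_model x d_model, plus biases."""
    return 4 * (d_model * d_model + d_model)

if __name__ == "__main__":
    d_model, d_ff = 512, 2048  # assumed base-configuration defaults
    print("FFN params:      ", ffn_params(d_model, d_ff))
    print("Attention params:", attention_params(d_model))
```

At these sizes the FFN contributes about 2.1M parameters per layer versus about 1.05M for the attention projections, so design choices in the FFN directly drive model size.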
Ranked #23 on Machine Translation on WMT2014 English-German
1 code implementation • 20 Dec 2022 • Tong Zheng, Bei Li, Huiwen Bao, Tong Xiao, Jingbo Zhu
In this paper, we propose a novel architecture, the Enhanced Interactive Transformer (EIT), to address the issue of head degradation in self-attention mechanisms.
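"Head degradation" here refers to attention heads collapsing into near-identical, redundant patterns. One common way to diagnose it is to compare heads' attention distributions pairwise; the toy sketch below does this with cosine similarity over hand-made attention rows (the head values are hypothetical, and this diagnostic is an assumed illustration, not the EIT method itself).

```python
import math

def cosine(u, v):
    """Cosine similarity between two flattened attention distributions."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical attention weights of three heads over 4 positions.
head_a = [0.7, 0.1, 0.1, 0.1]
head_b = [0.68, 0.12, 0.1, 0.1]  # nearly identical to head_a -> redundant
head_c = [0.1, 0.1, 0.1, 0.7]    # attends elsewhere -> diverse head

print("a vs b:", round(cosine(head_a, head_b), 3))  # close to 1: degraded pair
print("a vs c:", round(cosine(head_a, head_c), 3))  # much lower: useful diversity
```

A similarity near 1 across many head pairs means the model is paying for heads that carry little distinct information, which is the failure mode EIT is designed to counter.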