1 code implementation • 14 Mar 2024 • Sun Ao, Weilin Zhao, Xu Han, Cheng Yang, Zhiyuan Liu, Chuan Shi, Maosong Sun, Shengnan Wang, Teng Su
Effective attention modules have played a crucial role in the success of Transformer-based large language models (LLMs), but the quadratic time and memory complexities of these attention modules also pose a challenge when processing long sequences.
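To illustrate where the quadratic cost comes from (a minimal NumPy sketch of standard scaled dot-product attention, not this paper's method; the sequence length `n` and head dimension `d` below are illustrative assumptions), note that the score matrix alone has shape (n, n):

```python
import numpy as np

# Minimal single-head scaled dot-product attention.
# The (n, n) score matrix is the source of the quadratic time/memory cost.
def naive_attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # shape (n, n): O(n^2) memory
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                               # shape (n, d)

n, d = 4096, 64                                      # illustrative sizes
Q, K, V = (np.random.randn(n, d).astype(np.float32) for _ in range(3))
out = naive_attention(Q, K, V)                       # the (n, n) scores dominate cost
```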
no code implementations • 22 Dec 2023 • Shengnan Wang, Yi Li, Zhou Chen, Yunjie Yang
Three-dimensional electrical capacitance tomography (3D-ECT) has shown promise for visualizing industrial multiphase flows.
no code implementations • 27 Nov 2020 • Cheng Yang, Shengnan Wang, Chao Yang, Yuechuan Li, Ru He, Jingqiao Zhang
In BERT training, the backward computation is much more time-consuming than the forward computation, especially in distributed training, where the backward computation time also includes the communication time for gradient synchronization.
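As a hedged sketch of why the distributed backward phase includes communication (generic data-parallel gradient all-reduce, not this paper's specific scheme; the helper name is hypothetical), each worker must synchronize its gradients before the optimizer step:

```python
import torch
import torch.distributed as dist

def backward_with_grad_sync(model, loss):
    """Backward pass followed by gradient synchronization across workers.

    In data-parallel training, the all-reduce below adds communication time
    on top of the local backward computation; frameworks such as
    DistributedDataParallel overlap it with backward, but it still
    contributes to the total backward-phase wall-clock time.
    """
    loss.backward()                                        # local gradient computation
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)  # communication across workers
            p.grad.div_(world_size)                        # average the summed gradients
```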
no code implementations • 27 Nov 2020 • Cheng Yang, Shengnan Wang, Yuechuan Li, Chao Yang, Ming Yan, Jingqiao Zhang, Fangquan Lin
In the second phase, we transform the trained relaxed BERT model back into the original BERT architecture and retrain it further.
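The abstract does not spell out how the transformation is performed; a minimal sketch of such a two-phase scheme might look like the following, where `train_step`, the one-to-one parameter mapping, and the matching-shapes assumption are all hypothetical and for illustration only:

```python
import torch

def second_phase_retrain(relaxed_model, bert_model, train_step, steps):
    """Hypothetical sketch of the second phase: transfer weights from the
    trained relaxed model into a standard BERT and continue training.

    Assumes the relaxed model's parameters map one-to-one onto the original
    BERT's parameter names and shapes (an assumption for illustration only).
    """
    # Transform: load the relaxed model's weights into the original BERT.
    bert_model.load_state_dict(relaxed_model.state_dict(), strict=False)

    # Retrain: continue optimizing the original BERT architecture.
    optimizer = torch.optim.AdamW(bert_model.parameters(), lr=1e-5)
    for _ in range(steps):
        loss = train_step(bert_model)   # user-supplied forward pass + loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return bert_model
```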