1 code implementation • 14 Mar 2024 • Sun Ao, Weilin Zhao, Xu Han, Cheng Yang, Zhiyuan Liu, Chuan Shi, Maosong Sun, Shengnan Wang, Teng Su
Effective attention modules have played a crucial role in the success of Transformer-based large language models (LLMs), but the quadratic time and memory complexities of these attention modules also pose a challenge when processing long sequences.
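To illustrate where the quadratic cost comes from (a minimal NumPy sketch of standard scaled dot-product attention, not this paper's method; the sequence length `n` and head dimension `d` below are illustrative assumptions), note that the score matrix alone has shape (n, n):

```python
import numpy as np

# Minimal single-head scaled dot-product attention.
# The (n, n) score matrix is the source of the quadratic time/memory cost.
def naive_attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # shape (n, n): O(n^2) memory
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                               # shape (n, d)

n, d = 4096, 64                                      # illustrative sizes
Q, K, V = (np.random.randn(n, d).astype(np.float32) for _ in range(3))
out = naive_attention(Q, K, V)                       # the (n, n) scores dominate cost
```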
no code implementations • 22 Dec 2023 • Shengnan Wang, Yi Li, Zhou Chen, Yunjie Yang
Three-dimensional electrical capacitance tomography (3D-ECT) has shown promise for visualizing industrial multiphase flows.
no code implementations • 27 Nov 2020 • Cheng Yang, Shengnan Wang, Chao Yang, Yuechuan Li, Ru He, Jingqiao Zhang
In BERT training, the backward computation is much more time-consuming than the forward computation, especially in distributed training, where the backward computation time also includes the communication time for gradient synchronization.
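As a hedged sketch of why the distributed backward phase includes communication (generic data-parallel gradient all-reduce, not this paper's specific scheme; the helper name is hypothetical), each worker must synchronize its gradients before the optimizer step:

```python
import torch
import torch.distributed as dist

def backward_with_grad_sync(model, loss):
    """Backward pass followed by gradient synchronization across workers.

    In data-parallel training, the all-reduce below adds communication time
    on top of the local backward computation; frameworks such as
    DistributedDataParallel overlap it with backward, but it still
    contributes to the total backward-phase wall-clock time.
    """
    loss.backward()                                        # local gradient computation
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)  # communication across workers
            p.grad.div_(world_size)                        # average the summed gradients
```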
no code implementations • 27 Nov 2020 • Cheng Yang, Shengnan Wang, Yuechuan Li, Chao Yang, Ming Yan, Jingqiao Zhang, Fangquan Lin
In the second phase, we transform the trained relaxed BERT model back into the original BERT architecture and retrain it further.
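The abstract does not spell out how the transformation is performed; a minimal sketch of such a two-phase scheme might look like the following, where `train_step`, the one-to-one parameter mapping, and the matching-shapes assumption are all hypothetical and for illustration only:

```python
import torch

def second_phase_retrain(relaxed_model, bert_model, train_step, steps):
    """Hypothetical sketch of the second phase: transfer weights from the
    trained relaxed model into a standard BERT and continue training.

    Assumes the relaxed model's parameters map one-to-one onto the original
    BERT's parameter names and shapes (an assumption for illustration only).
    """
    # Transform: load the relaxed model's weights into the original BERT.
    bert_model.load_state_dict(relaxed_model.state_dict(), strict=False)

    # Retrain: continue optimizing the original BERT architecture.
    optimizer = torch.optim.AdamW(bert_model.parameters(), lr=1e-5)
    for _ in range(steps):
        loss = train_step(bert_model)   # user-supplied forward pass + loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return bert_model
```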