no code implementations • 27 Nov 2020 • Cheng Yang, Shengnan Wang, Chao Yang, Yuechuan Li, Ru He, Jingqiao Zhang
In BERT training, the backward computation is much more time-consuming than the forward computation, especially in the distributed setting, where the backward pass additionally includes the communication time needed for gradient synchronization.
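The cost asymmetry described above can be illustrated with a rough back-of-the-envelope model. This is a hypothetical sketch, not the paper's method: it assumes backward compute costs about twice the forward FLOPs (activation and weight gradients) and that a ring all-reduce of the gradients is not overlapped with computation; all names and constants are illustrative.

```python
def step_times(forward_s, grad_bytes, bandwidth_bps, workers):
    """Return (forward_time, backward_time) for one data-parallel step.

    Assumptions (hypothetical): backward compute ~= 2x forward compute;
    gradient sync uses a ring all-reduce with no compute/comm overlap.
    """
    # Backward compute is roughly 2x forward (dgrad + wgrad passes).
    backward_compute_s = 2.0 * forward_s
    # Ring all-reduce sends ~2*(n-1)/n of the gradient bytes per worker.
    comm_s = 2.0 * (workers - 1) / workers * grad_bytes / bandwidth_bps
    return forward_s, backward_compute_s + comm_s
```

With plausible numbers (BERT-large has ~340M parameters, i.e. ~1.36 GB of fp32 gradients) the backward-plus-sync time dominates the step, matching the observation in the abstract.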
no code implementations • 27 Nov 2020 • Cheng Yang, Shengnan Wang, Yuechuan Li, Chao Yang, Ming Yan, Jingqiao Zhang, Fangquan Lin
In the second phase, we transform the trained relaxed BERT model back into the original BERT architecture and retrain it further.
no code implementations • 24 Jan 2018 • Kui Zhao, Yuechuan Li, Chi Zhang, Cheng Yang, Huan Xu
By leveraging the mixture layer, the proposed method can adaptively update states according to the similarities between encoded inputs and prototype vectors, giving it a stronger capacity to assimilate sequences with multiple patterns.
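The mixture-layer idea described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `mixture_update`, the dot-product similarity, and the per-prototype candidate update functions are all assumptions made for the example.

```python
import math

def mixture_update(state, x, prototypes, candidate_fns):
    """Blend per-prototype candidate state updates, weighted by how
    similar the encoded input x is to each prototype vector."""
    # Similarity of the encoded input to each prototype (dot product).
    sims = [sum(xi * pi for xi, pi in zip(x, p)) for p in prototypes]
    # Softmax-normalize the similarities into mixture weights.
    m = max(sims)
    exps = [math.exp(s - m) for s in sims]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Each prototype owns a candidate update; blend them by weight.
    candidates = [f(state, x) for f in candidate_fns]
    return [sum(w * c[d] for w, c in zip(weights, candidates))
            for d in range(len(state))]
```

Inputs resembling a given prototype thus pull the state toward that prototype's update rule, which is one way a single recurrent cell can handle sequences drawn from several distinct patterns.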
no code implementations • 22 Dec 2017 • Kui Zhao, Yuechuan Li, Zhaoqian Shuai, Cheng Yang
Many machine intelligence techniques are developed for E-commerce, and one of their most essential components is the representation of IDs, including the user ID, item ID, product ID, store ID, brand ID, category ID, etc.
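The notion of "representing IDs" typically means mapping each discrete identifier to a dense vector. The sketch below is a generic, hypothetical illustration of such an embedding table (the class name, dimensions, and lazy-initialization scheme are assumptions, not the paper's design):

```python
import random

class IdEmbedding:
    """Map arbitrary E-commerce IDs (user, item, store, brand, ...)
    to dense vectors -- the kind of representation the abstract
    refers to. Purely illustrative, not the paper's method."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.table = {}
        self.rng = random.Random(seed)

    def lookup(self, id_):
        # Lazily initialize a small random vector for each unseen ID;
        # in a real system these vectors would then be trained.
        if id_ not in self.table:
            self.table[id_] = [self.rng.uniform(-0.05, 0.05)
                               for _ in range(self.dim)]
        return self.table[id_]
```

Once learned jointly with a downstream task, nearby vectors come to encode relatedness between IDs (e.g. items often bought together), which is what makes the representation useful across many E-commerce models.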