no code implementations • 19 Jan 2015 • Ru He, Jin Tian, Huaiqing Wu
We study the Bayesian model averaging approach to learning Bayesian network structures (DAGs) from data.
no code implementations • 27 Nov 2020 • Cheng Yang, Shengnan Wang, Chao Yang, Yuechuan Li, Ru He, Jingqiao Zhang
In BERT training, the backward computation is much more time-consuming than the forward computation, especially in the distributed training setting in which the backward computation time further includes the communication time for gradient synchronization.
1 code implementation • 14 Jun 2021 • Yifei Xu, Jingqiao Zhang, Ru He, Liangzhu Ge, Chao Yang, Cheng Yang, Ying Nian Wu
In this paper, we propose a self-augmentation strategy (SAS) where a single network is utilized for both regular pre-training and contextualized data augmentation for the training in later epochs.
no code implementations • 2 Jul 2022 • Chao Yang, Ru He, Fangquan Lin, Suoyuan Song, Jingqiao Zhang, Cheng Yang
Our goal is to build general representation (embedding) for each user and each product item across Alibaba's businesses, including Taobao and Tmall which are among the world's biggest e-commerce websites.