no code implementations • 11 Jun 2021 • Jialin Dong, Da Zheng, Lin F. Yang, George Karypis
This global cache enables in-GPU importance sampling of mini-batches, which drastically reduces the number of nodes in a mini-batch, especially in the input layer. This cuts both the CPU-to-GPU data copy and the mini-batch computation, without compromising the training convergence rate or model accuracy.
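The idea can be sketched as follows. This is a minimal, hypothetical illustration (not the paper's implementation): a set of high-degree nodes is kept in a simulated "GPU cache", neighbors are importance-sampled with degree-proportional probabilities, and only cache misses require a (simulated) CPU-to-GPU feature copy. All names, sizes, and the degree-based sampling weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

num_nodes, feat_dim = 10_000, 16
# Node features living in CPU (host) memory.
cpu_features = rng.standard_normal((num_nodes, feat_dim)).astype(np.float32)

# Synthetic degrees; real systems would use the graph's actual degrees.
degrees = rng.poisson(5, num_nodes) + 1

# Hypothetical global cache: keep the highest-degree nodes "on the GPU"
# (simulated here as an ordinary Python dict in host memory).
cache_size = 1_000
cached_ids = np.argsort(degrees)[-cache_size:]
gpu_cache = {int(i): cpu_features[i] for i in cached_ids}


def sample_minibatch(seed_ids, fanout=10):
    """Importance-sample neighbors proportional to degree, then gather
    features: cache hits avoid a CPU->GPU copy, misses are copied."""
    probs = degrees / degrees.sum()
    sampled = rng.choice(num_nodes, size=len(seed_ids) * fanout,
                         replace=True, p=probs)
    unique_ids = np.unique(np.concatenate([seed_ids, sampled]))
    hits = [i for i in unique_ids if int(i) in gpu_cache]
    misses = [i for i in unique_ids if int(i) not in gpu_cache]
    feats = np.vstack(
        [gpu_cache[int(i)] for i in hits]      # served from the GPU cache
        + [cpu_features[i] for i in misses]    # simulated CPU->GPU copy
    )
    return feats, len(hits), len(misses)


seeds = rng.choice(num_nodes, size=64, replace=False)
feats, n_hits, n_misses = sample_minibatch(seeds)
print(feats.shape, n_hits, n_misses)
```

Because sampling is biased toward high-degree nodes and exactly those nodes are cached, a large fraction of feature lookups are served from the cache, which is the mechanism by which the data copy between CPU and GPU shrinks.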