1 code implementation • 10 May 2022 • Mingrui Liu, Zhenxun Zhuang, Yunwei Lei, Chunyang Liao
Gradient clipping is usually employed to address the exploding gradient issue in the single-machine setting, but its study in the distributed setting is still in its infancy: it remains unclear whether a gradient clipping scheme can take advantage of multiple machines to achieve parallel speedup.
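For readers unfamiliar with the single-machine technique mentioned above, the following is a minimal sketch of norm-based gradient clipping applied to one SGD step. It is not the distributed algorithm studied in the paper; the function names `clip_by_norm` and `clipped_sgd_step`, and the parameters `lr` and `max_norm`, are hypothetical and chosen only for illustration.

```python
import math


def clip_by_norm(grad, max_norm):
    """Return a copy of grad rescaled so its Euclidean norm is at most max_norm.

    Illustrative helper (hypothetical name): gradients whose norm is already
    within the threshold are returned unchanged.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


def clipped_sgd_step(params, grad, lr, max_norm):
    """One SGD update that uses the clipped gradient (single-machine sketch)."""
    clipped = clip_by_norm(grad, max_norm)
    return [p - lr * g for p, g in zip(params, clipped)]


if __name__ == "__main__":
    # A gradient of norm 5 is scaled down to norm 1 before the update.
    print(clipped_sgd_step([1.0, 1.0], [3.0, 4.0], lr=0.5, max_norm=1.0))
```

Clipping bounds the size of each update without changing the gradient's direction, which is why it is a common remedy when gradients occasionally become very large.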