24 May 2023 • Linxuan Pan, Shenghui Song
However, existing analyses have failed to explain why multiple local updates with small mini-batches of data (L-SGD) cannot be replaced by a single update with one large batch of data and a larger learning rate (SGD).
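To make the two update rules concrete, the following is a minimal sketch on a toy least-squares problem. It is not the authors' setup: the loss, the data, the function names (`local_sgd_steps`, `big_batch_step`), and the choice to scale the learning rate linearly with the number of local steps are all assumptions made for illustration.

```python
import numpy as np

def grad(w, X, y):
    """Gradient of the loss 0.5 * mean((X @ w - y)**2) with respect to w."""
    return X.T @ (X @ w - y) / len(y)

def local_sgd_steps(w, X, y, lr, num_steps, batch_size, rng):
    """L-SGD style: num_steps sequential updates, each on a fresh small mini-batch."""
    w = w.copy()
    for _ in range(num_steps):
        idx = rng.choice(len(y), size=batch_size, replace=False)
        # Each gradient is evaluated at the already-updated iterate.
        w -= lr * grad(w, X[idx], y[idx])
    return w

def big_batch_step(w, X, y, lr, num_steps, batch_size, rng):
    """Big-batch SGD: one update on num_steps * batch_size samples.

    The learning rate is scaled by num_steps (an assumed linear scaling rule).
    """
    idx = rng.choice(len(y), size=num_steps * batch_size, replace=False)
    # The gradient is evaluated once, at the starting point w.
    return w - (num_steps * lr) * grad(w, X[idx], y[idx])

# Synthetic regression data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(512, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=512)
w0 = np.zeros(5)

w_local = local_sgd_steps(w0, X, y, lr=0.05, num_steps=8, batch_size=16, rng=rng)
w_big = big_batch_step(w0, X, y, lr=0.05, num_steps=8, batch_size=16, rng=rng)
```

Both calls consume the same number of samples (8 × 16 = 128), but the local variant re-evaluates the gradient at each intermediate iterate, while the big-batch variant takes a single step from the starting point, so the two generally end at different parameters.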