no code implementations • 19 Feb 2021 • Shahar Azulay, Edward Moroshko, Mor Shpigel Nacson, Blake Woodworth, Nathan Srebro, Amir Globerson, Daniel Soudry
Recent work has highlighted the role of initialization scale in determining the structure of the solutions that gradient methods converge to.
no code implementations • 11 Aug 2020 • Shahar Azulay, Lior Raz, Amir Globerson, Tomer Koren, Yehuda Afek
HoldOut SGD first randomly selects a set of workers that use their private data in order to propose gradient updates.