no code implementations • 14 Nov 2018 • Chien-Chih Wang, Kent Loong Tan, Chih-Jen Lin
Deep learning involves a difficult non-convex optimization problem, which is often solved by stochastic gradient (SG) methods.
no code implementations • 1 Feb 2018 • Chien-Chih Wang, Kent Loong Tan, Chun-Ting Chen, Yu-Hsiang Lin, S. Sathiya Keerthi, Dhruv Mahajan, S. Sundararajan, Chih-Jen Lin
First, to reduce the communication cost, we propose a diagonalization method such that an approximate Newton direction can be obtained without communication between machines.