25 Sep 2019 • Abraham J. Fetterman, Christina H. Kim, Joshua Albrecht
Abstract: Stochastic gradient descent (SGD) and Adam are commonly used to optimize deep neural networks, but choosing one usually means making tradeoffs between speed, accuracy, and stability.
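As a minimal sketch of the choice the abstract refers to (not code from the paper), a typical PyTorch training setup forces the practitioner to pick one optimizer up front; the toy model, learning rates, and the `use_adam` flag below are illustrative assumptions.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # toy model standing in for a deep network

use_adam = True  # hypothetical switch between the two optimizers
if use_adam:
    # Adam: adaptive per-parameter step sizes, often faster early progress
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
else:
    # SGD with momentum: often better final accuracy, but typically needs more tuning
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-1, momentum=0.9)

# one illustrative training step on random data
x, y = torch.randn(8, 10), torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The point of the sketch is only that the two optimizers are interchangeable at the API level yet behave differently in practice, which is the tradeoff the abstract describes.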