no code implementations • 25 Sep 2019 • Prabhu Teja S*, Florian Mai*, Thijs Vogels, Martin Jaggi, Francois Fleuret
There is no consensus yet on the question whether adaptive gradient methods like Adam are easier to use than non-adaptive optimization methods like SGD.