23 Oct 2023 • Ross M. Clarke, Baiyu Su, José Miguel Hernández-Lobato
Research into optimisation for deep learning is characterised by a tension between the computational efficiency of first-order, gradient-based methods (such as SGD and Adam) and the theoretical efficiency of second-order, curvature-based methods (such as quasi-Newton methods and K-FAC).
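To make the contrast concrete, the sketch below (not the paper's method; the quadratic loss, matrix `A`, vector `b`, and learning rate `lr` are illustrative assumptions) compares a plain gradient step, as used by first-order methods like SGD, with a curvature-preconditioned step in the spirit of second-order methods.

```python
import numpy as np

# Toy quadratic loss f(w) = 0.5 * w^T A w - b^T w (A is the Hessian, assumed SPD).
# These values are illustrative only, not from the paper.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])

def grad(w):
    return A @ w - b

w_first = np.zeros(2)
w_second = np.zeros(2)
lr = 0.1

for _ in range(100):
    # First-order update: step along the negative gradient, scaled by a learning rate.
    w_first -= lr * grad(w_first)
    # Second-order update: precondition the gradient with the inverse curvature.
    w_second -= np.linalg.solve(A, grad(w_second))

print("first-order iterate :", w_first)
print("second-order iterate:", w_second)        # exact on a quadratic after one step
print("exact minimiser     :", np.linalg.solve(A, b))
```

On this toy problem the curvature-preconditioned step reaches the minimiser immediately, while the gradient step needs many cheap iterations; the tension the paper describes is that, in deep learning, forming or inverting the curvature (as quasi-Newton methods or K-FAC approximate) is far more expensive per step than the gradient-only updates of SGD or Adam.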