Second-order methods

50 papers with code • 0 benchmarks • 0 datasets

Methods that exploit second-order information about the objective, such as the Hessian or approximations to it, rather than relying on gradients alone.

Most implemented papers

Second-Order Stochastic Optimization for Machine Learning in Linear Time

brianbullins/lissa_code 12 Feb 2016

First-order stochastic methods are the state-of-the-art in large-scale machine learning optimization owing to their efficient per-iteration complexity.
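
The paper's linear-time algorithm (LiSSA) rests on estimating inverse-Hessian-vector products with a truncated Neumann series. Below is a minimal NumPy sketch of that recursion, assuming a user-supplied `hvp` callable and a positive-definite Hessian whose scaled spectral norm is at most one; the function name and defaults are illustrative, not taken from the repository.

```python
import numpy as np

def lissa_ihvp(v, hvp, depth=100, repeats=10, scale=1.0):
    """Estimate an inverse-Hessian-vector product H^{-1} v via the truncated
    Neumann series used by LiSSA (a sketch, not the repository's code).
    hvp(u) should return a (possibly stochastic) Hessian-vector product H u,
    where scale * H is assumed positive definite with spectral norm <= 1."""
    estimates = []
    for _ in range(repeats):                 # average independent recursions
        x = v.copy()                         # x_0 = v
        for _ in range(depth):               # x_j = v + (I - scale * H) x_{j-1}
            x = v + x - scale * hvp(x)
        estimates.append(scale * x)          # undo the scaling of H
    return np.mean(estimates, axis=0)

# toy check on a fixed quadratic: the exact answer is A^{-1} v
A = np.diag([0.9, 0.5, 0.2])
v = np.ones(3)
print(lissa_ihvp(v, lambda u: A @ u))        # approx. [1.11, 2.0, 5.0]
```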

ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning

amirgholami/adahessian 1 Jun 2020

We introduce ADAHESSIAN, a second-order stochastic optimization algorithm which dynamically incorporates the curvature of the loss function via ADAptive estimates of the HESSIAN.
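
The curvature estimates come from Hutchinson-style randomized probes of the Hessian diagonal (the full optimizer adds spatial averaging and momentum on top). A minimal PyTorch sketch of that diagonal estimator, not the authors' optimizer:

```python
import torch

def hutchinson_hessian_diag(loss, params, n_samples=1):
    """Estimate diag(H) with Hutchinson's randomized method: E[z * (H z)] = diag(H)
    for Rademacher vectors z. A sketch of the estimator ADAHESSIAN builds on,
    not the authors' optimizer."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    diag = [torch.zeros_like(p) for p in params]
    for _ in range(n_samples):
        zs = [torch.randint_like(p, high=2) * 2.0 - 1.0 for p in params]  # +/-1 probes
        hzs = torch.autograd.grad(grads, params, grad_outputs=zs, retain_graph=True)
        for d, z, hz in zip(diag, zs, hzs):
            d.add_(z * hz / n_samples)       # running average of z * (H z)
    return diag
```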

Newtonian Monte Carlo: single-site MCMC meets second-order gradient methods

Johanpdrsn/Newtonian-Monte-Carlo 15 Jan 2020

NMC is similar to the Newton-Raphson update in optimization, where the second-order gradient is used to automatically scale the step size in each dimension.
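
For reference, the Newton-Raphson update alluded to here rescales the gradient by the inverse Hessian, so each coordinate's step size reflects its local curvature. A generic, damped sketch (not NMC's proposal machinery):

```python
import numpy as np

def newton_step(x, grad, hess, damping=1e-6):
    """One damped Newton-Raphson update: the inverse Hessian rescales the
    gradient so each direction's step size reflects its curvature. Generic
    sketch; NMC adapts this idea to per-site MCMC proposals."""
    H = hess(x) + damping * np.eye(x.shape[0])
    return x - np.linalg.solve(H, grad(x))

# on a quadratic 0.5 x'Ax - b'x, a single step from anywhere reaches A^{-1} b
A, b = np.diag([10.0, 0.1]), np.array([1.0, 1.0])
x_min = newton_step(np.zeros(2), lambda x: A @ x - b, lambda x: A)
```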

Low Rank Saddle Free Newton: A Scalable Method for Stochastic Nonconvex Optimization

tomoleary/hessianlearn 7 Feb 2020

In this work we motivate the extension of Newton methods to the stochastic approximation (SA) regime, and argue for the use of the scalable low rank saddle free Newton (LRSFN) method, which avoids forming the Hessian in favor of making a low rank approximation.
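
A matrix-free way to realize this idea is to extract a few dominant Hessian eigenpairs from Hessian-vector products, replace the eigenvalues with their absolute values, and apply the resulting low-rank-plus-damping inverse. The SciPy sketch below follows that recipe under stated assumptions (an `hvp` callable, Lanczos via `eigsh`); it is not the `hessianlearn` implementation.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

def lrsfn_step(grad, hvp, dim, rank=20, damping=1e-3):
    """Sketch of a low-rank saddle-free Newton step: (1) estimate dominant
    Hessian eigenpairs matrix-free from an hvp callable, (2) replace the
    eigenvalues by their absolute values, (3) invert the low-rank + damping
    model via its eigendecomposition. Not the hessianlearn implementation."""
    H = LinearOperator((dim, dim), matvec=hvp, dtype=np.float64)
    lam, V = eigsh(H, k=rank, which='LM')            # largest-magnitude eigenpairs
    coeffs = V.T @ grad
    step = V @ (coeffs / (np.abs(lam) + damping))    # captured subspace: 1/(|lam| + gamma)
    step += (grad - V @ coeffs) / damping            # orthogonal remainder: 1/gamma
    return -step
```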

On the Promise of the Stochastic Generalized Gauss-Newton Method for Training DNNs

gmatilde/SGN 3 Jun 2020

This enables researchers to further study and improve this promising optimization technique, and to reconsider stochastic second-order methods as competitive optimizers for training DNNs; we also hope that the promise of SGN leads to forward automatic differentiation being added to TensorFlow or PyTorch.
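
The mention of forward automatic differentiation is not incidental: generalized Gauss-Newton products $Gv = J^\top H_L J v$ pair one forward-mode pass ($Jv$) with one reverse-mode pass ($J^\top u$). A sketch using `torch.func` for the squared-error case ($H_L = I$), where `v` is a dict of tensors shaped like the model's parameters; illustrative only, not the SGN repository's code:

```python
import torch
from torch.func import functional_call, jvp, vjp

def ggn_vector_product(model, x, v):
    """Gauss-Newton vector product G v = J^T J v for a squared-error loss,
    pairing forward-mode AD (J v) with reverse-mode AD (J^T u) via torch.func.
    `v` is a dict of tensors shaped like model.named_parameters().
    Illustrative sketch only, not the SGN repository's code."""
    params = dict(model.named_parameters())
    f = lambda p: functional_call(model, p, (x,))    # outputs as a function of params
    _, Jv = jvp(f, (params,), (v,))                  # forward mode: J v
    _, pullback = vjp(f, params)                     # reverse mode: u -> J^T u
    (Gv,) = pullback(Jv)                             # J^T (J v), a dict like params
    return Gv
```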

M-FAC: Efficient Matrix-Free Approximations of Second-Order Information

IST-DASLab/M-FAC NeurIPS 2021

We propose two new algorithms as part of a framework called M-FAC: the first algorithm is tailored towards network compression and, if the Hessian is given as a sum of $m$ rank-one matrices, can compute inverse-Hessian-vector products (IHVPs) for dimension $d$ using $O(dm^2)$ precomputation, $O(dm)$ cost per IHVP, and query cost $O(m)$ for any single element of the inverse Hessian.
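
When the Hessian is an identity-damped sum of rank-one gradient outer products (an empirical Fisher), such IHVPs can be obtained with a recursive Sherman-Morrison update whose costs match the figures quoted above. A NumPy sketch of that idea, with a hypothetical class name and without the blocked, GPU-oriented optimizations of the actual M-FAC code:

```python
import numpy as np

class ShermanMorrisonIHVP:
    """Inverse-Hessian-vector products for H = lam*I + (1/m) * sum_i g_i g_i^T,
    built from the recursive Sherman-Morrison formula. A sketch that matches the
    O(d m^2) precompute / O(d m) query costs quoted above, not the authors'
    optimized M-FAC implementation."""

    def __init__(self, grads, lam=1e-3):
        self.G = np.asarray(grads, dtype=float)          # m x d stack of gradients
        self.m = self.G.shape[0]
        self.lam = lam
        self.Bg = np.empty_like(self.G)                  # row k holds b_k = B_{k-1} g_k
        self.denom = np.empty(self.m)                    # m + g_k^T b_k
        for k in range(self.m):                          # O(d m^2) precomputation
            b = self._apply(self.G[k], upto=k)
            self.Bg[k] = b
            self.denom[k] = self.m + self.G[k] @ b

    def _apply(self, x, upto):
        y = x / self.lam                                 # B_0 = (1/lam) I
        for k in range(upto):                            # B_k = B_{k-1} - b_k b_k^T / denom_k
            y = y - self.Bg[k] * (self.Bg[k] @ x) / self.denom[k]
        return y

    def ihvp(self, x):
        """Return H^{-1} x in O(d m) time."""
        return self._apply(x, upto=self.m)
```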

Near out-of-distribution detection for low-resolution radar micro-Doppler signatures

blupblupblup/doppler-signatures-generation 12 May 2022

We emphasize the relevance of OODD and its specific supervision requirements for the detection of a multimodal, diverse target class among other similar radar targets and clutter in real-life critical systems.

Structured Inverse-Free Natural Gradient: Memory-Efficient & Numerically-Stable KFAC

f-dangel/singd 9 Dec 2023

Second-order methods such as KFAC can be useful for neural net training.
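
As background, KFAC approximates each layer's Fisher block by a Kronecker product of an input-activation covariance $A$ and an output-gradient covariance $G$, so the inverse applies factor-wise. Below is a sketch of that baseline preconditioning for one linear layer, with explicit damped inverses; the paper's contribution is precisely to replace such inverses with structured, inverse-free updates.

```python
import numpy as np

def kfac_precondition(grad_W, acts, out_grads, damping=1e-2):
    """Sketch of a KFAC-style natural-gradient step for one linear layer.
    grad_W:    (out, in) gradient of the loss w.r.t. the weight matrix.
    acts:      (batch, in) layer inputs a.
    out_grads: (batch, out) backpropagated pre-activation gradients g.
    The Fisher block is approximated by the Kronecker product A (x) G, so its
    inverse applies factor-wise. Illustrative only; SINGD avoids these
    explicit inverses."""
    n = acts.shape[0]
    A = acts.T @ acts / n                         # input covariance, (in, in)
    G = out_grads.T @ out_grads / n               # output-gradient covariance, (out, out)
    A_damped = A + damping * np.eye(A.shape[0])
    G_damped = G + damping * np.eye(G.shape[0])
    # (A (x) G)^{-1} vec(grad_W)  ==  vec(G^{-1} grad_W A^{-1})
    return np.linalg.solve(G_damped, grad_W) @ np.linalg.inv(A_damped)
```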

Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective

f-dangel/sirfshampoo 5 Feb 2024

Adaptive gradient optimizers like Adam(W) are the default training algorithms for many deep learning architectures, such as transformers.
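
The question in the title can be made concrete with a toy diagonal preconditioner: with the square root the update is the familiar Adam-style scaling, and without it the second-moment estimate acts like a diagonal curvature (empirical Fisher) preconditioner, i.e. a second-order view. A hypothetical sketch, not the `sirfshampoo` optimizer:

```python
import numpy as np

def adaptive_update(param, grad, v, lr=1e-3, beta2=0.999, eps=1e-8, use_sqrt=True):
    """Toy diagonal preconditioner contrasting the two choices in the title:
    with the square root this is the usual Adam-style scaling; without it, the
    second-moment estimate v acts like a diagonal curvature (empirical Fisher)
    preconditioner. A sketch, not the sirfshampoo optimizer."""
    v = beta2 * v + (1.0 - beta2) * grad**2               # second-moment EMA
    denom = (np.sqrt(v) + eps) if use_sqrt else (v + eps)
    return param - lr * grad / denom, v
```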

Optimization Methods for Supervised Machine Learning: From Linear Models to Deep Learning

GCaptainNemo/optimization-project 30 Jun 2017

We then discuss some of the distinctive features of these optimization problems, focusing on the examples of logistic regression and the training of deep neural networks.
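
For the logistic-regression example, the second-order structure is fully explicit: the Hessian of the mean log-loss is $X^\top S X / n$ with $S = \mathrm{diag}(p(1-p))$, so Newton's method can be written in a few lines. A small dense-data sketch, assuming labels in $\{0, 1\}$:

```python
import numpy as np

def logistic_newton(X, y, iters=10, damping=1e-6):
    """Newton's method for binary logistic regression, one of the survey's
    running examples. The Hessian has the closed form X^T S X / n with
    S = diag(p * (1 - p)); a small sketch for dense data and labels in {0, 1}."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))             # predicted probabilities
        grad = X.T @ (p - y) / n                     # gradient of mean log-loss
        S = p * (1.0 - p)                            # diagonal of the weight matrix
        H = (X * S[:, None]).T @ X / n + damping * np.eye(d)
        w -= np.linalg.solve(H, grad)                # Newton step
    return w
```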