no code implementations • 19 Feb 2024 • Xi-Lin Li
This paper studies the fitting of the Hessian or its inverse for stochastic optimizations using a Hessian fitting criterion from the preconditioned stochastic gradient descent (PSGD) method, which is intimately related to many commonly used second order and adaptive gradient optimizers, e.g., BFGS, Gauss-Newton and natural gradient descent, AdaGrad, etc.
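The fitting criterion itself is compact enough to sketch. Below is a minimal, diagonal-case illustration assuming the usual PSGD-style setup, where dtheta is a random parameter perturbation and dg is the matching change in gradient; the synthetic Hessian, iteration count, and running-average scheme are illustrative choices, not the paper's algorithm.

```python
import numpy as np

# Minimal sketch (not the paper's algorithm): for a diagonal preconditioner,
# the PSGD-style Hessian fitting criterion
#     E[ dg' P dg + dtheta' P^{-1} dtheta ]
# is minimized at p_i = sqrt(E[dtheta_i^2] / E[dg_i^2]), i.e. p_i ~ 1/|h_i|.

rng = np.random.default_rng(0)
h = np.array([0.1, 1.0, 10.0])          # diagonal of a synthetic Hessian (assumption)

m_th = np.zeros(3)                       # running moment of dtheta
m_dg = np.zeros(3)                       # running moment of dg
beta = 0.99
for _ in range(5000):
    dtheta = rng.standard_normal(3)      # random parameter perturbation
    dg = h * dtheta                      # matching change of gradient (noise could be added)
    m_th = beta * m_th + (1 - beta) * dtheta**2
    m_dg = beta * m_dg + (1 - beta) * dg**2

p = np.sqrt(m_th / m_dg)
print(p)                                 # roughly [10, 1, 0.1], i.e. p_i ~ 1/h_i
```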
1 code implementation • 7 Feb 2024 • Omead Pooladzandi, Xi-Lin Li
We present a novel approach to accelerate stochastic gradient descent (SGD) by utilizing curvature information obtained from Hessian-vector products or finite differences of parameters and gradients, similar to the BFGS algorithm.
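The curvature probe mentioned here is straightforward to illustrate. A minimal sketch on a toy quadratic loss (the quadratic, the probe direction v, and the step eps are assumptions for illustration, not the paper's code):

```python
import numpy as np

# Sketch: a Hessian-vector product approximated by a finite difference of
# gradients at two nearby parameter values,
#     H v ~ (g(theta + eps * v) - g(theta)) / eps.

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
H = A @ A.T + np.eye(n)                  # Hessian of f(x) = 0.5 * x' H x (toy example)

def grad(x):
    return H @ x                         # exact gradient of the toy quadratic

theta = rng.standard_normal(n)
v = rng.standard_normal(n)               # probe direction
eps = 1e-5
hvp = (grad(theta + eps * v) - grad(theta)) / eps
print(np.max(np.abs(hvp - H @ v)))       # ~0 up to floating-point error
```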
1 code implementation • 23 Aug 2020 • Xi-Lin Li
This paper studies density priors for independent vector analysis (IVA), with convolutive speech mixture separation as an exemplary application.
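For background, a classic IVA density prior couples all frequency bins of a source. The sketch below shows the score function of the standard spherical (multivariate Laplace) prior purely as an illustration; it is not claimed to be the prior proposed or favored in this paper.

```python
import numpy as np

# Illustrative only: the spherical Laplace prior ties a source's frequency
# bins together through the score function  phi_f(y) = y_f / ||y||_2,
# which is what lets IVA resolve the per-frequency permutation ambiguity.

rng = np.random.default_rng(0)
n_freq = 8
# STFT coefficients of one source across all frequency bins (synthetic data)
y = rng.standard_normal(n_freq) + 1j * rng.standard_normal(n_freq)
phi = y / np.linalg.norm(y)              # score of the spherical Laplace prior
print(np.round(np.abs(phi), 3))
```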
1 code implementation • 30 Apr 2020 • Xi-Lin Li
We report a triangular neural network implementation of neural autoregressive flow (NAF).
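The general idea, a triangular (autoregressive) monotonic map whose Jacobian log-determinant is cheap to compute, can be sketched in a few lines; the toy block below only illustrates that structure and is not the paper's network.

```python
import numpy as np

# Toy autoregressive monotonic map: y_i depends monotonically on x_i and on
# x_{<i} through a strictly lower-triangular context, so the Jacobian is
# triangular and log|det J| is just the sum of the log diagonal terms.

rng = np.random.default_rng(0)
d = 3
W = np.tril(rng.standard_normal((d, d)), k=-1)   # strictly lower-triangular context weights
b = rng.standard_normal(d)
s = np.exp(rng.standard_normal(d))               # positive scales keep the map monotonic in x_i

def flow(x):
    shift = W @ x + b                            # row i only sees x_{<i}
    y = np.tanh(s * x + shift)                   # elementwise monotonic transform
    diag = s * (1.0 - y**2)                      # dy_i/dx_i, since tanh' = 1 - tanh^2
    return y, np.sum(np.log(diag))               # transformed sample and log|det J|

y, logdet = flow(rng.standard_normal(d))
print(y, logdet)
```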
1 code implementation • 29 Nov 2018 • Xi-Lin Li
We study a multiclass multiple instance learning (MIL) problem where the labels only indicate whether any instance of a class is present in a training sample or example.
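A minimal sketch of this bag-level supervision is given below; max pooling over instance scores is a common MIL baseline used here only for illustration, not necessarily the pooling or model proposed in the paper.

```python
import numpy as np

# One training example is a bag of instances; the per-class label only says
# whether at least one instance of that class is present in the bag.

rng = np.random.default_rng(0)
n_inst, n_feat, n_cls = 6, 4, 3
W = rng.standard_normal((n_feat, n_cls))          # a toy linear instance scorer

bag = rng.standard_normal((n_inst, n_feat))       # instances in one bag (synthetic)
label = np.array([1.0, 0.0, 1.0])                 # class-presence labels for the bag

inst_prob = 1.0 / (1.0 + np.exp(-bag @ W))        # instance-level class probabilities
bag_prob = inst_prob.max(axis=0)                  # max pooling: "is any instance present?"
bce = -(label * np.log(bag_prob) +
        (1.0 - label) * np.log(1.0 - bag_prob)).mean()
print(bag_prob, bce)                              # bag-level prediction and loss
```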
2 code implementations • ICLR 2019 • Xi-Lin Li
We study two types of preconditioners and preconditioned stochastic gradient descent (SGD) methods in a unified framework.
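Whatever form the preconditioner takes, both families plug into the same update, theta <- theta - lr * P * g with P = Q'Q kept positive definite by construction. A minimal sketch on a toy quadratic (the pre-fitted factor Q is an assumption here; learning Q online is the actual subject of the paper):

```python
import numpy as np

# Generic preconditioned SGD step: theta <- theta - lr * (Q' Q) @ g,
# shown with a factor Q that already matches the toy problem's curvature.

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
H = A @ A.T + np.eye(n)                          # curvature of a toy quadratic loss
theta = rng.standard_normal(n)

Q = np.linalg.cholesky(np.linalg.inv(H)).T       # pretend fitted factor: Q'Q = inv(H)
lr = 0.5
for _ in range(20):
    g = H @ theta + 0.01 * rng.standard_normal(n)   # noisy stochastic gradient
    theta -= lr * (Q.T @ (Q @ g))                   # preconditioned step
print(np.linalg.norm(theta))                     # close to the optimum at 0
```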
1 code implementation • 26 Mar 2018 • Xi-Lin Li
This paper proposes a family of online second order methods for possibly non-convex stochastic optimizations based on the theory of preconditioned stochastic gradient descent (PSGD), which can be regarded as an enhanced stochastic Newton method with the ability to handle gradient noise and non-convexity simultaneously.
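In the scalar case, a short derivation makes both claims concrete. The sketch below assumes dg = h*dtheta + eps with eps modeling gradient noise; it is an illustration of the argument, not a quote from the paper.

```latex
% Scalar sketch of the PSGD fitting criterion (illustrative assumption: \delta g = h\,\delta\theta + \varepsilon)
\[
  c(p) = \mathrm{E}\!\left[p\,\delta g^{2} + p^{-1}\delta\theta^{2}\right],
  \qquad
  p^{\star} = \sqrt{\frac{\mathrm{E}[\delta\theta^{2}]}{\mathrm{E}[\delta g^{2}]}} .
\]
% With independent, zero-mean noise \varepsilon:
\[
  \mathrm{E}[\delta g^{2}] = h^{2}\,\mathrm{E}[\delta\theta^{2}] + \mathrm{E}[\varepsilon^{2}]
  \;\Longrightarrow\;
  p^{\star} = \frac{1}{\sqrt{h^{2} + \mathrm{E}[\varepsilon^{2}]/\mathrm{E}[\delta\theta^{2}]}}
  \le \frac{1}{|h|} .
\]
```

The fitted step thus scales like 1/|h| (meaningful even where h < 0, i.e. non-convex regions) and shrinks further as gradient noise grows, which is the simultaneous handling of noise and non-convexity referred to above.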
no code implementations • 14 Jun 2016 • Xi-Lin Li
This paper studies the performance of a recently proposed preconditioned stochastic gradient descent (PSGD) algorithm on recurrent neural network (RNN) training.
2 code implementations • 14 Dec 2015 • Xi-Lin Li
When stochastic gradients are used, the preconditioner naturally damps the gradient noise to stabilize SGD.