Natural Gradient Descent

Natural Gradient Descent (NGD) is an approximate second-order optimization method. It can be interpreted as optimizing over a Riemannian manifold using an intrinsic distance metric, which implies that the updates are invariant to transformations such as whitening. Because it approximates the (possibly negative definite) Hessian with the positive semi-definite (PSD) Gauss-Newton matrix, NGD can often work better than exact second-order methods.
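The invariance claim can be checked numerically. The sketch below is only an illustration under stated assumptions: `F` is a random symmetric positive definite stand-in for the curvature matrix, `g` is a random stand-in for the gradient, and `A` is a hypothetical invertible linear change of coordinates $z' = Az$ (whitening is one such map). None of these names come from the source. Under such a map the gradient becomes $A^{-T}g$ and the curvature matrix becomes $A^{-T}FA^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Hypothetical quantities in the original coordinates z.
B = rng.normal(size=(n, n))
F = B @ B.T + np.eye(n)   # symmetric positive definite stand-in for the curvature matrix
g = rng.normal(size=n)    # stand-in for the gradient

# Invertible linear change of coordinates z' = A z (assumed for illustration).
C = rng.normal(size=(n, n))
A = C @ C.T + np.eye(n)   # positive definite, hence invertible
A_inv = np.linalg.inv(A)

# How the gradient and the curvature matrix transform under z' = A z.
g_new = A_inv.T @ g
F_new = A_inv.T @ F @ A_inv

# Preconditioned steps computed independently in each coordinate system.
step = np.linalg.solve(F, g)
step_new = np.linalg.solve(F_new, g_new)

# The step in z' coordinates equals the z step pushed through the same map A.
invariant = np.allclose(step_new, A @ step)

# A plain gradient step does not transform this way in general.
plain_matches = np.allclose(g_new, A @ g)
```

Here `invariant` compares the two preconditioned steps, while `plain_matches` runs the same comparison for the unpreconditioned gradient, which is what makes plain gradient descent sensitive to the choice of coordinates.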

Given the gradient of $f$ with respect to $z$, $g = \frac{\partial{f}\left(z\right)}{\partial{z}}$, NGD computes the update as:

$$\Delta{z} = \alpha{F}^{-1}g$$

where the Fisher information matrix $F$ is defined as:

$$ F = \mathbb{E}_{p\left(t\mid{z}\right)}\left[\nabla\ln{p}\left(t\mid{z}\right)\nabla\ln{p}\left(t\mid{z}\right)^{T}\right] $$

The log-likelihood function $\ln{p}\left(t\mid{z}\right)$ typically corresponds to commonly used error functions such as the cross entropy loss.
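As a minimal sketch of the update and the Fisher matrix defined above, the example below assumes a softmax model $p\left(t\mid{z}\right) = \mathrm{softmax}\left(Wz\right)_t$, for which the expectation over $t$ is a finite sum. The weight matrix `W`, the helper names `fisher_softmax` and `ngd_step`, and the `damping` term are assumptions made for illustration and are not taken from the source. Damping is added because $F$ is only positive semi-definite and can be singular.

```python
import numpy as np

def softmax(logits):
    # Subtract the maximum for numerical stability.
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

def fisher_softmax(W, z):
    """Fisher matrix F = E_{t ~ p(t|z)}[s_t s_t^T] for p(t|z) = softmax(W z).

    The score is s_t = grad_z ln p(t|z) = W^T (e_t - p), with e_t one-hot.
    """
    p = softmax(W @ z)
    num_classes = p.shape[0]
    F = np.zeros((z.shape[0], z.shape[0]))
    for t in range(num_classes):
        e_t = np.zeros(num_classes)
        e_t[t] = 1.0
        score = W.T @ (e_t - p)
        F += p[t] * np.outer(score, score)  # weight each outcome by p(t|z)
    return F

def ngd_step(F, g, alpha=0.1, damping=1e-3):
    """Return alpha * (F + damping * I)^{-1} g without forming the inverse."""
    return alpha * np.linalg.solve(F + damping * np.eye(F.shape[0]), g)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))   # hypothetical weights: 3 classes, 5-dimensional z
z = rng.normal(size=5)
g = rng.normal(size=5)        # stand-in for the gradient of f with respect to z

F = fisher_softmax(W, z)
delta_z = ngd_step(F, g)
```

With 3 classes and a 5-dimensional $z$, this $F$ has rank at most 2, so the undamped system would not be solvable. Using `np.linalg.solve` rather than an explicit inverse is a common choice for numerical stability.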

Source: LOGAN

Image: Fast Convergence of Natural Gradient Descent for Overparameterized Neural Networks

Tasks

Task	Papers	Share
Image Classification	7	21.88%
Variational Monte Carlo	2	6.25%
Federated Learning	2	6.25%
Image Reconstruction	2	6.25%
Bias Detection	2	6.25%
Clustering	2	6.25%
BIG-bench Machine Learning	2	6.25%
Computational Efficiency	2	6.25%
Machine Translation	1	3.13%

Categories

Optimization