Natural Gradient Descent (NGD) is an approximate second-order optimization method. It can be interpreted as optimization over a Riemannian manifold using an intrinsic distance metric, which implies that its updates are invariant to transformations such as whitening. By using the positive semi-definite (PSD) Gauss-Newton matrix to approximate the (possibly indefinite) Hessian, NGD can often work better than exact second-order methods.
Given the gradient of the objective with respect to $z$, $g = \frac{\partial f\left(z\right)}{\partial z}$, NGD computes the update as:
$$\Delta{z} = \alpha F^{-1} g$$
where the Fisher information matrix $F$ is defined as:
$$ F = \mathbb{E}_{p\left(t\mid{z}\right)}\left[\nabla\ln{p}\left(t\mid{z}\right)\nabla\ln{p}\left(t\mid{z}\right)^{T}\right] $$
The negative log-likelihood $-\ln{p}\left(t\mid{z}\right)$ typically corresponds to commonly used error functions such as the cross-entropy loss.
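The update above can be sketched numerically. The following is a minimal illustration, not an implementation from the source: it takes logistic regression as the model $p(t\mid z)$, forms the Fisher matrix from per-sample score outer products, and applies $\Delta z = \alpha F^{-1} g$. The function name `ngd_step`, the damping term, and the synthetic data are all assumptions added for the example.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def ngd_step(z, x, t, alpha=0.5, damping=1e-4):
    """One natural-gradient step z + alpha * F^{-1} g, ascending the
    log-likelihood of a Bernoulli model p(t=1|z) = sigmoid(x @ z)."""
    p = sigmoid(x @ z)
    # g: gradient of the mean log-likelihood with respect to z
    g = x.T @ (t - p) / len(t)
    # Fisher matrix F = E[grad ln p * grad ln p^T]; for this model it
    # reduces to x^T diag(p(1-p)) x / N, which is PSD by construction.
    F = (x * (p * (1.0 - p))[:, None]).T @ x / len(t)
    # Damped solve for F^{-1} g (damping keeps F invertible; an assumption
    # added here, not part of the update formula in the text).
    return z + alpha * np.linalg.solve(F + damping * np.eye(len(z)), g)

# Synthetic data for the sketch
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
z_true = np.array([0.5, -1.0, 0.25])
t = (rng.uniform(size=200) < sigmoid(x @ z_true)).astype(float)

z = np.zeros(3)
for _ in range(10):
    z = ngd_step(z, x, t)
```

Because the Fisher matrix equals the Gauss-Newton matrix for this model, each step behaves like a damped Newton step, and the log-likelihood improves rapidly regardless of how the inputs `x` are scaled or correlated.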
Source: LOGAN
Image: Fast Convergence of Natural Gradient Descent for Overparameterized Neural Networks

| Task | Papers | Share |
| --- | --- | --- |
| Image Classification | 7 | 19.44% |
| Variational Monte Carlo | 2 | 5.56% |
| Federated Learning | 2 | 5.56% |
| Image Reconstruction | 2 | 5.56% |
| Bias Detection | 2 | 5.56% |
| Clustering | 2 | 5.56% |
| BIG-bench Machine Learning | 2 | 5.56% |
| Computational Efficiency | 2 | 5.56% |
| Sequential Bayesian Inference | 1 | 2.78% |