Optimization

Natural Gradient Descent is an approximate second-order optimisation method. It has an interpretation as optimizing over a Riemannian manifold using an intrinsic distance metric, which implies the updates are invariant to transformations such as whitening. By using the positive semi-definite (PSD) Gauss-Newton matrix to approximate the (possibly negative definite) Hessian, NGD can often work better than exact second-order methods.

Given the gradient of $z$, $g = \frac{\delta{f}\left(z\right)}{\delta{z}}$, NGD computes the update as:

$$\Delta{z} = \alpha{F}^{−1}g$$

where the Fisher information matrix $F$ is defined as:

$$F = \mathbb{E}_{p\left(t\mid{z}\right)}\left[\nabla\ln{p}\left(t\mid{z}\right)\nabla\ln{p}\left(t\mid{z}\right)^{T}\right]$$

The log-likelihood function $\ln{p}\left(t\mid{z}\right)$ typically corresponds to commonly used error functions such as the cross entropy loss.

Source: LOGAN

#### Papers

Paper Code Results Date Stars