no code implementations • 29 Sep 2021 • Diego Granziol, Mingtian Zhang, Nicholas Baskerville
Under a PAC-Bayesian framework, we derive an implementation-efficient, parameterisation-invariant metric to measure the difference between our true and empirical risk.
no code implementations • 15 Nov 2020 • Diego Granziol, Nicholas Baskerville
We conjecture that the inherent difference in generalisation between adaptive and non-adaptive gradient methods in deep learning stems from the increased estimation noise in the flattest directions of the true loss surface.