A Practical PAC-Bayes Generalisation Bound for Deep Learning
Under a PAC-Bayesian framework, we derive an implementation-efficient, parameterisation-invariant metric that measures the gap between the true and empirical risk. We show that, for solutions with low training loss, this metric can be approximated at the same cost as a single step of SGD. We investigate its usefulness on pathological examples where traditional Hessian-based sharpness metrics increase while generalisation also improves, and find good experimental agreement. As a consequence of our PAC-Bayesian framework and of theoretical arguments on the effect of sub-sampling the Hessian, we include a trace-of-Hessian term in our structural risk. We find that this term promotes generalisation in a variety of experiments with Wide Residual Networks on the CIFAR datasets.
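The abstract claims the metric can be approximated at the cost of roughly one SGD step; in practice, Hessian-trace terms of this kind are commonly estimated with Hutchinson's stochastic trace estimator, where each sample needs only one Hessian-vector product (for a network, one extra backward pass). The sketch below is an illustrative, hypothetical example on a toy quadratic loss whose Hessian is known exactly, not the paper's implementation: the matrix `A` and the function names are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy quadratic loss L(w) = 0.5 * w^T A w, whose Hessian is exactly A.
# (Hypothetical stand-in for a network's training-loss Hessian.)
d = 50
Q = rng.standard_normal((d, d))
A = Q @ Q.T / d  # symmetric positive semi-definite "Hessian"

def hvp(v):
    """Hessian-vector product. For a real network this would be one
    extra backward pass, i.e. roughly the cost of one SGD step."""
    return A @ v

def hutchinson_trace(hvp_fn, dim, n_samples=2000, rng=rng):
    """Estimate tr(H) via tr(H) = E[v^T H v] with Rademacher-distributed v."""
    total = 0.0
    for _ in range(n_samples):
        v = rng.choice([-1.0, 1.0], size=dim)
        total += v @ hvp_fn(v)
    return total / n_samples

estimate = hutchinson_trace(hvp, d)
exact = np.trace(A)
```

Each sample touches the Hessian only through a matrix-free Hessian-vector product, which is why the per-sample cost stays close to that of a single gradient step.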