no code implementations • 17 May 2022 • Nicholas P Baskerville, Jonathan P Keating, Francesco Mezzadri, Joseph Najnudel, Diego Granziol
This paper considers several aspects of random matrix universality in deep neural networks.
no code implementations • 29 Sep 2021 • Diego Granziol, Mingtian Zhang, Nicholas Baskerville
Under a PAC-Bayesian framework, we derive an implementation-efficient, parameterisation-invariant metric to measure the difference between our true and empirical risk.
1 code implementation • 12 Feb 2021 • Nicholas P Baskerville, Diego Granziol, Jonathan P Keating
We further investigate the importance of the true loss surface in neural networks and find, in contrast to previous work, that the exponential hardness of locating the global minimum has practical consequences for achieving state-of-the-art performance.
no code implementations • 1 Jan 2021 • Diego Granziol
Hessian-based measures of flatness, such as the trace, Frobenius and spectral norms, have been argued to relate to generalisation, and are widely used as proxies for it.
no code implementations • 15 Nov 2020 • Diego Granziol, Nicholas Baskerville
We conjecture that the inherent difference in generalisation between adaptive and non-adaptive gradient methods in deep learning stems from the increased estimation noise in the flattest directions of the true loss surface.
1 code implementation • 16 Jun 2020 • Diego Granziol, Stefan Zohren, Stephen Roberts
Whilst the linear scaling rule for stochastic gradient descent has previously been derived under more restrictive conditions, which we generalise, the square-root scaling rule for adaptive optimisers is, to our knowledge, novel.
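The two batch-size scaling rules described above can be sketched as follows; the base learning rates and batch sizes here are illustrative placeholders, not values from the paper.

```python
# Hypothetical illustration of the two batch-size scaling rules: linear
# scaling for SGD, square-root scaling for adaptive optimisers such as Adam.

def scale_lr_sgd(base_lr, base_batch, new_batch):
    """Linear scaling rule: learning rate grows proportionally to batch size."""
    return base_lr * (new_batch / base_batch)

def scale_lr_adaptive(base_lr, base_batch, new_batch):
    """Square-root scaling rule for adaptive optimisers."""
    return base_lr * (new_batch / base_batch) ** 0.5

# Going from batch 256 to 1024 (a 4x increase):
print(scale_lr_sgd(0.1, 256, 1024))         # 4x the base SGD rate
print(scale_lr_adaptive(0.001, 256, 1024))  # 2x the base adaptive rate
```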
no code implementations • 13 Jun 2020 • Diego Granziol
We investigate whether the Wigner semicircle and Marchenko-Pastur distributions, often used in theoretical analyses of deep neural networks, match empirically observed spectral densities.
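A toy numerical version of the kind of comparison described above, checking the empirical spectrum of a synthetic Wigner matrix against the semicircle law (the paper examines real network Hessians, not this toy):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
A = rng.standard_normal((n, n))
W = (A + A.T) / np.sqrt(2 * n)          # Wigner matrix; spectrum fills ~[-2, 2]
eigs = np.linalg.eigvalsh(W)

def semicircle_density(x, radius=2.0):
    """Wigner semicircle density on [-radius, radius]."""
    x = np.asarray(x, dtype=float)
    inside = np.abs(x) <= radius
    out = np.zeros_like(x)
    out[inside] = 2.0 / (np.pi * radius**2) * np.sqrt(radius**2 - x[inside]**2)
    return out

# Compare the fraction of eigenvalues in [-1, 1] to the semicircle prediction.
empirical = np.mean((eigs > -1) & (eigs < 1))
grid = np.linspace(-1, 1, 2001)
theoretical = semicircle_density(grid).sum() * (grid[1] - grid[0])
print(empirical, theoretical)  # both close to ~0.61
```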
no code implementations • 2 Mar 2020 • Diego Granziol, Xingchen Wan, Samuel Albanie, Stephen Roberts
We analyse and explain the improved generalisation performance of iterate averaging using a Gaussian process perturbation model between the true and batch risk surfaces in a high-dimensional quadratic model.
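A minimal sketch of the setting described above: SGD on a quadratic with noisy gradients, comparing the final iterate against a tail average of iterates. The dimensions, step size and noise level are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, steps, lr, noise = 50, 2000, 0.05, 1.0
H = np.diag(np.linspace(0.1, 1.0, dim))   # quadratic curvature (Hessian)
x = np.ones(dim)                          # start away from the minimum at 0
tail = []

for t in range(steps):
    grad = H @ x + noise * rng.standard_normal(dim)  # noisy gradient
    x = x - lr * grad
    if t >= steps // 2:                   # average the second half of the run
        tail.append(x.copy())

x_avg = np.mean(tail, axis=0)
# The averaged iterate suppresses the gradient noise and lands closer
# to the true minimum than the final iterate does.
print(np.linalg.norm(x), np.linalg.norm(x_avg))
```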
no code implementations • ICLR 2020 • Diego Granziol, Timur Garipov, Dmitry Vetrov, Stefan Zohren, Stephen Roberts, Andrew Gordon Wilson
This approach is an order of magnitude faster than state-of-the-art methods for spectral visualization, and can be generically used to investigate the spectral properties of matrices in deep learning.
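Fast spectral visualisation of this kind is typically built on stochastic Lanczos quadrature; the following is a bare-bones sketch, not the authors' implementation. Only matrix-vector products with the matrix are needed, so in practice `matvec` would be a Hessian-vector product rather than a dense matrix multiply.

```python
import numpy as np

def lanczos(matvec, dim, m, rng):
    """m-step Lanczos: returns the tridiagonal coefficients (alphas, betas)."""
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    v_prev = np.zeros(dim)
    alphas, betas = [], []
    beta = 0.0
    for _ in range(m):
        w = matvec(v) - beta * v_prev
        alpha = v @ w
        w = w - alpha * v
        beta = np.linalg.norm(w)
        alphas.append(alpha)
        betas.append(beta)
        v_prev, v = v, w / beta
    return np.array(alphas), np.array(betas[:-1])

rng = np.random.default_rng(2)
n, m = 300, 30
A = rng.standard_normal((n, n))
A = (A + A.T) / np.sqrt(2 * n)            # stand-in for a Hessian
alphas, betas = lanczos(lambda x: A @ x, n, m, rng)
T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
ritz, U = np.linalg.eigh(T)
weights = U[0] ** 2                       # quadrature weights from first row

# The Ritz values and weights define a discrete approximation to the
# spectral density; e.g. its first moment estimates tr(A)/n.
print(weights @ ritz, np.trace(A) / n)
```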
1 code implementation • 20 Dec 2019 • Diego Granziol, Xingchen Wan, Timur Garipov
We present the MLRG Deep Curvature suite, a PyTorch-based, open-source package for the analysis and visualisation of neural network curvature and loss landscapes.
no code implementations • 19 Dec 2019 • Diego Granziol, Robin Ru, Stefan Zohren, Xiaowen Dong, Michael Osborne, Stephen Roberts
Graph spectral techniques for measuring graph similarity, or for learning the cluster number, require kernel smoothing.
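A minimal example of the kernel smoothing step mentioned above: turning a discrete set of normalised Laplacian eigenvalues into a smooth spectral density on which two graphs can be compared. The graph, bandwidth and grid are illustrative choices.

```python
import numpy as np

def smoothed_spectral_density(eigs, grid, bandwidth=0.05):
    """Gaussian kernel density estimate of a graph spectrum on a grid."""
    diffs = grid[:, None] - eigs[None, :]
    kernels = np.exp(-0.5 * (diffs / bandwidth) ** 2)
    kernels /= bandwidth * np.sqrt(2 * np.pi)
    return kernels.mean(axis=1)

# Normalised Laplacian of a small ring graph (eigenvalues lie in [0, 2]).
n = 40
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
L = np.eye(n) - A / 2                     # normalised Laplacian; all degrees 2
eigs = np.linalg.eigvalsh(L)
grid = np.linspace(0, 2, 200)
density = smoothed_spectral_density(eigs, grid)
print(density.sum() * (grid[1] - grid[0]))  # integrates to ~1
```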
no code implementations • 3 Jun 2019 • Diego Granziol, Binxin Ru, Stefan Zohren, Xiaowen Dong, Michael Osborne, Stephen Roberts
Efficient approximation lies at the heart of large-scale machine learning problems.
no code implementations • 18 Apr 2018 • Diego Granziol, Binxin Ru, Stefan Zohren, Xiaowen Dong, Michael Osborne, Stephen Roberts
Graph spectra have been successfully used to classify network types, compute the similarity between graphs, and determine the number of communities in a network.
no code implementations • 21 Feb 2018 • Diego Granziol, Edward Wagstaff, Binxin Ru, Michael Osborne, Stephen Roberts
Evaluating the log determinant of a positive definite matrix is ubiquitous in machine learning.
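One standard matrix-free approach to this problem uses the identity logdet(A) = tr(log A), estimated with Hutchinson probe vectors and a truncated Taylor expansion of the logarithm, so that only matrix-vector products are needed. The sketch below is illustrative (scaling constant, probe count and truncation order are arbitrary choices), not the method of any one of the papers listed here.

```python
import numpy as np

def logdet_estimate(A, n_probes=30, order=50, rng=None):
    """Stochastic estimate of logdet(A) for symmetric positive definite A."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = A.shape[0]
    alpha = 1.1 * np.linalg.norm(A, ord=2)   # scale so eigenvalues of B < 1
    B = np.eye(n) - A / alpha
    total = 0.0
    for _ in range(n_probes):
        z = rng.choice([-1.0, 1.0], size=n)  # Rademacher probe vector
        v = z.copy()
        acc = 0.0
        for k in range(1, order + 1):
            v = B @ v                        # v = B^k z via repeated matvecs
            acc += (z @ v) / k               # estimates tr(B^k) / k
        total -= acc                         # tr(log(I - B)) = -sum_k tr(B^k)/k
    return n * np.log(alpha) + total / n_probes

rng = np.random.default_rng(3)
M = rng.standard_normal((100, 100))
A = M @ M.T + 100 * np.eye(100)              # well-conditioned SPD test matrix
exact = np.linalg.slogdet(A)[1]
approx = logdet_estimate(A)
print(exact, approx)                         # the two should agree closely
```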
1 code implementation • ICML 2018 • Binxin Ru, Mark McLeod, Diego Granziol, Michael A. Osborne
Information-theoretic Bayesian optimisation techniques have demonstrated state-of-the-art performance in tackling important global optimisation problems.
no code implementations • 8 Sep 2017 • Diego Granziol, Stephen Roberts
The ability of many powerful machine learning algorithms to deal with large data sets without compromise is often hampered by computationally expensive linear algebra tasks, of which calculating the log determinant is a canonical example.
1 code implementation • 24 Apr 2017 • Jack Fitzsimons, Diego Granziol, Kurt Cutajar, Michael Osborne, Maurizio Filippone, Stephen Roberts
The scalable calculation of matrix determinants has been a bottleneck to the widespread application of many machine learning methods such as determinantal point processes, Gaussian processes, generalised Markov random fields, graph models and many others.