1 code implementation • 20 May 2024 • Juhan Bae, Wu Lin, Jonathan Lorraine, Roger Grosse
While computationally more efficient than unrolling-based approaches, Source is also suitable for cases where implicit-differentiation-based approaches struggle, such as non-converged models and multi-stage training pipelines.
2 code implementations • 5 Feb 2024 • Wu Lin, Felix Dangel, Runa Eschenhagen, Juhan Bae, Richard E. Turner, Alireza Makhzani
Adaptive gradient optimizers like Adam(W) are the default training algorithms for many deep learning architectures, such as transformers.
2 code implementations • 9 Dec 2023 • Wu Lin, Felix Dangel, Runa Eschenhagen, Kirill Neklyudov, Agustinus Kristiadi, Richard E. Turner, Alireza Makhzani
Second-order methods such as KFAC can be useful for neural net training.
1 code implementation • 20 Feb 2023 • Wu Lin, Valentin Duruisseaux, Melvin Leok, Frank Nielsen, Mohammad Emtiyaz Khan, Mark Schmidt
Riemannian submanifold optimization with momentum is computationally challenging because, to ensure that the iterates remain on the submanifold, we often need to solve difficult differential equations.
no code implementations • 22 Jul 2021 • Wu Lin, Frank Nielsen, Mohammad Emtiyaz Khan, Mark Schmidt
In this paper, we propose new structured second-order methods and structured adaptive-gradient methods obtained by performing natural-gradient descent on structured parameter spaces.
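To give a feel for the simplest instance of this idea, the sketch below runs natural-gradient descent over a diagonal-Gaussian parameter space on a toy quadratic loss; the natural-gradient update reduces to a Newton-like step preconditioned by a running curvature estimate. This is only an illustrative special case (not the structured updates proposed in the paper), and the toy loss and all names are made up for the example.

```python
# Minimal sketch: natural-gradient descent on a diagonal-Gaussian parameter
# space for a toy quadratic loss. The update on the Gaussian's natural
# parameters reduces to a Newton-like step in which the running curvature
# estimate `s` preconditions the gradient. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

# Toy quadratic loss: l(theta) = 0.5 * sum(h * (theta - a)**2)
a = np.array([1.0, -2.0])
h = np.array([4.0, 0.5])           # true (diagonal) Hessian

def grad(theta):                   # gradient of the toy loss
    return h * (theta - a)

def diag_hess(theta):              # diagonal of the Hessian of the toy loss
    return h

mu = np.zeros(2)                   # mean of q(theta) = N(mu, diag(1/s))
s = np.ones(2)                     # precision (curvature estimate)
beta, n_samples = 0.1, 10

for t in range(200):
    thetas = mu + rng.standard_normal((n_samples, 2)) / np.sqrt(s)
    g = np.mean([grad(th) for th in thetas], axis=0)       # E_q[grad l]
    H = np.mean([diag_hess(th) for th in thetas], axis=0)  # E_q[diag Hess l]
    s = (1 - beta) * s + beta * H          # curvature (precision) update
    mu = mu - beta * g / s                 # Newton-like, preconditioned step

print(mu, s)   # mu approaches a, s approaches h
```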
no code implementations • 15 Feb 2021 • Wu Lin, Frank Nielsen, Mohammad Emtiyaz Khan, Mark Schmidt
Natural-gradient descent (NGD) on structured parameter spaces (e.g., low-rank covariances) is computationally challenging due to difficult Fisher-matrix computations.
1 code implementation • ICML 2020 • Wu Lin, Mark Schmidt, Mohammad Emtiyaz Khan
The Bayesian learning rule is a natural-gradient variational inference method, which not only contains many existing learning algorithms as special cases but also enables the design of new algorithms.
1 code implementation • 29 Oct 2019 • Wu Lin, Mohammad Emtiyaz Khan, Mark Schmidt
Stein's method (Stein, 1973; 1981) is a powerful tool for statistical applications and has significantly impacted machine learning.
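For context, Stein's identity for a Gaussian states that E[(x - mu) f(x)] = sigma^2 E[f'(x)] for smooth f; the snippet below is a quick Monte Carlo check of this identity (the test function and names are chosen only for illustration).

```python
# Quick Monte Carlo check of Stein's identity (Stein's lemma) for a Gaussian:
# for x ~ N(mu, sigma^2) and smooth f,  E[(x - mu) f(x)] = sigma^2 E[f'(x)].
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.5, 2.0
x = rng.normal(mu, sigma, size=1_000_000)

f = np.tanh                              # any smooth test function
f_prime = lambda z: 1 - np.tanh(z) ** 2  # its derivative

lhs = np.mean((x - mu) * f(x))
rhs = sigma ** 2 * np.mean(f_prime(x))
print(lhs, rhs)                          # the two estimates agree up to MC error
```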
1 code implementation • 7 Jun 2019 • Wu Lin, Mohammad Emtiyaz Khan, Mark Schmidt
Natural-gradient methods enable fast and simple algorithms for variational inference, but due to computational difficulties, their use is mostly limited to minimal exponential-family (EF) approximations.
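Part of what makes natural gradients simple for minimal exponential families is the identity that the natural gradient with respect to the natural parameter equals the ordinary gradient with respect to the mean (expectation) parameter; the snippet below checks this numerically for a Bernoulli distribution with an illustrative objective (not taken from the paper).

```python
# Numeric check: for a minimal exponential family, the natural gradient w.r.t.
# the natural parameter equals the plain gradient w.r.t. the mean parameter.
# Bernoulli example with an illustrative objective.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = 0.7                      # natural parameter (logit)
m = sigmoid(theta)               # mean parameter E[T(x)] = p
fisher = m * (1 - m)             # Fisher information for the Bernoulli logit

# Illustrative objective expressed through the mean parameter: L = (m - 0.3)^2
grad_m = 2 * (m - 0.3)                   # gradient w.r.t. mean parameter
grad_theta = grad_m * fisher             # chain rule: dm/dtheta = m(1 - m)
natural_grad = grad_theta / fisher       # Fisher-preconditioned gradient

print(natural_grad, grad_m)              # identical (up to floating point)
```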
3 code implementations • ICML 2018 • Mohammad Emtiyaz Khan, Didrik Nielsen, Voot Tangkaratt, Wu Lin, Yarin Gal, Akash Srivastava
Uncertainty computation in deep learning is essential to design robust and reliable systems.
1 code implementation • ICLR 2018 • Wu Lin, Nicolas Hubacher, Mohammad Emtiyaz Khan
Recent efforts on combining deep models with probabilistic graphical models show promise for providing flexible models that are also easy to interpret.
no code implementations • 15 Nov 2017 • Mohammad Emtiyaz Khan, Wu Lin, Voot Tangkaratt, Zuozhu Liu, Didrik Nielsen
We present the Variational Adaptive Newton (VAN) method which is a black-box optimization method especially suitable for explorative-learning tasks such as active learning and reinforcement learning.
2 code implementations • 13 Mar 2017 • Mohammad Emtiyaz Khan, Wu Lin
In this paper, we propose a new algorithm called Conjugate-computation Variational Inference (CVI), which brings the best of both worlds together: it uses conjugate computations for the conjugate terms and stochastic gradients for the rest.
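As a rough illustration of this split, the sketch below applies a CVI-style update to a scalar model with a conjugate Gaussian prior and a non-conjugate Bernoulli likelihood under a Gaussian approximation: the prior enters through its natural parameters, while the likelihood is handled with Monte Carlo gradients taken in the mean-parameter space. This is a simplified scalar reading of the update, not the paper's implementation, and all names are illustrative.

```python
# CVI-style update, scalar sketch: Gaussian prior N(0, s0^2) (conjugate term)
# plus a Bernoulli likelihood with logit theta (non-conjugate term), with a
# Gaussian approximation q = N(mu, var). The conjugate part enters through its
# natural parameters; the non-conjugate part through stochastic gradients of
# E_q[log p(y|theta)] w.r.t. q's mean parameters. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

y, s0 = 1.0, 2.0                          # one observation, prior std
eta1, eta2 = 0.0, -1.0 / (2 * s0**2)      # natural params of the Gaussian prior

mu, var = 0.0, 1.0                        # q's mean and variance
lam1, lam2 = mu / var, -1.0 / (2 * var)   # q's natural parameters
beta, n_samples = 0.2, 200

for t in range(300):
    theta = mu + np.sqrt(var) * rng.standard_normal(n_samples)
    p = sigmoid(theta)
    g_mu = np.mean(y - p)                 # d/dmu  E_q[log p(y|theta)] (Bonnet)
    g_var = 0.5 * np.mean(-p * (1 - p))   # d/dvar E_q[log p(y|theta)] (Price)
    g_m1 = g_mu - 2 * mu * g_var          # convert to mean-parameter gradients
    g_m2 = g_var
    lam1 = (1 - beta) * lam1 + beta * (eta1 + g_m1)   # CVI-style updates
    lam2 = (1 - beta) * lam2 + beta * (eta2 + g_m2)
    var = -1.0 / (2 * lam2)               # back to (mu, var)
    mu = var * lam1

print(mu, var)   # approximate Gaussian posterior over the logit
```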
no code implementations • 31 Oct 2015 • Mohammad Emtiyaz Khan, Reza Babanezhad, Wu Lin, Mark Schmidt, Masashi Sugiyama
We also give a convergence-rate analysis of our method and of many previous methods that exploit the geometry of the space.