2 code implementations • 5 Feb 2024 • Wu Lin, Felix Dangel, Runa Eschenhagen, Juhan Bae, Richard E. Turner, Alireza Makhzani
Adaptive gradient optimizers like Adam(W) are the default training algorithms for many deep learning architectures, such as transformers.
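As a reminder of what such an adaptive update looks like, here is a minimal NumPy sketch of a single AdamW step (the function name, hyperparameter defaults, and state handling are illustrative, not any particular library's API):

```python
import numpy as np

def adamw_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update: Adam's moment-based step plus decoupled weight decay."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment (uncentred variance) estimate
    m_hat = m / (1 - beta1 ** t)                # bias corrections for step t >= 1
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * theta)
    return theta, m, v
```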
2 code implementations • 9 Dec 2023 • Wu Lin, Felix Dangel, Runa Eschenhagen, Kirill Neklyudov, Agustinus Kristiadi, Richard E. Turner, Alireza Makhzani
Second-order methods such as KFAC can be useful for neural net training.
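For a single linear layer, K-FAC approximates the Fisher/GGN block by a Kronecker product of an input-activation covariance and an output-gradient covariance, so the preconditioned gradient only requires two small matrix inverses. A minimal NumPy sketch, with illustrative damping and normalization choices:

```python
import numpy as np

def kfac_precondition(dW, a, g, damping=1e-3):
    """KFAC-style natural-gradient step for one linear layer (toy sketch).

    dW: (out, in) gradient w.r.t. the weight matrix
    a:  (batch, in) layer inputs; g: (batch, out) backpropagated output gradients
    """
    A = a.T @ a / a.shape[0]                    # input-activation covariance factor
    G = g.T @ g / g.shape[0]                    # output-gradient covariance factor
    A_inv = np.linalg.inv(A + damping * np.eye(A.shape[0]))
    G_inv = np.linalg.inv(G + damping * np.eye(G.shape[0]))
    return G_inv @ dW @ A_inv                   # applies (A ⊗ G)^{-1} to vec(dW)
```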
no code implementations • NeurIPS 2023 • Runa Eschenhagen, Alexander Immer, Richard E. Turner, Frank Schneider, Philipp Hennig
In this work, we identify two different settings of linear weight-sharing layers which motivate two flavours of K-FAC -- $\textit{expand}$ and $\textit{reduce}$.
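Loosely, the two flavours differ in how the weight-sharing dimension (e.g. the sequence dimension in a transformer) enters the Kronecker factors: expand folds it into the batch dimension, while reduce aggregates over it before taking outer products. A hedged sketch of the activation factor only, with the paper's exact normalization omitted:

```python
import numpy as np

def activation_factor_expand(a):
    """a: (batch, seq, d) inputs to a linear layer shared across the sequence.
    Expand: treat every shared application as its own data point."""
    a_flat = a.reshape(-1, a.shape[-1])         # (batch * seq, d)
    return a_flat.T @ a_flat / a_flat.shape[0]

def activation_factor_reduce(a):
    """Reduce: sum over the weight-sharing dimension per example, then take
    the outer product (scaling constants omitted in this sketch)."""
    a_sum = a.sum(axis=1)                       # (batch, d)
    return a_sum.T @ a_sum / a.shape[0]
```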
3 code implementations • 12 Jun 2023 • George E. Dahl, Frank Schneider, Zachary Nado, Naman Agarwal, Chandramouli Shama Sastry, Philipp Hennig, Sourabh Medapati, Runa Eschenhagen, Priya Kasimbeg, Daniel Suo, Juhan Bae, Justin Gilmer, Abel L. Peirson, Bilal Khan, Rohan Anil, Mike Rabbat, Shankar Krishnan, Daniel Snider, Ehsan Amid, Kongtao Chen, Chris J. Maddison, Rakshith Vasudev, Michal Badura, Ankush Garg, Peter Mattson
In order to address these challenges, we introduce a new, competitive, time-to-result benchmark using multiple workloads running on fixed hardware, the AlgoPerf: Training Algorithms benchmark.
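The benchmark's basic unit of measurement is time-to-result: wall-clock time on fixed hardware until a workload reaches a predefined validation target. The sketch below is only a schematic of that idea; the function names, evaluation frequency, and stopping rule are placeholders, not the actual AlgoPerf harness:

```python
import time

def time_to_target(train_step, evaluate, target, budget_seconds):
    """Train with a submitted algorithm; record wall-clock time until the target is hit."""
    start = time.time()
    while (elapsed := time.time() - start) < budget_seconds:
        train_step()                            # one step of the submitted training algorithm
        if evaluate() >= target:                # e.g. a validation-accuracy target
            return elapsed                      # the workload's time-to-result
    return float("inf")                         # target not reached within the budget
```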
1 code implementation • 17 Apr 2023 • Agustinus Kristiadi, Alexander Immer, Runa Eschenhagen, Vincent Fortuin
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
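Schematically, the LLA places a Gaussian over the weights at the MAP estimate and propagates it through the network's linearization, which yields a closed-form Gaussian predictive:

$$p(\theta \mid \mathcal{D}) \approx \mathcal{N}\big(\theta;\, \theta_{\mathrm{MAP}},\, \Sigma\big), \qquad \Sigma = \Big(\nabla_\theta^2 \big[-\log p(\theta \mid \mathcal{D})\big]\Big|_{\theta_{\mathrm{MAP}}}\Big)^{-1},$$

$$f_{\mathrm{lin}}(x;\theta) = f(x;\theta_{\mathrm{MAP}}) + J_{\theta_{\mathrm{MAP}}}(x)\,(\theta - \theta_{\mathrm{MAP}}) \;\Rightarrow\; f(x) \mid \mathcal{D} \approx \mathcal{N}\big(f(x;\theta_{\mathrm{MAP}}),\; J_{\theta_{\mathrm{MAP}}}(x)\,\Sigma\,J_{\theta_{\mathrm{MAP}}}(x)^{\top}\big).$$

In practice the Hessian is itself approximated, e.g. by a generalized Gauss-Newton or Kronecker-factored curvature.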
no code implementations • 2 Aug 2022 • Emilia Magnani, Nicholas Krämer, Runa Eschenhagen, Lorenzo Rosasco, Philipp Hennig
Neural operators are a type of deep architecture that learns to solve (i.e., learns the nonlinear solution operator of) partial differential equations (PDEs).
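In other words, the training target is a function-to-function mapping: a discretized input function (e.g. a source term or coefficient field) is mapped to the discretized PDE solution. The toy PyTorch sketch below illustrates only this regression setup, with placeholder data and a plain MLP standing in for a genuine, resolution-independent neural operator:

```python
import torch
import torch.nn as nn

n_grid = 64                                     # 1-D grid resolution for inputs and outputs
model = nn.Sequential(nn.Linear(n_grid, 256), nn.ReLU(), nn.Linear(256, n_grid))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

f_batch = torch.randn(32, n_grid)               # discretized input functions (placeholder data)
u_batch = torch.randn(32, n_grid)               # corresponding discretized solutions (placeholder)

for _ in range(200):                            # fit the (toy) solution operator
    loss = ((model(f_batch) - u_batch) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```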
1 code implementation • 20 May 2022 • Agustinus Kristiadi, Runa Eschenhagen, Philipp Hennig
We show that the resulting posterior approximation is competitive with even the gold-standard full-batch Hamiltonian Monte Carlo.
no code implementations • 5 Nov 2021 • Runa Eschenhagen, Erik Daxberger, Philipp Hennig, Agustinus Kristiadi
Deep neural networks are prone to overconfident predictions on outliers.
5 code implementations • NeurIPS 2021 • Erik Daxberger, Agustinus Kristiadi, Alexander Immer, Runa Eschenhagen, Matthias Bauer, Philipp Hennig
Bayesian formulations of deep learning have been shown to have compelling theoretical properties and offer practical functional benefits, such as improved predictive uncertainty quantification and model selection.
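The accompanying laplace library turns this into a post-hoc, few-line addition on top of an already-trained PyTorch model; the snippet below follows the usage documented in the paper, though exact argument names may differ between library versions:

```python
from laplace import Laplace

# model: a trained (MAP) PyTorch classifier; train_loader: its training data
la = Laplace(model, "classification",
             subset_of_weights="last_layer",    # last-layer Laplace approximation
             hessian_structure="kron")          # Kronecker-factored curvature
la.fit(train_loader)
la.optimize_prior_precision(method="marglik")   # tune the prior via the marginal likelihood
probs = la(x_test)                              # approximate posterior-predictive probabilities
```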
1 code implementation • NeurIPS 2020 • Pingbo Pan, Siddharth Swaroop, Alexander Immer, Runa Eschenhagen, Richard E. Turner, Mohammad Emtiyaz Khan
Continually learning new skills is important for intelligent systems, yet standard deep learning methods suffer from catastrophic forgetting of the past.
1 code implementation • NeurIPS 2019 • Kazuki Osawa, Siddharth Swaroop, Anirudh Jain, Runa Eschenhagen, Richard E. Turner, Rio Yokota, Mohammad Emtiyaz Khan
Importantly, the benefits of Bayesian principles are preserved: predictive probabilities are well-calibrated, uncertainties on out-of-distribution data are improved, and continual-learning performance is boosted.