Search Results for author: Felix Dangel

Found 8 papers, 5 papers with code

Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective

no code implementations 5 Feb 2024 Wu Lin, Felix Dangel, Runa Eschenhagen, Juhan Bae, Richard E. Turner, Alireza Makhzani

Adaptive gradient optimizers like Adam(W) are the default training algorithms for many deep learning architectures, such as transformers.

Second-order methods

On the Disconnect Between Theory and Practice of Overparametrized Neural Networks

no code implementations 29 Sep 2023 Jonathan Wenger, Felix Dangel, Agustinus Kristiadi

Our empirical results demonstrate that this is not the case in optimization, uncertainty quantification or continual learning.

Continual Learning Uncertainty Quantification

Convolutions Through the Lens of Tensor Networks

no code implementations 5 Jul 2023 Felix Dangel

Despite their simple intuition, convolutions are more tedious to analyze than dense layers, which complicates the generalization of theoretical and algorithmic ideas.

Tensor Networks

ViViT: Curvature access through the generalized Gauss-Newton's low-rank structure

3 code implementations 4 Jun 2021 Felix Dangel, Lukas Tatzel, Philipp Hennig

Curvature in the form of the Hessian or its generalized Gauss-Newton (GGN) approximation is valuable for algorithms that rely on a local model for the loss to train, compress, or explain deep networks.

BackPACK: Packing more into backprop

1 code implementation ICLR 2020 Felix Dangel, Frederik Kunstner, Philipp Hennig

Automatic differentiation frameworks are optimized for exactly one thing: computing the average mini-batch gradient.

Modular Block-diagonal Curvature Approximations for Feedforward Architectures

1 code implementation 5 Feb 2019 Felix Dangel, Stefan Harmeling, Philipp Hennig

We propose a modular extension of backpropagation for the computation of block-diagonal approximations to various curvature matrices of the training objective (in particular, the Hessian, generalized Gauss-Newton, and positive-curvature Hessian).

BIG-bench Machine Learning
