Search Results for author: Frederik Kunstner

Found 9 papers, 5 papers with code

Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models

no code implementations 29 Feb 2024 Frederik Kunstner, Robin Yadav, Alan Milligan, Mark Schmidt, Alberto Bietti

We show that the heavy-tailed class imbalance found in language modeling tasks leads to difficulties in the optimization dynamics.

Language Modelling
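
As a rough illustration (not taken from the paper): token classes in language-modeling data typically follow a Zipf-like law, which is one way the heavy-tailed class imbalance mentioned above arises. A minimal NumPy sketch with a hypothetical vocabulary size and exponent:

import numpy as np

# Sample token ids from a Zipf-like distribution and measure how concentrated
# the class frequencies are. Vocabulary size and exponent are illustrative only.
rng = np.random.default_rng(0)
vocab_size = 10_000
tokens = rng.zipf(a=1.2, size=1_000_000)
tokens = tokens[tokens <= vocab_size]          # clip to a finite vocabulary

counts = np.sort(np.bincount(tokens, minlength=vocab_size + 1)[1:])[::-1]
top_frac = counts[:100].sum() / counts.sum()
print(f"top 100 of {vocab_size} classes cover {top_frac:.1%} of the tokens")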

Convergence Rates for the MAP of an Exponential Family and Stochastic Mirror Descent -- an Open Problem

no code implementations 12 Nov 2021 Rémi Le Priol, Frederik Kunstner, Damien Scieur, Simon Lacoste-Julien

We consider the problem of upper bounding the expected log-likelihood sub-optimality of the maximum likelihood estimate (MLE), or a conjugate maximum a posteriori (MAP) for an exponential family, in a non-asymptotic way.
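
For context, the quantity being bounded is (in my own notation, not necessarily the paper's) the expected excess risk of the estimator under the population negative log-likelihood,

$$\mathbb{E}\big[ L(\hat\theta_n) - L(\theta^\star) \big], \qquad L(\theta) = \mathbb{E}_{x \sim p_{\theta^\star}}\big[ -\log p_\theta(x) \big],$$

where $\hat\theta_n$ is the MLE (or conjugate MAP) computed from $n$ i.i.d. samples and $\theta^\star$ is the true parameter; this gap is exactly the expected KL divergence $\mathbb{E}\big[\mathrm{KL}(p_{\theta^\star} \,\|\, p_{\hat\theta_n})\big]$.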

Homeomorphic-Invariance of EM: Non-Asymptotic Convergence in KL Divergence for Exponential Families via Mirror Descent

no code implementations 2 Nov 2020 Frederik Kunstner, Raunak Kumar, Mark Schmidt

In this work we first show that for the common setting of exponential family distributions, viewing EM as a mirror descent algorithm leads to convergence rates in Kullback-Leibler (KL) divergence.
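
A sketch of the connection (standard facts, stated in my notation rather than the paper's): mirror descent with a Bregman divergence $D_\Phi$ takes steps of the form

$$\theta_{t+1} = \arg\min_{\theta} \; \langle \nabla f(\theta_t), \theta \rangle + \tfrac{1}{\eta_t} D_\Phi(\theta, \theta_t),$$

and for an exponential family with log-partition function $A$, the Bregman divergence generated by $A$ is itself a KL divergence between members of the family, $D_A(\theta, \theta') = \mathrm{KL}\big(p_{\theta'} \,\|\, p_{\theta}\big)$. Interpreting the EM update as such a mirror descent step is what makes it natural to state rates directly in KL divergence.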

Adaptive Gradient Methods Converge Faster with Over-Parameterization (and you can do a line-search)

no code implementations 28 Sep 2020 Sharan Vaswani, Issam H. Laradji, Frederik Kunstner, Si Yi Meng, Mark Schmidt, Simon Lacoste-Julien

Under an interpolation assumption, we prove that AMSGrad with a constant step-size and momentum can converge to the minimizer at the faster $O(1/T)$ rate for smooth, convex functions.

Binary Classification
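
For reference, a minimal NumPy sketch of the standard AMSGrad update (Reddi et al., 2018) with a constant step size and momentum; this is the textbook update, not the authors' code, and the hyperparameters are illustrative:

import numpy as np

def amsgrad(grad_fn, theta0, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8, steps=5000):
    theta = np.asarray(theta0, dtype=float)
    m = np.zeros_like(theta)       # first moment (momentum)
    v = np.zeros_like(theta)       # second moment
    v_hat = np.zeros_like(theta)   # running max of v -- the AMSGrad correction
    for _ in range(steps):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        v_hat = np.maximum(v_hat, v)               # keeps effective step sizes non-increasing
        theta = theta - lr * m / (np.sqrt(v_hat) + eps)
    return theta

# Smooth convex example where interpolation holds trivially: f(t) = (t - 3)^2.
print(amsgrad(lambda t: 2.0 * (t - 3.0), theta0=np.zeros(1)))   # close to [3.]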

BackPACK: Packing more into backprop

1 code implementation ICLR 2020 Felix Dangel, Frederik Kunstner, Philipp Hennig

Automatic differentiation frameworks are optimized for exactly one thing: computing the average mini-batch gradient.
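
A hedged usage sketch based on BackPACK's documented PyTorch interface (exact extension and attribute names may vary across versions): the library piggybacks on a regular backward pass and stores additional per-parameter quantities next to .grad.

import torch
from torch.nn import CrossEntropyLoss, Linear
from backpack import backpack, extend
from backpack.extensions import BatchGrad, Variance

X, y = torch.randn(32, 10), torch.randint(0, 3, (32,))   # toy batch
model = extend(Linear(10, 3))            # extend() registers BackPACK's hooks
lossfunc = extend(CrossEntropyLoss())

loss = lossfunc(model(X), y)
with backpack(BatchGrad(), Variance()):
    loss.backward()                      # one backward pass computes everything below

for p in model.parameters():
    print(p.grad.shape)         # the usual averaged mini-batch gradient
    print(p.grad_batch.shape)   # individual per-sample gradients
    print(p.variance.shape)     # element-wise gradient variance over the batch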

Limitations of the Empirical Fisher Approximation for Natural Gradient Descent

1 code implementation NeurIPS 2019 Frederik Kunstner, Lukas Balles, Philipp Hennig

Natural gradient descent, which preconditions a gradient descent update with the Fisher information matrix of the underlying statistical model, is a way to capture partial second-order information.

Second-order methods
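
To spell out the objects involved (standard definitions, written here in my own notation): the natural gradient update and the two "Fisher" matrices being compared are

$$\theta_{t+1} = \theta_t - \eta\, F(\theta_t)^{-1} \nabla \mathcal{L}(\theta_t),$$

$$F(\theta) = \sum_{n} \mathbb{E}_{y \sim p_\theta(\cdot \mid x_n)}\big[ \nabla_\theta \log p_\theta(y \mid x_n)\, \nabla_\theta \log p_\theta(y \mid x_n)^\top \big], \qquad \widetilde{F}(\theta) = \sum_{n} \nabla_\theta \log p_\theta(y_n \mid x_n)\, \nabla_\theta \log p_\theta(y_n \mid x_n)^\top,$$

where $F$ is the Fisher information (an expectation over the model's own predictive distribution) and $\widetilde{F}$ is the "empirical Fisher", which substitutes the observed labels $y_n$ for that expectation; the paper studies when the latter is a poor surrogate for the former.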

SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient

2 code implementations NeurIPS 2018 Aaron Mishkin, Frederik Kunstner, Didrik Nielsen, Mark Schmidt, Mohammad Emtiyaz Khan

Uncertainty estimation in large deep-learning models is a computationally challenging task, where it is difficult to form even a Gaussian approximation to the posterior distribution.

Variational Inference
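
For context (my notation, and my recollection of the approach rather than a quote from the paper): structured Gaussian approximations of this kind typically restrict the precision matrix to a low-rank-plus-diagonal form,

$$q(\theta) = \mathcal{N}(\theta \mid \mu, \Sigma), \qquad \Sigma^{-1} = U U^\top + \mathrm{diag}(d),$$

with $U \in \mathbb{R}^{p \times k}$ for a small rank $k \ll p$ and $d \in \mathbb{R}^p_{> 0}$, so the approximation costs $O(pk)$ to store instead of the $O(p^2)$ of a full covariance.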
