no code implementations • 25 Jul 2024 • Sanae Lotfi, Yilun Kuang, Brandon Amos, Micah Goldblum, Marc Finzi, Andrew Gordon Wilson
Recent work computes non-vacuous compression-based generalization bounds for LLMs, but these bounds become vacuous once models reach the billion-parameter scale.
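For intuition, here is a minimal sketch of the generic Occam-style compression bound underlying this line of work, not the paper's exact token-level bound; the function name and example numbers are illustrative. For a loss bounded in [0, 1] and a hypothesis describable in C(h) bits under a prefix-free code, with probability at least 1 − δ the population risk is at most the empirical risk plus sqrt((C(h) ln 2 + ln(1/δ)) / (2m)).

```python
import math

def occam_compression_bound(empirical_risk: float,
                            code_length_bits: float,
                            num_samples: int,
                            delta: float = 0.05) -> float:
    """Generic Occam-style compression bound for a [0, 1]-bounded loss.

    With probability >= 1 - delta over the draw of `num_samples` i.i.d.
    training points, the population risk of a hypothesis describable in
    `code_length_bits` bits (prefix-free code) is at most this value.
    """
    complexity = code_length_bits * math.log(2) + math.log(1.0 / delta)
    return empirical_risk + math.sqrt(complexity / (2.0 * num_samples))

# A heavily compressed model evaluated on a large corpus stays non-vacuous:
print(occam_compression_bound(0.05, code_length_bits=1e6,
                              num_samples=100_000_000))  # ~0.11
```

The tension the abstract points to is visible here: when the description length grows toward billions of parameters while the sample count does not keep pace, the complexity term pushes the bound above 1 and it becomes vacuous.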
1 code implementation • 28 Dec 2023 • Sanae Lotfi, Marc Finzi, Yilun Kuang, Tim G. J. Rudner, Micah Goldblum, Andrew Gordon Wilson
Modern language models can contain billions of parameters, raising the question of whether they can generalize beyond the training data or simply parrot their training corpora.
1 code implementation • 24 Nov 2022 • Sanae Lotfi, Marc Finzi, Sanyam Kapoor, Andres Potapczynski, Micah Goldblum, Andrew Gordon Wilson
While there has been progress in developing non-vacuous generalization bounds for deep neural networks, these bounds tend to be uninformative about why deep learning works.
1 code implementation • 23 Feb 2022 • Sanae Lotfi, Pavel Izmailov, Gregory Benton, Micah Goldblum, Andrew Gordon Wilson
We provide a partial remedy through a conditional marginal likelihood, which we show is more closely aligned with generalization and practically valuable for large-scale hyperparameter learning, such as in deep kernel learning.
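As a rough sketch of the idea (our notation; the paper's exact estimator may differ): by the chain rule, the marginal likelihood decomposes into sequential predictive terms, and the conditional marginal likelihood conditions on the first m data points and scores the model only on the remainder, which better tracks post-data generalization.

```latex
% Full log marginal likelihood of model M on data D = (d_1, ..., d_n):
%   \log p(\mathcal{D} \mid \mathcal{M})
%     = \sum_{i=1}^{n} \log p(d_i \mid d_{<i}, \mathcal{M})
% Conditional marginal likelihood: condition on the first m points and
% score only the remaining n - m terms:
\log p(\mathcal{D}_{>m} \mid \mathcal{D}_{\le m}, \mathcal{M})
  = \sum_{i=m+1}^{n} \log p(d_i \mid d_{<i}, \mathcal{M})
```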
no code implementations • 29 Nov 2021 • Sanae Lotfi, Tiphaine Bonniot de Ruisselet, Dominique Orban, Andrea Lodi
In this paper, we consider both first- and second-order techniques to address continuous optimization problems arising in machine learning.
1 code implementation • NeurIPS 2021 • Pavel Izmailov, Patrick Nicholson, Sanae Lotfi, Andrew Gordon Wilson
Approximate Bayesian inference for neural networks is considered a robust alternative to standard training, often providing good performance on out-of-distribution data.
1 code implementation • 25 Feb 2021 • Gregory W. Benton, Wesley J. Maddox, Sanae Lotfi, Andrew Gordon Wilson
In this paper, we show that there are mode-connecting simplicial complexes that form multi-dimensional manifolds of low loss, connecting many independently trained models.
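A minimal sketch of the core construction (the function name and Dirichlet sampling scheme are ours, not necessarily the paper's): points inside a simplex of trained models are convex combinations of the vertex parameter vectors, and consistently low loss over such samples indicates a mode-connecting volume rather than a single path.

```python
import numpy as np

def sample_simplex_params(vertices: list[np.ndarray],
                          rng: np.random.Generator) -> np.ndarray:
    """Sample a parameter vector uniformly from the simplex whose
    vertices are the given (flattened) model parameter vectors."""
    weights = rng.dirichlet(np.ones(len(vertices)))  # uniform over the simplex
    return sum(w * v for w, v in zip(weights, vertices))

rng = np.random.default_rng(0)
vertices = [rng.normal(size=10) for _ in range(3)]  # stand-ins for 3 trained models
theta = sample_simplex_params(vertices, rng)        # evaluate loss at theta
```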
no code implementations • 10 Dec 2020 • Sanae Lotfi, Tiphaine Bonniot de Ruisselet, Dominique Orban, Andrea Lodi
We propose a new stochastic variance-reduced damped L-BFGS algorithm that leverages estimated bounds on the largest and smallest eigenvalues of the Hessian approximation to balance its quality and conditioning.
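A hedged sketch of two ingredients named in the abstract, assuming a NumPy setting with generic names of our choosing (`powell_damped_pair`, `svrg_gradient`); the paper's precise damping rule and eigenvalue-bound machinery may differ. Powell damping enforces a curvature condition on the (s, y) pair so the quasi-Newton update stays well conditioned, and the SVRG estimator reduces the variance of stochastic gradients.

```python
import numpy as np

def powell_damped_pair(s, y, B, c=0.2):
    """Powell damping: replace y with a convex combination of y and B s
    so that s^T y_bar >= c * s^T B s, keeping the BFGS-style update
    of the Hessian approximation B well conditioned."""
    Bs = B @ s
    sBs = s @ Bs
    sy = s @ y
    if sy >= c * sBs:
        return y
    theta = (1.0 - c) * sBs / (sBs - sy)
    return theta * y + (1.0 - theta) * Bs

def svrg_gradient(grad_i, x, x_snapshot, full_grad_snapshot, idx):
    """SVRG variance-reduced gradient estimate at x: minibatch gradient,
    corrected by the same minibatch at a snapshot point plus the full
    gradient stored at that snapshot."""
    g = np.mean([grad_i(x, i) for i in idx], axis=0)
    g_snap = np.mean([grad_i(x_snapshot, i) for i in idx], axis=0)
    return g - g_snap + full_grad_snapshot
```

One can verify that the damped pair satisfies s^T y_bar = c * s^T B s exactly when damping activates, which is what keeps the updated approximation's eigenvalues under control.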