no code implementations • 13 Nov 2023 • Liam Hodgkinson, Chris van der Heide, Robert Salomone, Fred Roosta, Michael W. Mahoney
Deep learning is renowned for its theory-practice gap, whereby principled theory typically fails to provide much useful guidance for practical implementation.
no code implementations • 15 Jul 2023 • Liam Hodgkinson, Chris van der Heide, Robert Salomone, Fred Roosta, Michael W. Mahoney
The problem of model selection is considered for the setting of interpolating estimators, where the number of model parameters exceeds the size of the dataset.
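As a rough illustration of the interpolating regime (not a method from this paper), the sketch below fits the minimum-norm least-squares interpolator when the parameter count exceeds the sample size; the dimensions and noise level are arbitrary.

```python
# Illustrative sketch only: a canonical interpolating estimator in the
# overparameterized regime (p > n) is the minimum-norm least-squares solution.
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 200                                  # more parameters than data points
X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Minimum-norm solution of X beta = y via the pseudoinverse.
beta_hat = np.linalg.pinv(X) @ y

print("training residual:", np.linalg.norm(X @ beta_hat - y))  # ~0: the data are interpolated
print("parameter norm:   ", np.linalg.norm(beta_hat))
```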
no code implementations • 4 Jul 2023 • Sarah Sachs, Tim van Erven, Liam Hodgkinson, Rajiv Khanna, Umut Simsekli
Algorithm- and data-dependent generalization bounds are required to explain the generalization behavior of modern machine learning algorithms.
no code implementations • 14 Oct 2022 • Liam Hodgkinson, Chris van der Heide, Fred Roosta, Michael W. Mahoney
One prominent issue is the curse of dimensionality: it is commonly believed that the marginal likelihood should be reminiscent of cross-validation metrics and that both should deteriorate with larger input dimensions.
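The following sketch is purely illustrative of that comparison, not an experiment from the paper: it computes a Gaussian-process log marginal likelihood alongside the closed-form leave-one-out predictive score, using an assumed RBF kernel with fixed lengthscale and synthetic data of increasing input dimension.

```python
# Illustration only: GP log marginal likelihood vs. closed-form LOO-CV score
# as the input dimension grows (kernel and data are assumptions).
import numpy as np

def rbf_kernel(X, lengthscale=1.0, noise=1e-2):
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq / lengthscale**2) + noise * np.eye(len(X))

def log_marginal_likelihood(K, y):
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(y) * np.log(2 * np.pi)

def loo_log_predictive(K, y):
    # Closed-form leave-one-out predictive density for GP regression.
    Kinv = np.linalg.inv(K)
    var = 1.0 / np.diag(Kinv)
    mu = y - (Kinv @ y) * var
    return np.sum(-0.5 * np.log(2 * np.pi * var) - 0.5 * (y - mu) ** 2 / var)

rng = np.random.default_rng(1)
n = 100
for d in [1, 10, 100]:
    X = rng.standard_normal((n, d))
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)
    K = rbf_kernel(X)
    print(f"d={d:4d}  LML={log_marginal_likelihood(K, y):8.2f}  "
          f"LOO={loo_log_predictive(K, y):8.2f}")
```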
no code implementations • 16 May 2022 • Feynman Liang, Liam Hodgkinson, Michael W. Mahoney
While fat-tailed densities commonly arise as posterior and marginal distributions in robust models and scale mixtures, they pose challenges for Gaussian-based variational inference, which fails to capture their tail decay accurately.
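A minimal illustration of this failure mode (not the paper's method, and with an arbitrary Student-t target): a moment-matched Gaussian drastically underestimates tail probabilities.

```python
# Illustration only: a Gaussian approximation underestimates the tail mass of a
# fat-tailed Student-t target, the failure mode motivating tail-adaptive families.
import numpy as np
from scipy import stats

nu = 3.0                                        # Student-t degrees of freedom (fat tails)
target = stats.t(df=nu)
gauss = stats.norm(loc=0.0, scale=np.sqrt(nu / (nu - 2)))  # moment-matched Gaussian

for threshold in [3, 5, 10]:
    p_t = target.sf(threshold)
    p_g = gauss.sf(threshold)
    print(f"P(X > {threshold:2d}):  Student-t {p_t:.2e}   Gaussian {p_g:.2e}   "
          f"ratio {p_t / p_g:.1e}")
```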
1 code implementation • 6 Feb 2022 • Yaoqing Yang, Ryan Theisen, Liam Hodgkinson, Joseph E. Gonzalez, Kannan Ramchandran, Charles H. Martin, Michael W. Mahoney
Our analyses consider (I) hundreds of Transformers trained in different settings, in which we systematically vary the amount of data, the model size and the optimization hyperparameters, (II) a total of 51 pretrained Transformers from eight families of Huggingface NLP models, including GPT2, BERT, etc., and (III) a total of 28 existing and novel generalization metrics.
no code implementations • 2 Aug 2021 • Liam Hodgkinson, Umut Şimşekli, Rajiv Khanna, Michael W. Mahoney
Despite the ubiquitous use of stochastic optimization algorithms in machine learning, the precise impact of these algorithms and their dynamics on generalization performance in realistic non-convex settings is still poorly understood.
1 code implementation • NeurIPS 2021 • Yaoqing Yang, Liam Hodgkinson, Ryan Theisen, Joe Zou, Joseph E. Gonzalez, Kannan Ramchandran, Michael W. Mahoney
Viewing neural network models in terms of their loss landscapes has a long history in the statistical mechanics approach to learning, and in recent years it has received attention within machine learning proper.
3 code implementations • NeurIPS 2021 • Alejandro Queiruga, N. Benjamin Erichson, Liam Hodgkinson, Michael W. Mahoney
The recently introduced class of ordinary differential equation networks (ODE-Nets) establishes a fruitful connection between deep learning and dynamical systems.
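As a hedged sketch of the ODE-Net idea rather than the paper's architecture, the block below integrates a learned vector field with a fixed-step forward Euler scheme; the network, step count, and dimensions are assumptions.

```python
# Minimal ODE-Net sketch (illustrative only): the hidden state evolves as
# dh/dt = f(h), integrated with a fixed-step forward Euler scheme.
import torch
import torch.nn as nn

class ODEBlock(nn.Module):
    def __init__(self, dim, n_steps=10):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))
        self.n_steps = n_steps

    def forward(self, h, t0=0.0, t1=1.0):
        dt = (t1 - t0) / self.n_steps
        for _ in range(self.n_steps):           # forward Euler: h <- h + dt * f(h)
            h = h + dt * self.f(h)
        return h

block = ODEBlock(dim=16)
x = torch.randn(8, 16)
print(block(x).shape)                            # torch.Size([8, 16])
```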
1 code implementation • NeurIPS 2021 • Soon Hoe Lim, N. Benjamin Erichson, Liam Hodgkinson, Michael W. Mahoney
We provide a general framework for studying recurrent neural networks (RNNs) trained by injecting noise into hidden states.
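A minimal sketch of hidden-state noise injection, with an assumed vanilla RNN cell and Gaussian noise; it is not the paper's exact model, which treats the noisy updates as discretizations of a stochastic differential equation.

```python
# Sketch only: a vanilla RNN cell with Gaussian noise injected into the
# hidden-state update, active during training and switched off at evaluation.
import torch
import torch.nn as nn

class NoisyRNNCell(nn.Module):
    def __init__(self, input_dim, hidden_dim, noise_std=0.1):
        super().__init__()
        self.W = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.U = nn.Linear(input_dim, hidden_dim)
        self.noise_std = noise_std

    def forward(self, x, h):
        h_new = torch.tanh(self.W(h) + self.U(x))
        if self.training:                        # inject noise into the hidden state
            h_new = h_new + self.noise_std * torch.randn_like(h_new)
        return h_new

cell = NoisyRNNCell(input_dim=4, hidden_dim=32)
h = torch.zeros(8, 32)
for t in range(20):                              # unroll over a length-20 sequence
    h = cell(torch.randn(8, 4), h)
print(h.shape)
```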
1 code implementation • ICLR 2021 • N. Benjamin Erichson, Omri Azencot, Alejandro Queiruga, Liam Hodgkinson, Michael W. Mahoney
Viewing recurrent neural networks (RNNs) as continuous-time dynamical systems, we propose a recurrent unit that describes the hidden state's evolution with two parts: a well-understood linear component plus a Lipschitz nonlinearity.
Ranked #10 on Sequential Image Classification on Sequential CIFAR-10
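The update described above can be sketched roughly as follows; the parameterization, step size, and initialization here are assumptions rather than the paper's exact scheme, which constrains the linear component for stability.

```python
# Hedged sketch: hidden-state dynamics dh/dt = A h + tanh(W h + U x), i.e. a linear
# component plus a Lipschitz nonlinearity, discretized with a forward Euler step.
import torch
import torch.nn as nn

class LipschitzRNNCell(nn.Module):
    def __init__(self, input_dim, hidden_dim, dt=0.1):
        super().__init__()
        self.A = nn.Parameter(0.1 * torch.randn(hidden_dim, hidden_dim))
        self.W = nn.Parameter(0.1 * torch.randn(hidden_dim, hidden_dim))
        self.U = nn.Linear(input_dim, hidden_dim)
        self.dt = dt

    def forward(self, x, h):
        dh = h @ self.A.T + torch.tanh(h @ self.W.T + self.U(x))
        return h + self.dt * dh                  # one Euler step of size dt

cell = LipschitzRNNCell(input_dim=3, hidden_dim=64)
h = torch.zeros(16, 64)
for t in range(50):
    h = cell(torch.randn(16, 3), h)
print(h.norm(dim=-1).mean())                     # inspect the hidden-state norm after 50 steps
```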
no code implementations • 11 Jun 2020 • Liam Hodgkinson, Michael W. Mahoney
Although stochastic optimization is central to modern machine learning, the precise mechanisms underlying its success, and in particular the role of the stochasticity, remain unclear.
no code implementations • NeurIPS 2020 • Liam Hodgkinson, Chris van der Heide, Fred Roosta, Michael W. Mahoney
We introduce stochastic normalizing flows, an extension of continuous normalizing flows for maximum likelihood estimation and variational inference (VI) using stochastic differential equations (SDEs).
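A bare-bones illustration of the SDE ingredient (the drift, diffusion coefficient, and step sizes are assumptions; the paper's flows learn these and account for the likelihood): base Gaussian samples are pushed through an Euler–Maruyama discretization.

```python
# Illustrative Euler–Maruyama sketch: samples from a base Gaussian are pushed
# through a discretized SDE  dZ_t = mu(Z_t) dt + sigma dW_t.
import numpy as np

def drift(z):                                    # assumed drift; in practice learned
    return -z + np.tanh(z)

rng = np.random.default_rng(0)
n_samples, n_steps, dt, sigma = 1000, 100, 0.01, 0.5

z = rng.standard_normal((n_samples, 2))          # base distribution: N(0, I)
for _ in range(n_steps):                         # Euler–Maruyama discretization
    z = z + drift(z) * dt + sigma * np.sqrt(dt) * rng.standard_normal(z.shape)

print("pushed-forward sample mean:", z.mean(axis=0))
print("pushed-forward sample std: ", z.std(axis=0))
```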
no code implementations • 25 Jan 2020 • Liam Hodgkinson, Robert Salomone, Fred Roosta
Stein importance sampling is a widely applicable technique based on kernelized Stein discrepancy, which corrects the output of approximate sampling algorithms by reweighting the empirical distribution of the samples.
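A hedged sketch of the reweighting idea, with an assumed IMQ-based Stein kernel, a standard-normal target, and a generic QP solver; none of these specifics are taken from the paper.

```python
# Sketch only: reweight biased samples by minimizing the kernelized Stein
# discrepancy of the weighted empirical measure under a standard-normal target.
import numpy as np
from scipy.optimize import minimize

def imq_stein_kernel(X, score, c=1.0, beta=-0.5):
    # Stein kernel built from an IMQ base kernel (c^2 + ||x - y||^2)^beta.
    n, d = X.shape
    diff = X[:, None, :] - X[None, :, :]         # diff[i, j] = x_i - x_j
    r2 = np.sum(diff**2, axis=-1)
    base = (c**2 + r2) ** beta
    grad_x = 2 * beta * diff * (c**2 + r2)[..., None] ** (beta - 1)
    div = (-2 * beta * d * (c**2 + r2) ** (beta - 1)
           - 4 * beta * (beta - 1) * r2 * (c**2 + r2) ** (beta - 2))
    S = score(X)                                 # S[i] = grad log p(x_i)
    term2 = np.einsum('ijk,jk->ij', grad_x, S)   # grad_x k . score(x_j)
    term3 = -np.einsum('ijk,ik->ij', grad_x, S)  # grad_y k . score(x_i)
    term4 = base * (S @ S.T)
    return div + term2 + term3 + term4

def stein_weights(K):
    # Minimize w^T K w over the probability simplex.
    n = len(K)
    res = minimize(lambda w: w @ K @ w, np.full(n, 1.0 / n), jac=lambda w: 2 * K @ w,
                   bounds=[(0.0, 1.0)] * n,
                   constraints=({'type': 'eq', 'fun': lambda w: w.sum() - 1.0},),
                   method='SLSQP')
    return res.x

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2)) + 1.0          # biased approximate samples
K = imq_stein_kernel(X, score=lambda X: -X)      # target N(0, I): grad log p(x) = -x
w = stein_weights(K)
print("unweighted mean:       ", X.mean(axis=0)) # roughly (1, 1): biased
print("Stein-reweighted mean: ", w @ X)          # pulled toward the target mean 0
```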
no code implementations • 19 Jul 2019 • Rajiv Khanna, Liam Hodgkinson, Michael W. Mahoney
We investigate the rate of convergence of weighted kernel herding (WKH) and sequential Bayesian quadrature (SBQ), two kernel-based sampling algorithms for estimating integrals with respect to a target probability measure.
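For orientation only, the sketch below runs plain (unweighted) kernel herding over a finite candidate pool; the weighted variant and SBQ analyzed in the paper differ, and the kernel and target here are assumptions.

```python
# Sketch only: greedy kernel herding over a finite candidate pool, using an RBF
# kernel and a standard-normal target approximated by Monte Carlo.
import numpy as np

def rbf(A, B, lengthscale=1.0):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * sq / lengthscale**2)

rng = np.random.default_rng(0)
pool = rng.standard_normal((2000, 2))            # candidates drawn from the target p = N(0, I)
K_pool = rbf(pool, pool)
mu = K_pool.mean(axis=1)                         # Monte Carlo estimate of E_p[k(x, .)]

selected = []
scores = mu.copy()
for t in range(20):                              # greedy herding selection
    idx = int(np.argmax(scores))
    selected.append(idx)
    # score(x) = mu(x) - (1 / (t + 1)) * sum over selected points of k(x, x_i)
    scores = mu - K_pool[:, selected].sum(axis=1) / (len(selected) + 1)

herd = pool[selected]
print("herded-sample mean estimate:", herd.mean(axis=0))   # close to E_p[x] = 0
```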
no code implementations • 29 Mar 2019 • Liam Hodgkinson, Robert Salomone, Fred Roosta
Theoretical and algorithmic properties of the resulting sampling methods are established for $\theta \in [0, 1]$ and a range of step sizes.
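The following sketch assumes a theta-method discretization of Langevin dynamics on a standard-normal target, with the implicit part resolved by fixed-point iteration; the target, step size, and solver are illustrative assumptions.

```python
# Hedged sketch: a theta-method Langevin step interpolates the explicit (theta = 0)
# and fully implicit (theta = 1) discretizations.
import numpy as np

def grad_log_p(x):
    return -x                                    # standard-normal target (log-concave)

def theta_langevin_step(x, h, theta, rng, n_fixed_point=20):
    noise = np.sqrt(2 * h) * rng.standard_normal(x.shape)
    x_new = x
    for _ in range(n_fixed_point):               # resolve the implicit equation by fixed point
        x_new = x + h * ((1 - theta) * grad_log_p(x) + theta * grad_log_p(x_new)) + noise
    return x_new

rng = np.random.default_rng(0)
for theta in [0.0, 0.5, 1.0]:
    samples = np.zeros(1000)
    for _ in range(2000):
        samples = theta_langevin_step(samples, h=0.5, theta=theta, rng=rng)
    print(f"theta={theta}: sample std = {samples.std():.3f} (target 1.0)")
```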