no code implementations • 3 Feb 2025 • Benoit Dherin, Benny Avelin, Anders Karlsson, Hanna Mazzawi, Javier Gonzalvo, Michael Munn
Despite exceptional achievements, training neural networks remains computationally expensive and is often plagued by instabilities that can degrade convergence.
no code implementations • 8 Oct 2024 • Michael Munn, Susan Wei
Recent advances in artificial intelligence have been fueled by the development of foundation models such as BERT, GPT, T5, and Vision Transformers.
no code implementations • 28 May 2024 • Michael Munn, Benoit Dherin, Javier Gonzalvo
We derive a new upper bound on the generalization error which scales with the margin-normalized geometric complexity of the network and which holds for a broad family of data distributions and model classes.
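For orientation, margin-normalized complexity bounds typically take the following schematic shape; the symbols below ($\mathrm{GC}(f)$ for the geometric complexity, margin $\gamma$, sample size $n$, confidence $\delta$) are generic placeholders, and the exact statement and constants of the paper's theorem are not reproduced here:

$$
R(f) \;\le\; \widehat{R}_{\gamma}(f) \;+\; \tilde{O}\!\left(\frac{\mathrm{GC}(f)}{\gamma\sqrt{n}}\right) \;+\; \sqrt{\frac{\log(1/\delta)}{2n}},
$$

where $R(f)$ denotes the population risk and $\widehat{R}_{\gamma}(f)$ the empirical margin loss at margin $\gamma$.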
no code implementations • 24 May 2024 • Michael Munn, Benoit Dherin, Javier Gonzalvo
Many of the recent remarkable advances in computer vision and language models can be attributed to the success of transfer learning via the pre-training of large foundation models.
1 code implementation • 10 Feb 2023 • Ryan Gillard, Stephen Jonany, Yingjie Miao, Michael Munn, Connal de Souza, Jonathan Dungay, Chen Liang, David R. So, Quoc V. Le, Esteban Real
In this paper, we show that large efficiency gains can be obtained by employing a fast unified functional hash, especially through the functional equivalence caching technique, which we also present.
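As a rough sketch of the functional-equivalence-caching idea (the probe set, rounding, and cache class below are illustrative assumptions, not the paper's implementation): each candidate is keyed by a hash of its outputs on a small fixed set of probe inputs, so functionally identical candidates trigger the expensive evaluation only once.

```python
import hashlib
import numpy as np

def functional_hash(candidate_fn, probe_inputs, decimals=6):
    """Hash a candidate by its (rounded) outputs on a fixed probe set.

    Candidates that compute the same function map to the same key,
    regardless of how their code or weights differ. (Illustrative.)
    """
    outputs = np.concatenate([np.ravel(candidate_fn(x)) for x in probe_inputs])
    return hashlib.sha256(np.round(outputs, decimals).tobytes()).hexdigest()

class FunctionalEquivalenceCache:
    """Cache expensive evaluations keyed by the functional hash."""

    def __init__(self):
        self._results = {}
        self.hits = 0

    def evaluate(self, candidate_fn, probe_inputs, expensive_eval):
        key = functional_hash(candidate_fn, probe_inputs)
        if key in self._results:
            self.hits += 1          # functionally equivalent: reuse cached result
        else:
            self._results[key] = expensive_eval(candidate_fn)
        return self._results[key]
```

In an evolutionary or AutoML search loop, redundant candidates then bypass the costly train-and-evaluate step, which is the kind of saving the abstract refers to.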
no code implementations • 27 Sep 2022 • Benoit Dherin, Michael Munn, Mihaela Rosca, David G. T. Barrett
Using a combination of theoretical arguments and empirical results, we show that many common training heuristics, such as parameter norm regularization, spectral norm regularization, flatness regularization, implicit gradient regularization, noise regularization, and the choice of parameter initialization, all act to control geometric complexity. This provides a unifying framework in which to characterize the behavior of deep learning models.
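A minimal sketch of the quantity these heuristics are said to control, assuming geometric complexity is measured as the discrete Dirichlet energy of the model, i.e. the mean squared Frobenius norm of the per-example input-Jacobian over the data (the toy network and names below are placeholders):

```python
import jax
import jax.numpy as jnp

def geometric_complexity(apply_fn, params, inputs):
    """Mean squared Frobenius norm of the per-example input-Jacobian.

    apply_fn(params, x) -> model output for a single example x.
    """
    def sq_jacobian_norm(x):
        jac = jax.jacobian(lambda xi: apply_fn(params, xi))(x)
        return jnp.sum(jac ** 2)

    return jnp.mean(jax.vmap(sq_jacobian_norm)(inputs))

# Illustrative usage with a toy two-layer network.
def apply_fn(params, x):
    w1, b1, w2, b2 = params
    return jnp.tanh(x @ w1 + b1) @ w2 + b2

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
params = (jax.random.normal(k1, (8, 16)), jnp.zeros(16),
          jax.random.normal(k2, (16, 4)), jnp.zeros(4))
batch = jax.random.normal(k3, (32, 8))
print(geometric_complexity(apply_fn, params, batch))
```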
no code implementations • 30 Nov 2021 • Benoit Dherin, Michael Munn, David G. T. Barrett
We argue that over-parameterized neural networks trained with stochastic gradient descent are subject to a Geometric Occam's Razor; that is, these networks are implicitly regularized by their geometric model complexity.
2 code implementations • NeurIPS 2020 • Tianlin Xu, Li K. Wenliang, Michael Munn, Beatrice Acciaio
We introduce COT-GAN, an adversarial algorithm to train implicit generative models optimized for producing sequential data.
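As a hedged sketch of the optimal-transport backbone behind such an objective (this computes a plain entropy-regularized OT cost between batches of sequences via log-domain Sinkhorn iterations; it deliberately omits the causality constraint and the adversarially learned cost that define COT-GAN itself, and all names are placeholders):

```python
import jax.numpy as jnp
from jax.scipy.special import logsumexp

def entropic_ot_cost(x, y, eps=0.1, n_iters=200):
    """Entropy-regularized OT cost between two batches of sequences
    (flattened), computed with log-domain Sinkhorn iterations.

    Note: this is only the OT backbone, not the COT-GAN objective.
    """
    x = x.reshape(x.shape[0], -1)
    y = y.reshape(y.shape[0], -1)
    C = jnp.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)  # pairwise cost
    n, m = C.shape
    log_a = jnp.full(n, -jnp.log(n))   # uniform weights on generated batch
    log_b = jnp.full(m, -jnp.log(m))   # uniform weights on real batch
    f = jnp.zeros(n)
    g = jnp.zeros(m)
    for _ in range(n_iters):
        f = -eps * logsumexp((g[None, :] - C) / eps + log_b[None, :], axis=1)
        g = -eps * logsumexp((f[:, None] - C) / eps + log_a[:, None], axis=0)
    log_P = (f[:, None] + g[None, :] - C) / eps + log_a[:, None] + log_b[None, :]
    return jnp.sum(jnp.exp(log_P) * C)
```

In an adversarial setup of this kind, a generator would be trained to reduce such a transport cost between generated and real sequence batches, while the cost itself is shaped by a discriminator-like network.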