Search Results for author: Benoit Dherin

Found 10 papers, 2 papers with code

Corridor Geometry in Gradient-Based Optimization

no code implementations • 13 Feb 2024 • Benoit Dherin, Mihaela Rosca

We characterize regions of a loss surface as corridors when the continuous curves of steepest descent -- the solutions of the gradient flow -- become straight lines.
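A minimal numerical sketch of this definition (the toy loss, step size, and tolerance below are illustrative choices, not from the paper): in a corridor the normalized gradient keeps a fixed direction along the descent path, which is easy to test with a short Euler discretization of the gradient flow.

```python
# Sketch: test whether the normalized gradient stays constant along an
# Euler discretization of the gradient flow (corridor-like behavior).
# Toy loss and thresholds are illustrative assumptions, not from the paper.
import jax
import jax.numpy as jnp

def loss(theta):
    # For this toy loss the curves of steepest descent are radial straight
    # lines, so the region away from the origin behaves like a corridor.
    return jnp.linalg.norm(theta)

def is_corridor_like(theta0, step=1e-2, n_steps=200, tol=1e-3):
    grad = jax.grad(loss)
    d0 = grad(theta0)
    d0 = d0 / jnp.linalg.norm(d0)
    theta = theta0
    for _ in range(n_steps):
        g = grad(theta)
        if jnp.linalg.norm(g / jnp.linalg.norm(g) - d0) > tol:
            return False
        theta = theta - step * g
    return True

print(is_corridor_like(jnp.array([1.0, 2.0])))  # True for this toy loss
```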

Implicit biases in multitask and continual learning from a backward error analysis perspective

no code implementations • 1 Nov 2023 • Benoit Dherin

Using backward error analysis, we compute implicit training biases in multitask and continual learning settings for neural networks trained with stochastic gradient descent.

Continual Learning
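As a rough indication of the kind of term such an analysis produces (this sketch just combines the implicit-gradient-regularization result listed further down with a two-task loss; the paper derives the exact coefficients for the multitask and continual settings): with learning rate h, gradient descent on L implicitly follows the modified loss

\tilde{L} = L + \frac{h}{4}\,\lVert \nabla L \rVert^2 ,

and for a two-task objective L = L_1 + L_2 the penalty expands as

\lVert \nabla L_1 + \nabla L_2 \rVert^2 = \lVert \nabla L_1 \rVert^2 + \lVert \nabla L_2 \rVert^2 + 2\,\langle \nabla L_1, \nabla L_2 \rangle ,

so beyond the usual per-task flatness terms there is a cross term coupling the task gradients; the sign and size of that coupling in the multitask versus continual settings is what the paper works out.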

Morse Neural Networks for Uncertainty Quantification

no code implementations • 2 Jul 2023 • Benoit Dherin, Huiyi Hu, Jie Ren, Michael W. Dusenberry, Balaji Lakshminarayanan

We introduce a new deep generative model useful for uncertainty quantification: the Morse neural network, which generalizes unnormalized Gaussian densities to have modes that are high-dimensional submanifolds instead of just discrete points.

Anomaly Detection • One-class classifier • +1
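A minimal sketch of the "modes on a submanifold" idea (the map T, its affine form, and the squared-exponential density below are illustrative assumptions, not the paper's exact construction): for a map T : R^n -> R^k, the unnormalized density exp(-||T(x)||^2) attains its maximum exactly on the level set {x : T(x) = 0}, which is generically an (n - k)-dimensional submanifold rather than a single point.

```python
# Sketch of an unnormalized density whose modes form a submanifold.
# T and the squared-exponential form are illustrative assumptions.
import jax.numpy as jnp

def T(x, params):
    # Toy "network": a single affine layer mapping R^n -> R^k.
    W, b = params
    return W @ x + b

def unnormalized_density(x, params):
    # Equals 1 exactly on {x : T(x) = 0} (generically an (n-k)-dimensional
    # submanifold) and decays away from it.
    return jnp.exp(-jnp.sum(T(x, params) ** 2))

def uncertainty(x, params):
    # -log density: zero on the mode set, growing with distance from it.
    return jnp.sum(T(x, params) ** 2)
```

With k = n and T(x) = A(x - mu), this reduces, up to constants, to an unnormalized Gaussian with a single point mode, matching the "generalizes unnormalized Gaussian densities" phrasing above.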

Deep Fusion: Efficient Network Training via Pre-trained Initializations

no code implementations • 20 Jun 2023 • Hanna Mazzawi, Xavi Gonzalvo, Michael Wunder, Sammy Jerome, Benoit Dherin

Finally, we validate our theoretical framework, which guides the optimal use of Deep Fusion, showing that with carefully optimized training dynamics, it significantly reduces both training time and resource consumption.

On a continuous time model of gradient descent dynamics and instability in deep learning

2 code implementations • 3 Feb 2023 • Mihaela Rosca, Yan Wu, Chongli Qin, Benoit Dherin

The recipe behind the success of deep learning has been the combination of neural networks and gradient-based optimization.

Why neural networks find simple solutions: the many regularizers of geometric complexity

no code implementations27 Sep 2022 Benoit Dherin, Michael Munn, Mihaela Rosca, David G. T. Barrett

Using a combination of theoretical arguments and empirical results, we show that many common training heuristics such as parameter norm regularization, spectral norm regularization, flatness regularization, implicit gradient regularization, noise regularization and the choice of parameter initialization all act to control geometric complexity, providing a unifying framework in which to characterize the behavior of deep learning models.
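For concreteness, geometric complexity in this line of work is (as I understand it) the discrete Dirichlet energy of the learned function over the training inputs, i.e. the mean squared Frobenius norm of the input-output Jacobian; a minimal sketch, with a placeholder model and data:

```python
# Sketch: geometric complexity as the discrete Dirichlet energy of the model
# over the training inputs (mean squared Frobenius norm of the input-output
# Jacobian). The toy model below is a placeholder for any differentiable net.
import jax
import jax.numpy as jnp

def model(params, x):
    W1, b1, W2, b2 = params
    return W2 @ jnp.tanh(W1 @ x + b1) + b2

def geometric_complexity(params, inputs):
    # inputs: shape (n, d); returns (1/n) * sum_i ||d model/dx (x_i)||_F^2.
    jac = jax.jacobian(model, argnums=1)
    sq_frob = lambda x: jnp.sum(jac(params, x) ** 2)
    return jnp.mean(jax.vmap(sq_frob)(inputs))
```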

The Geometric Occam's Razor Implicit in Deep Learning

no code implementations30 Nov 2021 Benoit Dherin, Michael Munn, David G. T. Barrett

We argue that over-parameterized neural networks trained with stochastic gradient descent are subject to a Geometric Occam's Razor; that is, these networks are implicitly regularized by the geometric model complexity.
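One way to read the "Occam's Razor" framing in one dimension (a sketch of the connection, not a claim beyond what the paper states): for a scalar function f on an interval, the Dirichlet-energy notion of complexity and the arc length of the graph are monotonically related in the small-slope regime, since

\sqrt{1 + f'(x)^2} \approx 1 + \tfrac{1}{2} f'(x)^2 ,

so penalizing \int f'(x)^2 \, dx is, to leading order, penalizing the extra arc length of the learned curve beyond a straight line.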

Discretization Drift in Two-Player Games

3 code implementations • 28 May 2021 • Mihaela Rosca, Yan Wu, Benoit Dherin, David G. T. Barrett

Gradient-based methods for two-player games produce rich dynamics that can solve challenging problems, yet can be difficult to stabilize and understand.

Vocal Bursts Valence Prediction
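For context, a minimal sketch of the discrete dynamics being analyzed (the bilinear game and step size are illustrative; the paper's contribution is a backward-error-analysis of the drift such discrete steps introduce relative to the continuous flow):

```python
# Sketch: simultaneous gradient descent on the toy bilinear game
# min_theta max_phi theta * phi. Game and step size are illustrative.
import jax
import jax.numpy as jnp

def loss_theta(theta, phi):   # player 1 minimizes theta * phi
    return theta * phi

def loss_phi(theta, phi):     # player 2 minimizes -theta * phi (i.e. maximizes)
    return -theta * phi

def simultaneous_step(theta, phi, h=0.1):
    g_theta = jax.grad(loss_theta, argnums=0)(theta, phi)
    g_phi = jax.grad(loss_phi, argnums=1)(theta, phi)
    # Both players update from the same current state: simultaneous GD.
    return theta - h * g_theta, phi - h * g_phi

theta, phi = jnp.array(1.0), jnp.array(1.0)
for _ in range(100):
    theta, phi = simultaneous_step(theta, phi)
# For this game the iterates spiral outward: the discretization drift is
# destabilizing here, the kind of effect the paper's analysis quantifies.
print(theta, phi)
```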

On the Origin of Implicit Regularization in Stochastic Gradient Descent

no code implementations • ICLR 2021 • Samuel L. Smith, Benoit Dherin, David G. T. Barrett, Soham De

To interpret this phenomenon we prove that for SGD with random shuffling, the mean SGD iterate also stays close to the path of gradient flow if the learning rate is small and finite, but on a modified loss.
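Schematically, and as I recall the result (the paper states the precise assumptions and constants), the modified loss penalizes the norms of the minibatch gradients rather than of the full-batch gradient:

\tilde{C}(\omega) = C(\omega) + \frac{\epsilon}{4m} \sum_{k=1}^{m} \lVert \nabla \hat{C}_k(\omega) \rVert^2 ,

where \epsilon is the learning rate, m the number of minibatches per epoch, and \hat{C}_k the loss on minibatch k.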

Implicit Gradient Regularization

no code implementations • ICLR 2021 • David G. T. Barrett, Benoit Dherin

We call this Implicit Gradient Regularization (IGR) and we use backward error analysis to calculate the size of this regularization.
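As I understand the result, the modified loss for gradient descent with step size h has the form \tilde{L} = L + (h/4) \lVert \nabla L \rVert^2; below is a minimal sketch of adding such a gradient-norm penalty explicitly, the tunable counterpart of the implicit term (the toy loss, data, and coefficient mu are placeholders).

```python
# Sketch: an explicit gradient-norm penalty of the form L + mu * ||grad L||^2,
# mirroring the implicit term found by backward error analysis. The toy loss
# and the coefficient mu are placeholders.
import jax
import jax.numpy as jnp

def loss(params, batch):
    W, b = params
    x, y = batch
    return jnp.mean((W @ x + b - y) ** 2)

def regularized_loss(params, batch, mu):
    grads = jax.grad(loss)(params, batch)
    sq_norm = sum(jnp.sum(g ** 2) for g in jax.tree_util.tree_leaves(grads))
    return loss(params, batch) + mu * sq_norm
```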
