Search Results for author: Aristide Baratin

Found 17 papers, 5 papers with code

Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons

no code implementations 12 Mar 2024 Simon Dufort-Labbé, Pierluca D'Oro, Evgenii Nikishin, Razvan Pascanu, Pierre-Luc Bacon, Aristide Baratin

When training deep neural networks, the phenomenon of $\textit{dying neurons}$, units that become inactive or saturated and output zero during training, has traditionally been viewed as undesirable, linked with optimization challenges and with plasticity loss in continual learning scenarios.

Continual Learning · Model Compression
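
The paper turns this saturation into a pruning signal. As a minimal illustration of the phenomenon itself (not the authors' method), the PyTorch sketch below counts ReLU units that output zero for every input in a batch; the architecture is a placeholder.

```python
import torch
import torch.nn as nn

# Hypothetical two-layer MLP; the hook-based probe is the point, not the model.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
dead_fraction = {}

def make_hook(name):
    def hook(module, inputs, output):
        # A ReLU unit is "dead" on this batch if it outputs zero for every input.
        dead = (output == 0).all(dim=0)
        dead_fraction[name] = dead.float().mean().item()
    return hook

for name, module in model.named_modules():
    if isinstance(module, nn.ReLU):
        module.register_forward_hook(make_hook(name))

with torch.no_grad():
    model(torch.randn(512, 784))
print(dead_fraction)  # e.g. {'1': 0.02} -> 2% of units never fire on this batch
```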

Unsupervised Concept Discovery Mitigates Spurious Correlations

no code implementations 20 Feb 2024 Md Rifat Arefin, Yan Zhang, Aristide Baratin, Francesco Locatello, Irina Rish, Dianbo Liu, Kenji Kawaguchi

Models prone to spurious correlations in training data often produce brittle predictions and introduce unintended biases.

Representation Learning

How connectivity structure shapes rich and lazy learning in neural circuits

no code implementations 12 Oct 2023 Yuhan Helena Liu, Aristide Baratin, Jonathan Cornford, Stefan Mihalas, Eric Shea-Brown, Guillaume Lajoie

Through both empirical and theoretical analyses, we discover that high-rank initializations typically yield smaller network changes indicative of lazier learning, a finding we also confirm with experimentally-driven initial connectivity in recurrent neural networks.
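
The two quantities in play can be made concrete. A numpy sketch, using the entropy-based effective rank of Roy & Vetterli as one plausible rank measure and the relative Frobenius change of the weights as the lazy-vs-rich indicator (both are illustrative choices, not necessarily the paper's exact metrics):

```python
import numpy as np

def effective_rank(W):
    # exp of the entropy of the normalized singular-value distribution
    s = np.linalg.svd(W, compute_uv=False)
    p = s / s.sum()
    return float(np.exp(-(p * np.log(p + 1e-12)).sum()))

def relative_change(W_init, W_trained):
    # small -> lazy learning; large -> rich learning
    return np.linalg.norm(W_trained - W_init) / np.linalg.norm(W_init)

rng = np.random.default_rng(0)
W_high = rng.standard_normal((100, 100)) / 10                  # full-rank init
W_low = rng.standard_normal((100, 2)) @ rng.standard_normal((2, 100)) / 10
print(effective_rank(W_high), effective_rank(W_low))           # high vs. ~2
```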

Lookbehind-SAM: k steps back, 1 step forward

no code implementations 31 Jul 2023 Gonçalo Mordido, Pranshu Malviya, Aristide Baratin, Sarath Chandar

Sharpness-aware minimization (SAM) methods have gained increasing popularity by formulating the problem of minimizing both loss value and loss sharpness as a minimax objective.
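
For reference, the minimax objective in question and the one-step inner maximization used by the original SAM (Lookbehind modifies this inner ascent with multiple steps) are:

```latex
\min_{w}\ \max_{\|\epsilon\|_2 \le \rho} L(w + \epsilon),
\qquad
\hat{\epsilon}(w) = \rho\,\frac{\nabla L(w)}{\|\nabla L(w)\|_2},
\qquad
w \leftarrow w - \eta\,\nabla L\bigl(w + \hat{\epsilon}(w)\bigr)
```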

CrossSplit: Mitigating Label Noise Memorization through Data Splitting

no code implementations 3 Dec 2022 JiHye Kim, Aristide Baratin, Yan Zhang, Simon Lacoste-Julien

We approach the problem of improving robustness of deep learning algorithms in the presence of label noise.

Memorization

Lazy vs hasty: linearization in deep networks impacts learning schedule based on example difficulty

1 code implementation 19 Sep 2022 Thomas George, Guillaume Lajoie, Aristide Baratin

Among attempts at giving a theoretical account of the success of deep neural networks, a recent line of work has identified a so-called lazy training regime in which the network can be well approximated by its linearization around initialization.
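
Concretely, in the lazy regime the network $f(x; w)$ stays close to its first-order Taylor expansion around the initialization $w_0$, so training behaves like a linear (kernel) model whose features are the gradients at initialization:

```latex
f_{\mathrm{lin}}(x; w) = f(x; w_0) + \nabla_w f(x; w_0)^{\top}\,(w - w_0)
```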

Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods

no code implementations 2 Jun 2022 Yuchen Lu, Zhen Liu, Aristide Baratin, Romain Laroche, Aaron Courville, Alessandro Sordoni

We address the problem of evaluating the quality of self-supervised learning (SSL) models without access to supervised labels, while being agnostic to the architecture, learning algorithm or data manipulation used during training.

Domain Generalization · Self-Supervised Learning

Learnability and Expressiveness in Self-Supervised Learning

no code implementations 29 Sep 2021 Yuchen Lu, Zhen Liu, Alessandro Sordoni, Aristide Baratin, Romain Laroche, Aaron Courville

In this work, we argue that representations induced by self-supervised learning (SSL) methods should both be expressive and learnable.

Data Augmentation · Self-Supervised Learning

On the Regularity of Attention

no code implementations 10 Feb 2021 James Vuckovic, Aristide Baratin, Remi Tachet des Combes

Attention is a powerful component of modern neural networks across a wide variety of domains.
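
For context, the object of study is attention in its standard scaled dot-product form:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```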

A Mathematical Theory of Attention

no code implementations 6 Jul 2020 James Vuckovic, Aristide Baratin, Remi Tachet des Combes

Attention is a powerful component of modern neural networks across a wide variety of domains.

A Modern Take on the Bias-Variance Tradeoff in Neural Networks

no code implementations 19 Oct 2018 Brady Neal, Sarthak Mittal, Aristide Baratin, Vinayak Tantia, Matthew Scicluna, Simon Lacoste-Julien, Ioannis Mitliagkas

The bias-variance tradeoff tells us that as model complexity increases, bias falls and variance increases, leading to a U-shaped test error curve.
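
The decomposition behind this claim, for squared error at a point $x$ with label noise of variance $\sigma^2$:

```latex
\mathbb{E}\bigl[(y - \hat{f}(x))^2\bigr]
= \underbrace{\bigl(\mathbb{E}[\hat{f}(x)] - f(x)\bigr)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\bigl[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\bigr]}_{\text{variance}}
+ \sigma^2
```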

Mutual Information Neural Estimation

no code implementations ICML 2018 Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeshwar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, Devon Hjelm

We argue that the estimation of mutual information between high dimensional continuous random variables can be achieved by gradient descent over neural networks.

General Classification

On the Spectral Bias of Neural Networks

2 code implementations ICLR 2019 Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred A. Hamprecht, Yoshua Bengio, Aaron Courville

Neural networks are known to be a class of highly expressive functions able to fit even random input-output mappings with $100\%$ accuracy.
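
The paper's central observation is that, despite this expressivity, networks fit low frequencies first. A minimal sketch of the phenomenon (a toy setup, not the paper's experiments): train a small MLP on a two-frequency target and track how much of each frequency remains in the residual.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.linspace(0, 1, 256).unsqueeze(1)
freqs = [1.0, 8.0]                      # one low, one high frequency
y = sum(torch.sin(2 * math.pi * k * x) for k in freqs)

model = nn.Sequential(nn.Linear(1, 128), nn.Tanh(),
                      nn.Linear(128, 128), nn.Tanh(), nn.Linear(128, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(3001):
    loss = ((model(x) - y) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 1000 == 0:
        residual = (y - model(x)).squeeze(1).detach()
        # Fourier coefficient of each target frequency left in the residual;
        # spectral bias predicts the low-frequency one shrinks first.
        coeffs = [2 * (residual * torch.sin(2 * math.pi * k * x.squeeze(1))).mean().item()
                  for k in freqs]
        print(step, [round(c, 3) for c in coeffs])
```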

MINE: Mutual Information Neural Estimation

20 code implementations 12 Jan 2018 Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeswar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, R. Devon Hjelm

We argue that the estimation of mutual information between high dimensional continuous random variables can be achieved by gradient descent over neural networks.

General Classification
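
A minimal sketch of the Donsker-Varadhan lower bound that MINE maximizes, on a toy correlated-Gaussian pair whose true mutual information is known in closed form (the paper additionally corrects the gradient bias of this estimator with a moving average, omitted here):

```python
import math
import torch
import torch.nn as nn

# Donsker-Varadhan bound: I(X;Z) >= E_{p(x,z)}[T] - log E_{p(x)p(z)}[exp(T)]
T = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(T.parameters(), lr=1e-3)
rho, n = 0.9, 512   # true MI of this Gaussian pair: -0.5*log(1 - rho^2) ~ 0.83 nats

for step in range(2001):
    x = torch.randn(n, 1)
    z = rho * x + math.sqrt(1 - rho ** 2) * torch.randn(n, 1)
    joint = T(torch.cat([x, z], dim=1)).mean()        # samples from p(x,z)
    shuffled = z[torch.randperm(n)]                   # break pairing -> p(x)p(z)
    marg = torch.logsumexp(T(torch.cat([x, shuffled], dim=1)),
                           dim=0).squeeze() - math.log(n)
    bound = joint - marg                              # lower-bounds I(X;Z)
    opt.zero_grad(); (-bound).backward(); opt.step()
    if step % 500 == 0:
        print(step, round(bound.item(), 3))
```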

A3T: Adversarially Augmented Adversarial Training

no code implementations 12 Jan 2018 Akram Erraqabi, Aristide Baratin, Yoshua Bengio, Simon Lacoste-Julien

Recent research showed that deep neural networks are highly sensitive to so-called adversarial perturbations, which are tiny perturbations of the input data purposely designed to fool a machine learning classifier.

Adversarial Robustness · BIG-bench Machine Learning +1
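
As a concrete instance of such a perturbation, here is the standard fast gradient sign method of Goodfellow et al., shown for illustration only (it is not the paper's A3T procedure):

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.03):
    """Standard FGSM: a small, loss-increasing step in input space."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    # Move each pixel by eps in the direction that increases the loss,
    # then clip back to the valid input range.
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()
```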
