Search Results for author: Aristide Baratin

Found 19 papers, 8 papers with code

Any-Property-Conditional Molecule Generation with Self-Criticism using Spanning Trees

1 code implementation • 12 Jul 2024 • Alexia Jolicoeur-Martineau, Aristide Baratin, Kisoo Kwon, Boris Knyazev, Yan Zhang

Generating novel molecules is challenging: most representations lead generative models to produce many invalid molecules.

Tasks: Graph Generation, Property Prediction, +1

Bias in Motion: Theoretical Insights into the Dynamics of Bias in SGD Training

no code implementations • 28 May 2024 • Anchit Jain, Rozhin Nobahari, Aristide Baratin, Stefano Sarao Mannelli

Machine learning systems often acquire biases by leveraging undesired features in the data, affecting accuracy unevenly across different sub-populations.


Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons

no code implementations • 12 Mar 2024 • Simon Dufort-Labbé, Pierluca D'Oro, Evgenii Nikishin, Razvan Pascanu, Pierre-Luc Bacon, Aristide Baratin

When training deep neural networks, the phenomenon of dying neurons (units that become inactive or saturated, outputting zero during training) has traditionally been viewed as undesirable, linked with optimization challenges, and contributing to plasticity loss in continual learning scenarios.

Tasks: Continual Learning, Model Compression
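The abstract frames dying (saturated) neurons as prunable capacity. Below is a generic toy illustration of that idea — detecting units that never activate on a batch and removing them. The dead-unit criterion here is our own simplification, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy ReLU layer in which some units are saturated ("dying"):
# their pre-activations are negative for every input, so they
# always output zero and contribute nothing downstream.
W = rng.normal(size=(8, 16))
b = rng.normal(size=16)
b[[2, 5, 11]] = -100.0          # force three units to be dead

X = rng.normal(size=(256, 8))   # a batch of inputs
acts = np.maximum(X @ W + b, 0.0)

# Toy pruning rule: drop every unit that never activates on the batch.
alive = acts.max(axis=0) > 0.0
W_pruned, b_pruned = W[:, alive], b[alive]
```

Because the pruned units output exactly zero on all inputs seen, removing their columns leaves the layer's outputs on this batch unchanged while shrinking the model.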

Unsupervised Concept Discovery Mitigates Spurious Correlations

1 code implementation • 20 Feb 2024 • Md Rifat Arefin, Yan Zhang, Aristide Baratin, Francesco Locatello, Irina Rish, Dianbo Liu, Kenji Kawaguchi

Models prone to spurious correlations in training data often produce brittle predictions and introduce unintended biases.

Tasks: Representation Learning

How connectivity structure shapes rich and lazy learning in neural circuits

no code implementations • 12 Oct 2023 • Yuhan Helena Liu, Aristide Baratin, Jonathan Cornford, Stefan Mihalas, Eric Shea-Brown, Guillaume Lajoie

Through both empirical and theoretical analyses, we discover that high-rank initializations typically yield smaller network changes indicative of lazier learning, a finding we also confirm with experimentally-driven initial connectivity in recurrent neural networks.

Lookbehind-SAM: k steps back, 1 step forward

1 code implementation • 31 Jul 2023 • Gonçalo Mordido, Pranshu Malviya, Aristide Baratin, Sarath Chandar

In this work, we increase the efficiency of the maximization and minimization parts of SAM's objective to achieve a better loss-sharpness trade-off.
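Going by the title and abstract alone, the idea is to take several small sharpness-probing ascent steps ("k steps back") before a single descent step ("1 step forward"). The toy numpy sketch below applies that pattern to a quadratic loss; the specific choice of averaging the gradients seen along the ascent trajectory is our illustration, not the paper's algorithm.

```python
import numpy as np

# Toy quadratic loss L(w) = 0.5 * w^T A w, with gradient A w.
A = np.diag([1.0, 10.0])
loss = lambda w: 0.5 * w @ A @ w
grad = lambda w: A @ w

def lookbehind_style_step(w, rho=0.05, k=5, lr=0.1):
    """One illustrative update: take k small normalized ascent steps
    to build a sharpness-probing perturbation, average the gradients
    observed along the way, then take one descent step from the
    original weights."""
    eps = np.zeros_like(w)
    grads = []
    for _ in range(k):
        g = grad(w + eps)
        grads.append(g)
        eps += (rho / k) * g / (np.linalg.norm(g) + 1e-12)  # ascent
    return w - lr * np.mean(grads, axis=0)                  # descent

w = np.array([1.0, 1.0])
for _ in range(50):
    w = lookbehind_style_step(w)
```

On this toy problem the averaged perturbed gradients still drive the loss to near zero, while probing a small neighborhood of radius `rho` around the current weights at each step.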

Promoting Exploration in Memory-Augmented Adam using Critical Momenta

1 code implementation • 18 Jul 2023 • Pranshu Malviya, Gonçalo Mordido, Aristide Baratin, Reza Babanezhad Harikandeh, Jerry Huang, Simon Lacoste-Julien, Razvan Pascanu, Sarath Chandar

To address this, we propose a new memory-augmented version of Adam that encourages exploration towards flatter minima by incorporating a buffer of critical momentum terms during training.

Tasks: Image Classification, Language Modelling

CrossSplit: Mitigating Label Noise Memorization through Data Splitting

no code implementations • 3 Dec 2022 • JiHye Kim, Aristide Baratin, Yan Zhang, Simon Lacoste-Julien

We approach the problem of improving robustness of deep learning algorithms in the presence of label noise.


Lazy vs hasty: linearization in deep networks impacts learning schedule based on example difficulty

1 code implementation • 19 Sep 2022 • Thomas George, Guillaume Lajoie, Aristide Baratin

Among attempts at giving a theoretical account of the success of deep neural networks, a recent line of work has identified a so-called lazy training regime in which the network can be well approximated by its linearization around initialization.

Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods

no code implementations • 2 Jun 2022 • Yuchen Lu, Zhen Liu, Aristide Baratin, Romain Laroche, Aaron Courville, Alessandro Sordoni

We address the problem of evaluating the quality of self-supervised learning (SSL) models without access to supervised labels, while being agnostic to the architecture, learning algorithm or data manipulation used during training.

Tasks: Domain Generalization, Self-Supervised Learning

Learnability and Expressiveness in Self-Supervised Learning

no code implementations • 29 Sep 2021 • Yuchen Lu, Zhen Liu, Alessandro Sordoni, Aristide Baratin, Romain Laroche, Aaron Courville

In this work, we argue that representations induced by self-supervised learning (SSL) methods should both be expressive and learnable.

Tasks: Data Augmentation, Self-Supervised Learning

On the Regularity of Attention

no code implementations • 10 Feb 2021 • James Vuckovic, Aristide Baratin, Remi Tachet des Combes

Attention is a powerful component of modern neural networks across a wide variety of domains.

A Mathematical Theory of Attention

no code implementations • 6 Jul 2020 • James Vuckovic, Aristide Baratin, Remi Tachet des Combes

Attention is a powerful component of modern neural networks across a wide variety of domains.

A Modern Take on the Bias-Variance Tradeoff in Neural Networks

no code implementations • 19 Oct 2018 • Brady Neal, Sarthak Mittal, Aristide Baratin, Vinayak Tantia, Matthew Scicluna, Simon Lacoste-Julien, Ioannis Mitliagkas

The bias-variance tradeoff tells us that as model complexity increases, bias falls and variance increases, leading to a U-shaped test error curve.

Mutual Information Neural Estimation

no code implementations • ICML 2018 • Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeshwar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, Devon Hjelm

We argue that the estimation of mutual information between high dimensional continuous random variables can be achieved by gradient descent over neural networks.

Tasks: General Classification

On the Spectral Bias of Neural Networks

2 code implementations • ICLR 2019 • Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred A. Hamprecht, Yoshua Bengio, Aaron Courville

Neural networks are known to be a class of highly expressive functions able to fit even random input-output mappings with 100% accuracy.
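The spectral-bias finding is that, despite this expressiveness, networks learn low-frequency components first. Even a randomly initialized ReLU network is strongly low-frequency: its output is piecewise linear, so its Fourier coefficients decay polynomially. A small numpy check of that property (our own construction, not the paper's experiments):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random 2-layer ReLU network evaluated on a 1-D grid.
n_hidden, n_grid = 64, 512
x = np.linspace(0.0, 1.0, n_grid)
W1 = rng.normal(size=(1, n_hidden))
b1 = rng.normal(size=n_hidden)
W2 = rng.normal(size=n_hidden) / np.sqrt(n_hidden)
y = np.maximum(x[:, None] @ W1 + b1, 0.0) @ W2   # shape (n_grid,)

# Magnitude spectrum of the (mean-removed) network output.
spectrum = np.abs(np.fft.rfft(y - y.mean()))
low_energy = spectrum[1:9].sum()      # lowest 8 nonzero frequencies
high_energy = spectrum[-8:].sum()     # highest 8 frequencies
```

The low-frequency bins dominate the high-frequency ones by a wide margin, which is the static counterpart of the training-dynamics bias the paper studies.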

MINE: Mutual Information Neural Estimation

21 code implementations • 12 Jan 2018 • Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeswar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, R. Devon Hjelm

We argue that the estimation of mutual information between high dimensional continuous random variables can be achieved by gradient descent over neural networks.

Tasks: General Classification
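MINE builds on the Donsker-Varadhan representation of the KL divergence: for any critic function T, E_P[T(x,y)] - log E_Q[e^{T(x,y)}] lower-bounds I(X;Y), where P is the joint and Q the product of marginals; the paper maximizes this bound over a neural network T by gradient descent. A numpy sketch of the bound itself, using a fixed hand-picked critic in place of a learned network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated Gaussian pair with known dependence (rho = 0.9),
# so the true mutual information is -0.5 * log(1 - rho^2) ~ 0.83 nats.
n, rho = 10_000, 0.9
x = rng.normal(size=n)
y = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=n)

def dv_lower_bound(T, x, y):
    """Donsker-Varadhan bound: E_P[T(x,y)] - log E_Q[e^{T(x,y')}],
    where Q pairs x with shuffled y (product of marginals). This is
    a valid lower bound on I(X;Y) for ANY critic T; MINE maximizes
    it over a neural network T by gradient descent."""
    joint = T(x, y).mean()
    y_shuffled = rng.permutation(y)
    marginal = np.log(np.exp(T(x, y_shuffled)).mean())
    return joint - marginal

# Fixed illustrative critic (MINE would learn T instead).
critic = lambda a, b: 0.5 * a * b
estimate = dv_lower_bound(critic, x, y)
```

Even this crude critic yields a strictly positive estimate below the true mutual information; training T to maximize the bound is what tightens it toward I(X;Y).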

A3T: Adversarially Augmented Adversarial Training

no code implementations • 12 Jan 2018 • Akram Erraqabi, Aristide Baratin, Yoshua Bengio, Simon Lacoste-Julien

Recent research has shown that deep neural networks are highly sensitive to so-called adversarial perturbations: tiny changes to the input purposely designed to fool a machine learning classifier.

Tasks: Adversarial Robustness, BIG-bench Machine Learning, +1
