Search Results for author: Aristide Baratin

Found 19 papers, 8 papers with code

Any-Property-Conditional Molecule Generation with Self-Criticism using Spanning Trees

1 code implementation • 12 Jul 2024 • Alexia Jolicoeur-Martineau, Aristide Baratin, Kisoo Kwon, Boris Knyazev, Yan Zhang

Generating novel molecules is challenging: most representations lead generative models to produce many invalid molecules.

Tasks: Graph Generation, Property Prediction, +1

Bias in Motion: Theoretical Insights into the Dynamics of Bias in SGD Training

no code implementations • 28 May 2024 • Anchit Jain, Rozhin Nobahari, Aristide Baratin, Stefano Sarao Mannelli

Machine learning systems often acquire biases by leveraging undesired features in the data, affecting accuracy unevenly across different sub-populations.


Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons

no code implementations • 12 Mar 2024 • Simon Dufort-Labbé, Pierluca D'Oro, Evgenii Nikishin, Razvan Pascanu, Pierre-Luc Bacon, Aristide Baratin

When training deep neural networks, the phenomenon of dying neurons (units that become inactive or saturated, outputting zero during training) has traditionally been viewed as undesirable, linked with optimization challenges, and contributing to plasticity loss in continual learning scenarios.

Tasks: Continual Learning, Model Compression
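The abstract frames dying (saturated) neurons as prunable capacity. Below is a generic toy illustration of that idea — detecting units that never activate on a batch and removing them. The dead-unit criterion here is our own simplification, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy ReLU layer in which some units are saturated ("dying"):
# their pre-activations are negative for every input, so they
# always output zero and contribute nothing downstream.
W = rng.normal(size=(8, 16))
b = rng.normal(size=16)
b[[2, 5, 11]] = -100.0          # force three units to be dead

X = rng.normal(size=(256, 8))   # a batch of inputs
acts = np.maximum(X @ W + b, 0.0)

# Toy pruning rule: drop every unit that never activates on the batch.
alive = acts.max(axis=0) > 0.0
W_pruned, b_pruned = W[:, alive], b[alive]
```

Because the pruned units output exactly zero on all inputs seen, removing their columns leaves the layer's outputs on this batch unchanged while shrinking the model.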

Unsupervised Concept Discovery Mitigates Spurious Correlations

1 code implementation • 20 Feb 2024 • Md Rifat Arefin, Yan Zhang, Aristide Baratin, Francesco Locatello, Irina Rish, Dianbo Liu, Kenji Kawaguchi

Models prone to spurious correlations in training data often produce brittle predictions and introduce unintended biases.

Tasks: Representation Learning

How connectivity structure shapes rich and lazy learning in neural circuits

no code implementations • 12 Oct 2023 • Yuhan Helena Liu, Aristide Baratin, Jonathan Cornford, Stefan Mihalas, Eric Shea-Brown, Guillaume Lajoie

Through both empirical and theoretical analyses, we discover that high-rank initializations typically yield smaller network changes indicative of lazier learning, a finding we also confirm with experimentally-driven initial connectivity in recurrent neural networks.

Lookbehind-SAM: k steps back, 1 step forward

1 code implementation • 31 Jul 2023 • Gonçalo Mordido, Pranshu Malviya, Aristide Baratin, Sarath Chandar

In this work, we increase the efficiency of the maximization and minimization parts of SAM's objective to achieve a better loss-sharpness trade-off.
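Going by the title and abstract alone, the idea is to take several small sharpness-probing ascent steps ("k steps back") before a single descent step ("1 step forward"). The toy numpy sketch below applies that pattern to a quadratic loss; the specific choice of averaging the gradients seen along the ascent trajectory is our illustration, not the paper's algorithm.

```python
import numpy as np

# Toy quadratic loss L(w) = 0.5 * w^T A w, with gradient A w.
A = np.diag([1.0, 10.0])
loss = lambda w: 0.5 * w @ A @ w
grad = lambda w: A @ w

def lookbehind_style_step(w, rho=0.05, k=5, lr=0.1):
    """One illustrative update: take k small normalized ascent steps
    to build a sharpness-probing perturbation, average the gradients
    observed along the way, then take one descent step from the
    original weights."""
    eps = np.zeros_like(w)
    grads = []
    for _ in range(k):
        g = grad(w + eps)
        grads.append(g)
        eps += (rho / k) * g / (np.linalg.norm(g) + 1e-12)  # ascent
    return w - lr * np.mean(grads, axis=0)                  # descent

w = np.array([1.0, 1.0])
for _ in range(50):
    w = lookbehind_style_step(w)
```

On this toy problem the averaged perturbed gradients still drive the loss to near zero, while probing a small neighborhood of radius `rho` around the current weights at each step.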

Promoting Exploration in Memory-Augmented Adam using Critical Momenta

1 code implementation • 18 Jul 2023 • Pranshu Malviya, Gonçalo Mordido, Aristide Baratin, Reza Babanezhad Harikandeh, Jerry Huang, Simon Lacoste-Julien, Razvan Pascanu, Sarath Chandar

To address this, we propose a new memory-augmented version of Adam that encourages exploration towards flatter minima by incorporating a buffer of critical momentum terms during training.

Tasks: Image Classification, Language Modelling

CrossSplit: Mitigating Label Noise Memorization through Data Splitting

no code implementations • 3 Dec 2022 • JiHye Kim, Aristide Baratin, Yan Zhang, Simon Lacoste-Julien

We approach the problem of improving robustness of deep learning algorithms in the presence of label noise.


Lazy vs hasty: linearization in deep networks impacts learning schedule based on example difficulty

1 code implementation • 19 Sep 2022 • Thomas George, Guillaume Lajoie, Aristide Baratin

Among attempts at giving a theoretical account of the success of deep neural networks, a recent line of work has identified a so-called lazy training regime in which the network can be well approximated by its linearization around initialization.

Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods

no code implementations • 2 Jun 2022 • Yuchen Lu, Zhen Liu, Aristide Baratin, Romain Laroche, Aaron Courville, Alessandro Sordoni

We address the problem of evaluating the quality of self-supervised learning (SSL) models without access to supervised labels, while being agnostic to the architecture, learning algorithm or data manipulation used during training.

Tasks: Domain Generalization, Self-Supervised Learning

Learnability and Expressiveness in Self-Supervised Learning

no code implementations • 29 Sep 2021 • Yuchen Lu, Zhen Liu, Alessandro Sordoni, Aristide Baratin, Romain Laroche, Aaron Courville

In this work, we argue that representations induced by self-supervised learning (SSL) methods should both be expressive and learnable.

Tasks: Data Augmentation, Self-Supervised Learning

On the Regularity of Attention

no code implementations • 10 Feb 2021 • James Vuckovic, Aristide Baratin, Remi Tachet des Combes

Attention is a powerful component of modern neural networks across a wide variety of domains.

A Mathematical Theory of Attention

no code implementations • 6 Jul 2020 • James Vuckovic, Aristide Baratin, Remi Tachet des Combes

Attention is a powerful component of modern neural networks across a wide variety of domains.

A Modern Take on the Bias-Variance Tradeoff in Neural Networks

no code implementations • 19 Oct 2018 • Brady Neal, Sarthak Mittal, Aristide Baratin, Vinayak Tantia, Matthew Scicluna, Simon Lacoste-Julien, Ioannis Mitliagkas

The bias-variance tradeoff tells us that as model complexity increases, bias falls and variance increases, leading to a U-shaped test error curve.

Mutual Information Neural Estimation

no code implementations • ICML 2018 • Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeshwar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, Devon Hjelm

We argue that the estimation of mutual information between high dimensional continuous random variables can be achieved by gradient descent over neural networks.

Tasks: General Classification

On the Spectral Bias of Neural Networks

2 code implementations • ICLR 2019 • Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred A. Hamprecht, Yoshua Bengio, Aaron Courville

Neural networks are known to be a class of highly expressive functions able to fit even random input-output mappings with 100% accuracy.
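The spectral-bias finding is that, despite this expressiveness, networks learn low-frequency components first. Even a randomly initialized ReLU network is strongly low-frequency: its output is piecewise linear, so its Fourier coefficients decay polynomially. A small numpy check of that property (our own construction, not the paper's experiments):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random 2-layer ReLU network evaluated on a 1-D grid.
n_hidden, n_grid = 64, 512
x = np.linspace(0.0, 1.0, n_grid)
W1 = rng.normal(size=(1, n_hidden))
b1 = rng.normal(size=n_hidden)
W2 = rng.normal(size=n_hidden) / np.sqrt(n_hidden)
y = np.maximum(x[:, None] @ W1 + b1, 0.0) @ W2   # shape (n_grid,)

# Magnitude spectrum of the (mean-removed) network output.
spectrum = np.abs(np.fft.rfft(y - y.mean()))
low_energy = spectrum[1:9].sum()      # lowest 8 nonzero frequencies
high_energy = spectrum[-8:].sum()     # highest 8 frequencies
```

The low-frequency bins dominate the high-frequency ones by a wide margin, which is the static counterpart of the training-dynamics bias the paper studies.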

MINE: Mutual Information Neural Estimation

21 code implementations • 12 Jan 2018 • Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeswar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, R. Devon Hjelm

We argue that the estimation of mutual information between high dimensional continuous random variables can be achieved by gradient descent over neural networks.

Tasks: General Classification
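MINE builds on the Donsker-Varadhan representation of the KL divergence: for any critic function T, E_P[T(x,y)] - log E_Q[e^{T(x,y)}] lower-bounds I(X;Y), where P is the joint and Q the product of marginals; the paper maximizes this bound over a neural network T by gradient descent. A numpy sketch of the bound itself, using a fixed hand-picked critic in place of a learned network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated Gaussian pair with known dependence (rho = 0.9),
# so the true mutual information is -0.5 * log(1 - rho^2) ~ 0.83 nats.
n, rho = 10_000, 0.9
x = rng.normal(size=n)
y = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=n)

def dv_lower_bound(T, x, y):
    """Donsker-Varadhan bound: E_P[T(x,y)] - log E_Q[e^{T(x,y')}],
    where Q pairs x with shuffled y (product of marginals). This is
    a valid lower bound on I(X;Y) for ANY critic T; MINE maximizes
    it over a neural network T by gradient descent."""
    joint = T(x, y).mean()
    y_shuffled = rng.permutation(y)
    marginal = np.log(np.exp(T(x, y_shuffled)).mean())
    return joint - marginal

# Fixed illustrative critic (MINE would learn T instead).
critic = lambda a, b: 0.5 * a * b
estimate = dv_lower_bound(critic, x, y)
```

Even this crude critic yields a strictly positive estimate below the true mutual information; training T to maximize the bound is what tightens it toward I(X;Y).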

A3T: Adversarially Augmented Adversarial Training

no code implementations • 12 Jan 2018 • Akram Erraqabi, Aristide Baratin, Yoshua Bengio, Simon Lacoste-Julien

Recent research has shown that deep neural networks are highly sensitive to so-called adversarial perturbations: tiny changes to the input purposely designed to fool a machine learning classifier.

Tasks: Adversarial Robustness, BIG-bench Machine Learning, +1
