no code implementations • 4 Mar 2025 • Vaibhav Singh, Paul Janson, Paria Mehrbod, Adam Ibrahim, Irina Rish, Eugene Belilovsky, Benjamin Thérien
Our results show that the infinite learning rate schedule remains effective at scale, surpassing repeated cosine decay for both MAE pre-training and zero-shot LM benchmarks.
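As a rough illustration of the schedule family discussed here, the sketch below implements a constant-plateau ("infinite") learning-rate schedule with a linear warmup and a short cooldown before checkpointing; the phase lengths and decay shape are placeholder assumptions, not the paper's settings.

```python
def infinite_lr(step, max_lr=3e-4, min_lr=3e-5,
                warmup_steps=1_000, cooldown_start=90_000, cooldown_steps=10_000):
    """Constant-plateau ("infinite") schedule: warmup -> plateau -> cooldown.

    Unlike a cosine cycle, the plateau can be extended indefinitely and the
    cooldown is only applied when a deployable checkpoint is needed.
    All constants here are illustrative placeholders.
    """
    if step < warmup_steps:                                # linear warmup
        return max_lr * step / warmup_steps
    if step < cooldown_start:                              # constant ("infinite") phase
        return max_lr
    t = min(1.0, (step - cooldown_start) / cooldown_steps)
    return max_lr + t * (min_lr - max_lr)                  # linear cooldown before checkpointing
```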
1 code implementation • 22 Jan 2025 • Abhinav Moudgil, Boris Knyazev, Guillaume Lajoie, Eugene Belilovsky
Learned optimization has emerged as a promising alternative to hand-crafted optimizers, with the potential to discover stronger learned update rules that enable faster, hyperparameter-free training of neural networks.
no code implementations • 20 Dec 2024 • Albert Manuel Orozco Camacho, Stefan Horoi, Guy Wolf, Eugene Belilovsky
Combining multiple machine learning models has long been a technique for enhancing performance, particularly in distributed settings.
no code implementations • 19 Nov 2024 • Tianhao Xie, Noam Aigerman, Eugene Belilovsky, Tiberiu Popa
3D Gaussian Splatting (GS) is one of the most promising novel 3D representations and has received great interest in computer graphics and computer vision.
no code implementations • 19 Nov 2024 • Paul Janson, Tiberiu Popa, Eugene Belilovsky
Specifically, we propose to synthesize human motion by deforming an SMPL-X body representation guided by Score Distillation Sampling (SDS) calculated using a video diffusion model.
no code implementations • 23 Sep 2024 • Humza Wajid Hameed, Geraldin Nanfack, Eugene Belilovsky
It has been recently shown that a powerful approach to combat spurious correlations is to re-train the last layer on a balanced validation dataset, isolating robust features for the predictor.
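A minimal sketch of the last-layer re-training idea, assuming a frozen backbone that returns penultimate-layer features and a group-balanced validation loader; the helper names and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

def retrain_last_layer(backbone, balanced_loader, feat_dim, num_classes,
                       epochs=10, lr=1e-3, device="cuda"):
    """Refit only the linear head on a group-balanced validation set,
    keeping the feature extractor frozen (illustrative sketch)."""
    backbone.eval()
    head = nn.Linear(feat_dim, num_classes).to(device)
    opt = torch.optim.AdamW(head.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in balanced_loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                feats = backbone(x)          # frozen features from the pre-trained model
            loss = nn.functional.cross_entropy(head(feats), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head
```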
1 code implementation • 6 Sep 2024 • Boris Knyazev, Abhinav Moudgil, Guillaume Lajoie, Eugene Belilovsky, Simon Lacoste-Julien
Neural network training can be accelerated when a learnable update rule is used in lieu of classic adaptive optimizers (e.g., Adam).
1 code implementation • 7 Jul 2024 • Stefan Horoi, Albert Manuel Orozco Camacho, Eugene Belilovsky, Guy Wolf
We propose a new model merging algorithm, CCA Merge, which is based on Canonical Correlation Analysis and aims to maximize the correlations between linear combinations of the model features.
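A much-simplified sketch of correlation-based merging for a single layer, using scikit-learn's CCA on matched activations to align one model's features to the other's before averaging; the alignment and merging details of the actual CCA Merge algorithm differ.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def cca_merge_layer(w_a, w_b, acts_a, acts_b, n_components):
    """Merge one layer of two models by aligning model B's features to
    model A's with CCA before averaging (very simplified sketch).

    w_a, w_b:   weight matrices of shape (out_features, in_features)
    acts_a/b:   matched activations of shape (n_samples, out_features)
    """
    cca = CCA(n_components=n_components, max_iter=1000)
    cca.fit(acts_a, acts_b)
    rot_a, rot_b = cca.x_rotations_, cca.y_rotations_    # maximally correlated directions
    t = rot_b @ np.linalg.pinv(rot_a)                    # approximate B -> A feature map
    w_b_aligned = t.T @ w_b                              # express B's layer in A's coordinates
    return 0.5 * (w_a + w_b_aligned)
```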
no code implementations • 19 Jun 2024 • Vaibhav Singh, Rahaf Aljundi, Eugene Belilovsky
Foundational vision-language models have shown impressive performance on various downstream tasks.
1 code implementation • 4 Jun 2024 • Stéphane Rivaud, Louis Fournier, Thomas Pumir, Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon
Reversible architectures have been shown to be capable of performing on par with their non-reversible counterparts, and have been applied in deep learning for memory savings and generative modeling.
no code implementations • 3 Jun 2024 • Adel Nabli, Louis Fournier, Pierre Erbacher, Louis Serrano, Eugene Belilovsky, Edouard Oyallon
Our method relies on a novel technique to mitigate the one-step delay inherent in parallel execution of gradient computations and communications, eliminating the need for warmup steps and aligning with the training dynamics of standard distributed optimization while converging faster in terms of wall-clock time.
no code implementations • 3 Jun 2024 • Geraldin Nanfack, Michael Eickenberg, Eugene Belilovsky
Understanding the inner working functionality of large-scale deep neural networks is challenging yet crucial in several high-stakes applications.
1 code implementation • 31 May 2024 • Benjamin Thérien, Charles-Étienne Joseph, Boris Knyazev, Edouard Oyallon, Irina Rish, Eugene Belilovsky
We extend $\mu$P theory to learned optimizers, treating the meta-training problem as finding the learned optimizer under $\mu$P.
no code implementations • 27 May 2024 • Louis Fournier, Adel Nabli, Masih Aminbeidokhti, Marco Pedersoli, Eugene Belilovsky, Edouard Oyallon
The performance of deep neural networks is enhanced by ensemble methods, which average the output of several models.
1 code implementation • 26 May 2024 • Damien Martins Gomes, Yanlei Zhang, Eugene Belilovsky, Guy Wolf, Mahdi S. Hosseini
However, their practicality in training DNNs is still limited due to increased per-iteration computations compared to the first-order methods.
1 code implementation • 13 Mar 2024 • Adam Ibrahim, Benjamin Thérien, Kshitij Gupta, Mats L. Richter, Quentin Anthony, Timothée Lesort, Eugene Belilovsky, Irina Rish
In this work, we show that a simple and scalable combination of learning rate (LR) re-warming, LR re-decaying, and replay of previous data is sufficient to match the performance of fully re-training from scratch on all available data, as measured by the final loss and the average score on several language model (LM) evaluation benchmarks.
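A schematic sketch of that recipe: re-warm and cosine re-decay the learning rate on the new corpus while mixing in a small fraction of replayed upstream data. The replay fraction and schedule constants below are placeholders, not the paper's values.

```python
import math
import random

def rewarmed_cosine_lr(step, total_steps, max_lr=3e-4, min_lr=3e-5, warmup_frac=0.01):
    """Re-warm then cosine re-decay the LR when switching to the new corpus."""
    warmup = max(1, int(warmup_frac * total_steps))
    if step < warmup:
        return max_lr * step / warmup
    t = (step - warmup) / max(1, total_steps - warmup)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * t))

def next_mixed_batch(new_data_iter, replay_iter, replay_frac=0.05):
    """With probability `replay_frac`, draw a batch of previous (upstream)
    data instead of the new corpus (illustrative replay mixing)."""
    return next(replay_iter if random.random() < replay_frac else new_data_iter)
```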
no code implementations • 7 Feb 2024 • Pedro Vianna, Muawiz Chaudhary, Paria Mehrbod, An Tang, Guy Cloutier, Guy Wolf, Michael Eickenberg, Eugene Belilovsky
However, in many practical applications this technique is vulnerable to label distribution shifts, sometimes producing catastrophic failure.
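For context, a minimal sketch of the test-time batch-normalization adaptation baseline this work studies: predictions are made with BN statistics recomputed from the unlabeled test batch, which is exactly what breaks down when the test batch's label distribution is skewed.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def predict_with_test_time_bn(model, x_test):
    """Recompute BatchNorm statistics from the current test batch
    (a common test-time adaptation baseline, sketched for illustration)."""
    was_training = model.training
    model.eval()
    for m in model.modules():
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            m.train()                 # normalize with the test batch's own statistics
    logits = model(x_test)
    model.train(was_training)
    return logits
```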
no code implementations • 11 Dec 2023 • MohammadReza Davari, Eugene Belilovsky
These breadcrumbs are constructed by subtracting the pre-trained model's weights from the same model's fine-tuned weights, followed by a sparsification process that eliminates weight outliers and negligible perturbations.
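A rough sketch of that construction, with placeholder percentile thresholds standing in for the paper's sparsification settings.

```python
import torch

def make_breadcrumb(pretrained_sd, finetuned_sd, low_pct=0.85, high_pct=0.99):
    """Sparsified task direction: (fine-tuned - pre-trained) weights with the
    largest-magnitude outliers and near-zero entries masked out.
    The percentile thresholds are illustrative placeholders."""
    breadcrumb = {}
    for name, w0 in pretrained_sd.items():
        delta = finetuned_sd[name] - w0
        mags = delta.abs().flatten().float()
        lo = torch.quantile(mags, low_pct)     # below this: negligible perturbation
        hi = torch.quantile(mags, high_pct)    # above this: outlier
        mask = (delta.abs() >= lo) & (delta.abs() <= hi)
        breadcrumb[name] = delta * mask
    return breadcrumb
```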
1 code implementation • 2 Dec 2023 • Charles-Étienne Joseph, Benjamin Thérien, Abhinav Moudgil, Boris Knyazev, Eugene Belilovsky
Although many variants of these approaches have been proposed, they can sometimes lag behind state-of-the-art adaptive optimizers for deep learning.
no code implementations • 6 Oct 2023 • Tianhao Xie, Eugene Belilovsky, Sudhir Mudur, Tiberiu Popa
Thus, our deformation method achieves globally realistic shape deformation which is not restricted to any class of objects.
2 code implementations • 8 Aug 2023 • Kshitij Gupta, Benjamin Thérien, Adam Ibrahim, Mats L. Richter, Quentin Anthony, Eugene Belilovsky, Irina Rish, Timothée Lesort
We study the warmup phase of models pre-trained on the Pile (upstream data, 300B tokens) as we continue to pre-train on SlimPajama (downstream data, 297B tokens), following a linear warmup and cosine decay schedule.
1 code implementation • NeurIPS 2023 • Adel Nabli, Eugene Belilovsky, Edouard Oyallon
Distributed training of Deep Learning models has been critical to many recent successes in the field.
1 code implementation • 12 Jun 2023 • Louis Fournier, Stéphane Rivaud, Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon
Forward Gradients - the idea of using directional derivatives in forward differentiation mode - have recently been shown to be usable for neural network training while avoiding problems generally associated with backpropagation gradient computation, such as locking and memory requirements.
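For readers unfamiliar with the estimator, a minimal sketch of a forward-gradient step: sample a random direction, compute the directional derivative with a forward-mode Jacobian-vector product, and scale the direction by it. This shows the generic estimator, not the specific variants analyzed in the paper.

```python
import torch
from torch.func import functional_call, jvp

def forward_gradient_step(model, loss_fn, x, y, lr=1e-3):
    """One forward-gradient update: g_hat = (directional derivative) * v,
    computed without backpropagation (illustrative sketch)."""
    params = dict(model.named_parameters())
    v = {k: torch.randn_like(p) for k, p in params.items()}   # random direction

    def loss_of(p):
        return loss_fn(functional_call(model, p, (x,)), y)

    loss, dloss_dv = jvp(loss_of, (params,), (v,))             # forward-mode JVP
    with torch.no_grad():
        for k, p in params.items():
            p -= lr * dloss_dv * v[k]                          # unbiased gradient estimate
    return loss
```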
no code implementations • 12 Jun 2023 • Geraldin Nanfack, Alexander Fulleringer, Jonathan Marty, Michael Eickenberg, Eugene Belilovsky
These inputs can be selected from a data set or obtained by optimization.
no code implementations • 11 Apr 2023 • Gwen Legate, Lucas Caccia, Eugene Belilovsky
In Federated Learning, a global model is learned by aggregating model updates computed at a set of independent client nodes; to reduce communication costs, multiple gradient steps are performed at each node prior to aggregation.
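A compact FedAvg-style sketch of that setup; client sampling, weighting, and the exact aggregation rule are simplified, and the loader and loss names are placeholders.

```python
import copy
import torch

def local_update(global_model, loader, loss_fn, local_steps=5, lr=0.01):
    """Run a few local gradient steps starting from the global weights."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    batches = iter(loader)
    for _ in range(local_steps):
        x, y = next(batches)
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    return model.state_dict()

def federated_round(global_model, client_loaders, loss_fn):
    """One round: uniform average of the clients' locally updated models."""
    states = [local_update(global_model, dl, loss_fn) for dl in client_loaders]
    avg = {k: torch.stack([s[k].float() for s in states]).mean(0) for k in states[0]}
    global_model.load_state_dict(avg)
    return global_model
```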
1 code implementation • CVPR 2023 • AmirMohammad Sarfi, Zahra Karimpour, Muawiz Chaudhary, Nasir M. Khalid, Mirco Ravanelli, Sudhir Mudur, Eugene Belilovsky
Our principal innovation in this work is to use Simulated annealing in EArly Layers (SEAL) of the network in place of re-initialization of later layers.
1 code implementation • 26 Mar 2023 • Nader Asadi, MohammadReza Davari, Sudhir Mudur, Rahaf Aljundi, Eugene Belilovsky
Class prototypes are evolved continually in the same latent space, enabling learning and prediction at any point.
no code implementations • 13 Feb 2023 • Medric Sonwa, Johanna Hansen, Eugene Belilovsky
In this paper, we adopt a challenging, but more realistic problem formulation, learning control policies that operate on a learned latent space with access only to visual demonstrations of an expert completing a task.
1 code implementation • 18 Jan 2023 • Adeetya Patel, Michael Eickenberg, Eugene Belilovsky
Local learning is an approach to model-parallelism that removes the standard end-to-end learning setup and utilizes local objective functions to permit parallel learning amongst model components in a deep network.
no code implementations • 28 Oct 2022 • MohammadReza Davari, Stefan Horoi, Amine Natik, Guillaume Lajoie, Guy Wolf, Eugene Belilovsky
Comparing learned neural representations in neural networks is a challenging but important problem, which has been approached in different ways.
2 code implementations • 24 Mar 2022 • Nasir Mohammad Khalid, Tianhao Xie, Eugene Belilovsky, Tiberiu Popa
We present a technique for zero-shot generation of a 3D model using only a target text prompt.
no code implementations • 24 Mar 2022 • Nader Asadi, Sudhir Mudur, Eugene Belilovsky
Recent work studies the supervised online continual learning setting where a learner receives a stream of data whose class distribution changes over time.
no code implementations • CVPR 2022 • MohammadReza Davari, Nader Asadi, Sudhir Mudur, Rahaf Aljundi, Eugene Belilovsky
Continual Learning research typically focuses on tackling the phenomenon of catastrophic forgetting in neural networks.
2 code implementations • ICLR 2022 • Lucas Caccia, Rahaf Aljundi, Nader Asadi, Tinne Tuytelaars, Joelle Pineau, Eugene Belilovsky
In this work, we focus on the change in representations of observed data that arises when previously unobserved classes appear in the incoming data stream, and new classes must be distinguished from previous ones.
1 code implementation • 31 Jan 2022 • Maxence Ernoult, Fabrice Normandin, Abhinav Moudgil, Sean Spinney, Eugene Belilovsky, Irina Rish, Blake Richards, Yoshua Bengio
As such, it is important to explore learning algorithms that come with strong theoretical guarantees and can match the performance of backpropagation (BP) on complex tasks.
no code implementations • 28 Jan 2022 • Irene Tenison, Sai Aravind Sreeramadas, Vaikkunth Mugunthan, Edouard Oyallon, Irina Rish, Eugene Belilovsky
A major challenge in federated learning is the heterogeneity of data across clients, which can degrade the performance of standard FL algorithms.
no code implementations • CVPR 2022 • Moslem Yazdanpanah, Aamer Abdul Rahman, Muawiz Chaudhary, Christian Desrosiers, Mohammad Havaei, Eugene Belilovsky, Samira Ebrahimi Kahou
Batch Normalization is a staple of computer vision models, including those employed in few-shot learning.
no code implementations • 29 Sep 2021 • Shanel Gauthier, Benjamin Thérien, Laurent Alsène-Racicot, Muawiz Sajjad Chaudhary, Irina Rish, Eugene Belilovsky, Michael Eickenberg, Guy Wolf
The wavelet filters used in the scattering transform are typically selected to create a tight frame via a parameterized mother wavelet.
1 code implementation • CVPR 2022 • Shanel Gauthier, Benjamin Thérien, Laurent Alsène-Racicot, Muawiz Chaudhary, Irina Rish, Eugene Belilovsky, Michael Eickenberg, Guy Wolf
The wavelet scattering transform creates geometric invariants and deformation stability.
no code implementations • 11 Jun 2021 • Eugene Belilovsky, Louis Leconte, Lucas Caccia, Michael Eickenberg, Edouard Oyallon
With the use of a replay buffer we show that this approach can be extended to asynchronous settings, where modules can operate and continue to update with possibly large communication delays.
no code implementations • 11 Jun 2021 • Mateusz Michalkiewicz, Stavros Tsogkas, Sarah Parisot, Mahsa Baktashmotlagh, Anders Eriksson, Eugene Belilovsky
The impressive performance of deep convolutional neural networks in single-view 3D reconstruction suggests that these models perform non-trivial reasoning about the 3D structure of the output space.
2 code implementations • 11 Apr 2021 • Lucas Caccia, Rahaf Aljundi, Nader Asadi, Tinne Tuytelaars, Joelle Pineau, Eugene Belilovsky
In this work, we focus on the change in representations of observed data that arises when previously unobserved classes appear in the incoming data stream, and new classes must be distinguished from previous ones.
1 code implementation • 19 Jan 2021 • Louis Thiry, Michael Arbel, Eugene Belilovsky, Edouard Oyallon
A recent line of work showed that various forms of convolutional kernel methods can be competitive with standard supervised deep convolutional networks on datasets like CIFAR-10, obtaining accuracies in the range of 87-90% while being more amenable to theoretical analysis.
no code implementations • ICLR 2021 • Louis Thiry, Michael Arbel, Eugene Belilovsky, Edouard Oyallon
A recent line of work showed that various forms of convolutional kernel methods can be competitive with standard supervised deep convolutional networks on datasets like CIFAR-10, obtaining accuracies in the range of 87-90% while being more amenable to theoretical analysis.
1 code implementation • ICCV 2021 • Boris Knyazev, Harm de Vries, Cătălina Cangea, Graham W. Taylor, Aaron Courville, Eugene Belilovsky
However, test images might contain zero- and few-shot compositions of objects and relationships, e.g., <cup, on, surfboard>.
1 code implementation • 17 May 2020 • Boris Knyazev, Harm de Vries, Cătălina Cangea, Graham W. Taylor, Aaron Courville, Eugene Belilovsky
We show that such models can suffer the most in their ability to generalize to rare compositions, evaluating two different models on the Visual Genome dataset and its more recent, improved version, GQA.
no code implementations • 10 May 2020 • Mateusz Michalkiewicz, Eugene Belilovsky, Mahsa Baktashmotlagh, Anders Eriksson
Deep learning applied to the reconstruction of 3D shapes has seen growing interest.
1 code implementation • ECCV 2020 • Mateusz Michalkiewicz, Sarah Parisot, Stavros Tsogkas, Mahsa Baktashmotlagh, Anders Eriksson, Eugene Belilovsky
In this work we demonstrate experimentally that naive baselines do not apply when the goal is to learn to reconstruct novel objects using very few examples, and that in a \emph{few-shot} learning setting, the network must learn concepts that can be applied to new categories, avoiding rote memorization.
2 code implementations • NeurIPS 2019 • Rahaf Aljundi, Eugene Belilovsky, Tinne Tuytelaars, Laurent Charlin, Massimo Caccia, Min Lin, Lucas Page-Caccia
Methods based on replay, either generative or from a stored memory, have been shown to be effective approaches for continual learning, matching or exceeding the state of the art in a number of standard benchmarks.
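A minimal sketch of replay from a stored memory using reservoir sampling under a fixed budget; the retrieval strategies studied in this work are more involved than uniform sampling.

```python
import random

class ReservoirBuffer:
    """Fixed-size replay memory filled with reservoir sampling (sketch)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []
        self.seen = 0

    def add(self, example):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:          # keep each seen example with prob capacity/seen
                self.data[j] = example

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))
```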
1 code implementation • ICML 2020 • Lucas Caccia, Eugene Belilovsky, Massimo Caccia, Joelle Pineau
We show how to use discrete auto-encoders to effectively address this challenge and introduce Adaptive Quantization Modules (AQM) to control variation in the compression ability of the module at any given stage of learning.
no code implementations • 25 Sep 2019 • Lucas Caccia, Eugene Belilovsky, Massimo Caccia, Joelle Pineau
We first replace the episodic memory used in Experience Replay with SQM, leading to significant gains on standard continual learning benchmarks using a fixed memory budget.
1 code implementation • 14 Aug 2019 • Cătălina Cangea, Eugene Belilovsky, Pietro Liò, Aaron Courville
The goal of this dataset is to assess question-answering performance from nearly-ideal navigation paths, while considering a much more complete variety of questions than current instantiations of the EQA task.
1 code implementation • 11 Aug 2019 • Rahaf Aljundi, Lucas Caccia, Eugene Belilovsky, Massimo Caccia, Min Lin, Laurent Charlin, Tinne Tuytelaars
Methods based on replay, either generative or from a stored memory, have been shown to be effective approaches for continual learning, matching or exceeding the state of the art in a number of standard benchmarks.
2 code implementations • ICML 2020 • Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon
It is based on a greedy relaxation of the joint training objective, recently shown to be effective in the context of Convolutional Neural Networks (CNNs) on large-scale image classification.
1 code implementation • 29 Dec 2018 • Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon
Here we use 1-hidden layer learning problems to sequentially build deep networks layer by layer, which can inherit properties from shallow networks.
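A schematic sketch of the greedy layer-wise procedure: each block is trained against its own auxiliary classifier while previously trained blocks stay frozen. The `aux_head_fn` factory is a placeholder for whatever auxiliary head is used.

```python
import torch
import torch.nn as nn

def train_greedy(blocks, aux_head_fn, loader, num_classes, epochs=1, lr=1e-3):
    """Train `blocks` one at a time, each against a local auxiliary loss,
    freezing everything trained before it (illustrative sketch)."""
    frozen = nn.Sequential()
    for block in blocks:
        head = aux_head_fn(num_classes)    # placeholder: e.g. pooling + linear classifier
        opt = torch.optim.Adam(list(block.parameters()) + list(head.parameters()), lr=lr)
        for _ in range(epochs):
            for x, y in loader:
                with torch.no_grad():
                    feats = frozen(x)      # features from already-trained blocks
                loss = nn.functional.cross_entropy(head(block(feats)), y)
                opt.zero_grad()
                loss.backward()
                opt.step()
        for p in block.parameters():
            p.requires_grad_(False)        # freeze before moving to the next block
        frozen = nn.Sequential(*frozen, block)
    return frozen
```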
2 code implementations • 28 Dec 2018 • Mathieu Andreux, Tomás Angles, Georgios Exarchakis, Roberto Leonarduzzi, Gaspar Rochette, Louis Thiry, John Zarka, Stéphane Mallat, Joakim andén, Eugene Belilovsky, Joan Bruna, Vincent Lostanlen, Muawiz Chaudhary, Matthew J. Hirn, Edouard Oyallon, Sixin Zhang, Carmine Cella, Michael Eickenberg
The wavelet scattering transform is an invariant signal representation suitable for many signal processing and machine learning applications.
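For concreteness, a small usage sketch assuming the Kymatio package associated with this work: a fixed 2D scattering transform applied to an image batch, as typically used as a non-learned front end.

```python
import torch
from kymatio.torch import Scattering2D

# Fixed (non-learned) scattering front end for 32x32 images, e.g. CIFAR-10.
scattering = Scattering2D(J=2, shape=(32, 32))

x = torch.randn(8, 3, 32, 32)   # a batch of images
s = scattering(x)               # scattering coefficients
print(s.shape)                  # (8, 3, 81, 8, 8): 81 scattering paths, spatially downsampled by 2^J
```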
1 code implementation • 12 Nov 2018 • Ankesh Anand, Eugene Belilovsky, Kyle Kastner, Hugo Larochelle, Aaron Courville
We explore blindfold (question-only) baselines for Embodied Question Answering.
no code implementations • 27 Sep 2018 • Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon
Here we use 1-hidden layer learning problems to sequentially build deep networks layer by layer, which can inherit properties from shallow networks.
1 code implementation • ECCV 2018 • Edouard Oyallon, Eugene Belilovsky, Sergey Zagoruyko, Michal Valko
We study the first-order scattering transform as a candidate for reducing the signal processed by a convolutional neural network (CNN).
1 code implementation • 17 Sep 2018 • Edouard Oyallon, Sergey Zagoruyko, Gabriel Huang, Nikos Komodakis, Simon Lacoste-Julien, Matthew Blaschko, Eugene Belilovsky
In particular, by working in scattering space, we achieve competitive results both for supervised and unsupervised learning tasks, while making progress towards constructing more interpretable CNNs.
2 code implementations • ICCV 2017 • Edouard Oyallon, Eugene Belilovsky, Sergey Zagoruyko
Combining scattering networks with a modern ResNet, we achieve a single-crop top-5 error of 11.4% on ImageNet ILSVRC2012, comparable to the ResNet-18 architecture, while utilizing only 10 layers.
Ranked #72 on Image Classification on STL-10.
no code implementations • 17 Nov 2016 • Wacha Bounliphone, Eugene Belilovsky, Arthur Tenenhaus, Ioannis Antonoglou, Arthur Gretton, Matthew B. Blaschko
The second test, called the relative test of similarity, is used to determine which of the two samples from arbitrary distributions is significantly closer to a reference sample of interest; the relative measure of similarity is based on the Maximum Mean Discrepancy (MMD).
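A small sketch of the quantity underlying the relative similarity test: unbiased MMD^2 estimates with an RBF kernel, compared across the two candidate samples. The actual test additionally estimates the variance of the difference to produce a p-value, which is omitted here.

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    """RBF kernel matrix between rows of a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2_unbiased(x, y, sigma=1.0):
    """Unbiased estimate of MMD^2 between samples x and y."""
    kxx, kyy, kxy = rbf_kernel(x, x, sigma), rbf_kernel(y, y, sigma), rbf_kernel(x, y, sigma)
    n, m = len(x), len(y)
    np.fill_diagonal(kxx, 0.0)
    np.fill_diagonal(kyy, 0.0)
    return kxx.sum() / (n * (n - 1)) + kyy.sum() / (m * (m - 1)) - 2 * kxy.mean()

def closer_to_reference(x, y, z, sigma=1.0):
    """Return which candidate sample (x or y) is closer to reference z in MMD^2
    (illustrative; the actual test adds a significance computation)."""
    return "x" if mmd2_unbiased(x, z, sigma) < mmd2_unbiased(y, z, sigma) else "y"
```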
1 code implementation • ICML 2017 • Eugene Belilovsky, Kyle Kastner, Gaël Varoquaux, Matthew Blaschko
Learning this function brings two benefits: it implicitly models the desired structure or sparsity properties to form suitable priors, and it can be tailored to the specific problem of edge structure discovery, rather than maximizing data likelihood.
no code implementations • NeurIPS 2016 • Eugene Belilovsky, Gaël Varoquaux, Matthew B. Blaschko
We characterize the uncertainty of differences with confidence intervals obtained using a parametric distribution on parameters of a sparse estimator.
1 code implementation • 14 Nov 2015 • Wacha Bounliphone, Eugene Belilovsky, Matthew B. Blaschko, Ioannis Antonoglou, Arthur Gretton
Probabilistic generative models provide a powerful framework for representing data that avoids the expense of manual annotation typically needed by discriminative approaches.