Search Results for author: Eugene Belilovsky

Found 50 papers, 29 papers with code

Simple and Scalable Strategies to Continually Pre-train Large Language Models

1 code implementation 13 Mar 2024 Adam Ibrahim, Benjamin Thérien, Kshitij Gupta, Mats L. Richter, Quentin Anthony, Timothée Lesort, Eugene Belilovsky, Irina Rish

In this work, we show that a simple and scalable combination of learning rate (LR) re-warming, LR re-decaying, and replay of previous data is sufficient to match the performance of fully re-training from scratch on all available data, as measured by the final loss and the average score on several language model (LM) evaluation benchmarks.

Continual Learning · Language Modelling
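
The recipe above combines learning-rate re-warming and re-decaying with replay of upstream data when continuing pre-training on a new corpus. A minimal sketch of the replay component is below; the dataset objects and the replay fraction are illustrative placeholders, not the paper's exact settings.

```python
import random
from torch.utils.data import Dataset

class ReplayMixtureDataset(Dataset):
    """Mix a fraction of upstream (previously seen) examples into the new
    downstream pre-training stream. Sketch only: the dataset objects and the
    replay fraction are placeholders, not the paper's configuration."""

    def __init__(self, downstream_dataset, upstream_dataset,
                 replay_fraction=0.05, seed=0):
        self.downstream = downstream_dataset
        self.upstream = upstream_dataset
        self.replay_fraction = replay_fraction
        self.rng = random.Random(seed)

    def __len__(self):
        return len(self.downstream)

    def __getitem__(self, idx):
        # With probability `replay_fraction`, serve an upstream example in
        # place of the downstream one; this is the replay mechanism.
        if self.rng.random() < self.replay_fraction:
            return self.upstream[self.rng.randrange(len(self.upstream))]
        return self.downstream[idx]
```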

Channel-Selective Normalization for Label-Shift Robust Test-Time Adaptation

no code implementations 7 Feb 2024 Pedro Vianna, Muawiz Chaudhary, Paria Mehrbod, An Tang, Guy Cloutier, Guy Wolf, Michael Eickenberg, Eugene Belilovsky

However, in many practical applications this technique is vulnerable to label distribution shifts, sometimes producing catastrophic failure.

Test-time Adaptation

Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks

no code implementations 11 Dec 2023 MohammadReza Davari, Eugene Belilovsky

These breadcrumbs are constructed by subtracting the weights from a pre-trained model before and after fine-tuning, followed by a sparsification process that eliminates weight outliers and negligible perturbations.
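In other words, a breadcrumb is a masked task vector: the fine-tuned minus pre-trained weight difference with both extreme outliers and near-zero entries removed. A rough sketch follows, assuming floating-point state dicts; the percentile thresholds are illustrative, not the paper's values.

```python
import torch

def model_breadcrumbs(pretrained_sd, finetuned_sd, low_pct=0.85, high_pct=0.99):
    """Sketch of the breadcrumb construction: per parameter tensor, keep only
    weight differences whose magnitude lies between two percentiles, dropping
    negligible perturbations (below low_pct) and outliers (above high_pct).
    Thresholds are illustrative; very large layers may need sampled quantiles."""
    breadcrumbs = {}
    for name, w_pre in pretrained_sd.items():
        delta = finetuned_sd[name] - w_pre                 # task direction
        mag = delta.abs().float().flatten()
        lo = torch.quantile(mag, low_pct)                  # negligible-perturbation cutoff
        hi = torch.quantile(mag, high_pct)                 # outlier cutoff
        mask = (delta.abs() >= lo) & (delta.abs() <= hi)
        breadcrumbs[name] = delta * mask
    return breadcrumbs
```

The resulting sparse deltas can then be combined across tasks and added back onto the pre-trained weights, which is the multi-task merging step the title refers to.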

Can We Learn Communication-Efficient Optimizers?

no code implementations 2 Dec 2023 Charles-Étienne Joseph, Benjamin Thérien, Abhinav Moudgil, Boris Knyazev, Eugene Belilovsky

Although many variants of these approaches have been proposed, they can sometimes lag behind state-of-the-art adaptive optimizers for deep learning.

Language Modelling

DragD3D: Vertex-based Editing for Realistic Mesh Deformations using 2D Diffusion Priors

no code implementations 6 Oct 2023 Tianhao Xie, Eugene Belilovsky, Sudhir Mudur, Tiberiu Popa

Direct mesh editing and deformation are key components in the geometric modeling and animation pipeline.

Continual Pre-Training of Large Language Models: How to (re)warm your model?

2 code implementations 8 Aug 2023 Kshitij Gupta, Benjamin Thérien, Adam Ibrahim, Mats L. Richter, Quentin Anthony, Eugene Belilovsky, Irina Rish, Timothée Lesort

We study the warmup phase of models pre-trained on the Pile (upstream data, 300B tokens) as we continue to pre-train on SlimPajama (downstream data, 297B tokens), following a linear warmup and cosine decay schedule.

Language Modelling
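
The linear warmup plus cosine decay schedule mentioned in the abstract can be expressed as a standard PyTorch LambdaLR multiplier; the warmup length, total steps, and minimum-LR ratio below are illustrative placeholders rather than the paper's settings.

```python
import math
import torch

def warmup_cosine_lambda(warmup_steps, total_steps, min_ratio=0.1):
    """LR multiplier: linear warmup from 0 to the peak LR, then cosine decay
    down to min_ratio * peak. All constants here are illustrative."""
    def multiplier(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return min_ratio + (1.0 - min_ratio) * 0.5 * (1.0 + math.cos(math.pi * progress))
    return multiplier

# Hypothetical usage (model and peak_lr are placeholders):
# optimizer = torch.optim.AdamW(model.parameters(), lr=peak_lr)
# scheduler = torch.optim.lr_scheduler.LambdaLR(
#     optimizer, warmup_cosine_lambda(warmup_steps=1_000, total_steps=100_000))
```

Re-warming, in this sense, simply means restarting such a schedule (a fresh warmup and decay) when pre-training continues on the downstream corpus.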

Can Forward Gradient Match Backpropagation?

1 code implementation 12 Jun 2023 Louis Fournier, Stéphane Rivaud, Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon

Forward Gradients - the idea of using directional derivatives in forward differentiation mode - have recently been shown to be usable for neural network training while avoiding problems generally associated with backpropagation gradient computation, such as locking and memorization requirements.

Memorization
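
A forward gradient replaces the backward pass with a forward-mode directional derivative along a random tangent. A minimal weight-perturbed sketch is below; this is the generic estimator, not the paper's local variants, and it assumes a recent PyTorch where torch.func accepts dict-structured parameters.

```python
import torch
from torch.func import functional_call, jvp

def forward_gradient_step(model, loss_fn, x, y, lr=1e-3):
    """One forward-gradient update: sample a random tangent v, compute the
    directional derivative (Jacobian-vector product) of the loss along v in
    forward mode, and use (d/dv loss) * v as an unbiased gradient estimate.
    No backward pass is required."""
    params = dict(model.named_parameters())
    tangents = {k: torch.randn_like(p) for k, p in params.items()}

    def loss_of(p):
        return loss_fn(functional_call(model, p, (x,)), y)

    _, dir_derivative = jvp(loss_of, (params,), (tangents,))  # forward-mode JVP

    with torch.no_grad():
        for k, p in params.items():
            p -= lr * dir_derivative * tangents[k]
```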

Re-Weighted Softmax Cross-Entropy to Control Forgetting in Federated Learning

no code implementations 11 Apr 2023 Gwen Legate, Lucas Caccia, Eugene Belilovsky

In Federated Learning, a global model is learned by aggregating model updates computed at a set of independent client nodes; to reduce communication costs, multiple gradient steps are performed at each node prior to aggregation.

Federated Learning
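
For reference, the aggregation pattern described above (several local SGD steps per client, then averaging into the global model) is plain federated averaging; the generic sketch below is not the re-weighted softmax cross-entropy proposed in the paper, and the client loaders and hyperparameters are placeholders.

```python
import copy
import torch
import torch.nn.functional as F

def fedavg_round(global_model, client_loaders, local_steps=5, lr=0.01):
    """One communication round of vanilla federated averaging: each client
    copies the global model, takes a few local SGD steps on its own data,
    and the server averages the resulting parameters element-wise."""
    client_states = []
    for loader in client_loaders:
        local_model = copy.deepcopy(global_model)
        opt = torch.optim.SGD(local_model.parameters(), lr=lr)
        batches = iter(loader)
        for _ in range(local_steps):
            x, y = next(batches)
            opt.zero_grad()
            F.cross_entropy(local_model(x), y).backward()
            opt.step()
        client_states.append(local_model.state_dict())

    # Server-side aggregation (uniform average; original dtypes preserved).
    reference = client_states[0]
    avg_state = {
        k: torch.stack([s[k].float() for s in client_states]).mean(dim=0).to(reference[k].dtype)
        for k in reference
    }
    global_model.load_state_dict(avg_state)
    return global_model
```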

Simulated Annealing in Early Layers Leads to Better Generalization

1 code implementation CVPR 2023 AmirMohammad Sarfi, Zahra Karimpour, Muawiz Chaudhary, Nasir M. Khalid, Mirco Ravanelli, Sudhir Mudur, Eugene Belilovsky

Our principal innovation in this work is to use Simulated annealing in EArly Layers (SEAL) of the network in place of re-initialization of later layers.

Few-Shot Learning · Transfer Learning

Imitation from Observation With Bootstrapped Contrastive Learning

no code implementations 13 Feb 2023 Medric Sonwa, Johanna Hansen, Eugene Belilovsky

In this paper, we adopt a challenging, but more realistic problem formulation, learning control policies that operate on a learned latent space with access only to visual demonstrations of an expert completing a task.

Contrastive Learning

Local Learning with Neuron Groups

1 code implementation 18 Jan 2023 Adeetya Patel, Michael Eickenberg, Eugene Belilovsky

Local learning is an approach to model-parallelism that removes the standard end-to-end learning setup and utilizes local objective functions to permit parallel learning amongst model components in a deep network.
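
A minimal sketch of a locally trained module is below: each block carries its own auxiliary head and loss, and detaches its output so that no gradient flows back to earlier blocks. The layer sizes and auxiliary-head design are placeholders; the paper's neuron-group partitioning within layers is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalBlock(nn.Module):
    """One module trained with a local objective via an auxiliary classifier.
    Detaching the output decouples this block from the ones that follow, so
    blocks can be updated in parallel. Sketch only."""

    def __init__(self, in_dim, out_dim, num_classes):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())
        self.aux_head = nn.Linear(out_dim, num_classes)  # local classifier

    def forward(self, x, y=None):
        h = self.body(x)
        local_loss = F.cross_entropy(self.aux_head(h), y) if y is not None else None
        return h.detach(), local_loss  # detach blocks gradients to earlier modules
```

Each block's local_loss is backpropagated only through that block's own parameters; the detached activations are what the next block consumes.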

Reliability of CKA as a Similarity Measure in Deep Learning

no code implementations 28 Oct 2022 MohammadReza Davari, Stefan Horoi, Amine Natik, Guillaume Lajoie, Guy Wolf, Eugene Belilovsky

Comparing learned neural representations in neural networks is a challenging but important problem, which has been approached in different ways.

Attribute

Tackling Online One-Class Incremental Learning by Removing Negative Contrasts

no code implementations 24 Mar 2022 Nader Asadi, Sudhir Mudur, Eugene Belilovsky

Recent work studies the supervised online continual learning setting where a learner receives a stream of data whose class distribution changes over time.

Class Incremental Learning · Contrastive Learning · +2

CLIP-Mesh: Generating textured meshes from text using pretrained image-text models

2 code implementations 24 Mar 2022 Nasir Mohammad Khalid, Tianhao Xie, Eugene Belilovsky, Tiberiu Popa

We present a technique for zero-shot generation of a 3D model using only a target text prompt.

New Insights on Reducing Abrupt Representation Change in Online Continual Learning

3 code implementations ICLR 2022 Lucas Caccia, Rahaf Aljundi, Nader Asadi, Tinne Tuytelaars, Joelle Pineau, Eugene Belilovsky

In this work, we focus on the change in representations of observed data that arises when previously unobserved classes appear in the incoming data stream, and new classes must be distinguished from previous ones.

Class Incremental Learning

Towards Scaling Difference Target Propagation by Learning Backprop Targets

1 code implementation 31 Jan 2022 Maxence Ernoult, Fabrice Normandin, Abhinav Moudgil, Sean Spinney, Eugene Belilovsky, Irina Rish, Blake Richards, Yoshua Bengio

As such, it is important to explore learning algorithms that come with strong theoretical guarantees and can match the performance of backpropagation (BP) on complex tasks.

Gradient Masked Averaging for Federated Learning

no code implementations 28 Jan 2022 Irene Tenison, Sai Aravind Sreeramadas, Vaikkunth Mugunthan, Edouard Oyallon, Irina Rish, Eugene Belilovsky

A major challenge in federated learning is the heterogeneity of data across clients, which can degrade the performance of standard FL algorithms.

Federated Learning · Out-of-Distribution Generalization

Exploring the Optimality of Tight-Frame Scattering Networks

no code implementations 29 Sep 2021 Shanel Gauthier, Benjamin Thérien, Laurent Alsène-Racicot, Muawiz Sajjad Chaudhary, Irina Rish, Eugene Belilovsky, Michael Eickenberg, Guy Wolf

The wavelet filters used in the scattering transform are typically selected to create a tight frame via a parameterized mother wavelet.

Learning Compositional Shape Priors for Few-Shot 3D Reconstruction

no code implementations 11 Jun 2021 Mateusz Michalkiewicz, Stavros Tsogkas, Sarah Parisot, Mahsa Baktashmotlagh, Anders Eriksson, Eugene Belilovsky

The impressive performance of deep convolutional neural networks in single-view 3D reconstruction suggests that these models perform non-trivial reasoning about the 3D structure of the output space.

3D Reconstruction · Few-Shot Learning · +1

Decoupled Greedy Learning of CNNs for Synchronous and Asynchronous Distributed Learning

no code implementations 11 Jun 2021 Eugene Belilovsky, Louis Leconte, Lucas Caccia, Michael Eickenberg, Edouard Oyallon

With the use of a replay buffer we show that this approach can be extended to asynchronous settings, where modules can operate and continue to update with possibly large communication delays.

Image Classification · Quantization

New Insights on Reducing Abrupt Representation Change in Online Continual Learning

3 code implementations 11 Apr 2021 Lucas Caccia, Rahaf Aljundi, Nader Asadi, Tinne Tuytelaars, Joelle Pineau, Eugene Belilovsky

In this work, we focus on the change in representations of observed data that arises when previously unobserved classes appear in the incoming data stream, and new classes must be distinguished from previous ones.

Continual Learning · Metric Learning

The Unreasonable Effectiveness of Patches in Deep Convolutional Kernels Methods

1 code implementation 19 Jan 2021 Louis Thiry, Michael Arbel, Eugene Belilovsky, Edouard Oyallon

A recent line of work showed that various forms of convolutional kernel methods can be competitive with standard supervised deep convolutional networks on datasets like CIFAR-10, obtaining accuracies in the range of 87-90% while being more amenable to theoretical analysis.

Object Recognition · Representation Learning

The Unreasonable Effectiveness of Patches in Deep Convolutional Kernels Methods.

no code implementations ICLR 2021 Louis Thiry, Michael Arbel, Eugene Belilovsky, Edouard Oyallon

A recent line of work showed that various forms of convolutional kernel methods can be competitive with standard supervised deep convolutional networks on datasets like CIFAR-10, obtaining accuracies in the range of 87-90% while being more amenable to theoretical analysis.

Object Recognition · Representation Learning

Graph Density-Aware Losses for Novel Compositions in Scene Graph Generation

1 code implementation 17 May 2020 Boris Knyazev, Harm de Vries, Cătălina Cangea, Graham W. Taylor, Aaron Courville, Eugene Belilovsky

We show that such models can suffer the most in their ability to generalize to rare compositions, evaluating two different models on the Visual Genome dataset and its more recent, improved version, GQA.

Graph Generation · Scene Graph Generation

Few-Shot Single-View 3-D Object Reconstruction with Compositional Priors

1 code implementation ECCV 2020 Mateusz Michalkiewicz, Sarah Parisot, Stavros Tsogkas, Mahsa Baktashmotlagh, Anders Eriksson, Eugene Belilovsky

In this work we demonstrate experimentally that naive baselines do not apply when the goal is to learn to reconstruct novel objects using very few examples, and that in a few-shot learning setting, the network must learn concepts that can be applied to new categories, avoiding rote memorization.

3D Reconstruction · Few-Shot Learning · +3

Online Continual Learning with Maximal Interfered Retrieval

2 code implementations NeurIPS 2019 Rahaf Aljundi, Eugene Belilovsky, Tinne Tuytelaars, Laurent Charlin, Massimo Caccia, Min Lin, Lucas Page-Caccia

Methods based on replay, either generative or from a stored memory, have been shown to be effective approaches for continual learning, matching or exceeding the state of the art in a number of standard benchmarks.

Class Incremental Learning · Retrieval

Online Learned Continual Compression with Adaptive Quantization Modules

1 code implementation ICML 2020 Lucas Caccia, Eugene Belilovsky, Massimo Caccia, Joelle Pineau

We show how to use discrete auto-encoders to effectively address this challenge and introduce Adaptive Quantization Modules (AQM) to control variation in the compression ability of the module at any given stage of learning.

Continual Learning · Quantization

Online Learned Continual Compression with Stacked Quantization Modules

no code implementations 25 Sep 2019 Lucas Caccia, Eugene Belilovsky, Massimo Caccia, Joelle Pineau

We first replace the episodic memory used in Experience Replay with SQM, leading to significant gains on standard continual learning benchmarks using a fixed memory budget.

Continual Learning · Quantization

VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering

1 code implementation 14 Aug 2019 Cătălina Cangea, Eugene Belilovsky, Pietro Liò, Aaron Courville

The goal of this dataset is to assess question-answering performance from nearly-ideal navigation paths, while considering a much more complete variety of questions than current instantiations of the EQA task.

Embodied Question Answering · Question Answering · +1

Online Continual Learning with Maximally Interfered Retrieval

1 code implementation 11 Aug 2019 Rahaf Aljundi, Lucas Caccia, Eugene Belilovsky, Massimo Caccia, Min Lin, Laurent Charlin, Tinne Tuytelaars

Methods based on replay, either generative or from a stored memory, have been shown to be effective approaches for continual learning, matching or exceeding the state of the art in a number of standard benchmarks.

Continual Learning · Retrieval

Decoupled Greedy Learning of CNNs

2 code implementations ICML 2020 Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon

It is based on a greedy relaxation of the joint training objective, recently shown to be effective in the context of Convolutional Neural Networks (CNNs) on large-scale image classification.

Image Classification

Greedy Layerwise Learning Can Scale to ImageNet

1 code implementation 29 Dec 2018 Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon

Here we use 1-hidden layer learning problems to sequentially build deep networks layer by layer, which can inherit properties from shallow networks.

Image Classification

Shallow Learning For Deep Networks

no code implementations 27 Sep 2018 Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon

Here we use 1-hidden layer learning problems to sequentially build deep networks layer by layer, which can inherit properties from shallow networks.

Compressing the Input for CNNs with the First-Order Scattering Transform

1 code implementation ECCV 2018 Edouard Oyallon, Eugene Belilovsky, Sergey Zagoruyko, Michal Valko

We study the first-order scattering transform as a candidate for reducing the signal processed by a convolutional neural network (CNN).

General Classification · Translation

Scattering Networks for Hybrid Representation Learning

1 code implementation 17 Sep 2018 Edouard Oyallon, Sergey Zagoruyko, Gabriel Huang, Nikos Komodakis, Simon Lacoste-Julien, Matthew Blaschko, Eugene Belilovsky

In particular, by working in scattering space, we achieve competitive results both for supervised and unsupervised learning tasks, while making progress towards constructing more interpretable CNNs.

Representation Learning

Scaling the Scattering Transform: Deep Hybrid Networks

2 code implementations ICCV 2017 Edouard Oyallon, Eugene Belilovsky, Sergey Zagoruyko

Combining scattering networks with a modern ResNet, we achieve a single-crop top-5 error of 11.4% on ImageNet ILSVRC2012, comparable to the ResNet-18 architecture, while utilizing only 10 layers.

Image Classification
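
A rough sketch of the hybrid idea, a fixed scattering front-end feeding a small learned network, is below. It assumes the kymatio package's Scattering2D interface and uses lazy layers to avoid computing the number of scattering channels by hand; it is not the paper's scattering+ResNet architecture.

```python
import torch
import torch.nn as nn
from kymatio.torch import Scattering2D  # assumed third-party scattering implementation

class ScatteringHybrid(nn.Module):
    """Fixed (non-learned) scattering transform followed by a small trained
    CNN head. Sketch only; depths and widths are placeholders."""

    def __init__(self, num_classes=10, J=2, image_size=32):
        super().__init__()
        self.scattering = Scattering2D(J=J, shape=(image_size, image_size))
        self.head = nn.Sequential(
            nn.LazyConv2d(64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.LazyLinear(num_classes),
        )

    def forward(self, x):
        s = self.scattering(x)      # (B, C, K, H / 2**J, W / 2**J)
        s = s.flatten(1, 2)         # merge colour and scattering-coefficient channels
        return self.head(s)
```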

Fast Non-Parametric Tests of Relative Dependency and Similarity

no code implementations 17 Nov 2016 Wacha Bounliphone, Eugene Belilovsky, Arthur Tenenhaus, Ioannis Antonoglou, Arthur Gretton, Matthew B. Blaschko

The second test, called the relative test of similarity, is used to determine which of the two samples from arbitrary distributions is significantly closer to a reference sample of interest, and the relative measure of similarity is based on the Maximum Mean Discrepancy (MMD).

Learning to Discover Sparse Graphical Models

1 code implementation ICML 2017 Eugene Belilovsky, Kyle Kastner, Gaël Varoquaux, Matthew Blaschko

Learning this function brings two benefits: it implicitly models the desired structure or sparsity properties to form suitable priors, and it can be tailored to the specific problem of edge structure discovery, rather than maximizing data likelihood.

Testing for Differences in Gaussian Graphical Models: Applications to Brain Connectivity

no code implementations NeurIPS 2016 Eugene Belilovsky, Gaël Varoquaux, Matthew B. Blaschko

We characterize the uncertainty of differences with confidence intervals obtained using a parametric distribution on parameters of a sparse estimator.

A Test of Relative Similarity For Model Selection in Generative Models

1 code implementation 14 Nov 2015 Wacha Bounliphone, Eugene Belilovsky, Matthew B. Blaschko, Ioannis Antonoglou, Arthur Gretton

Probabilistic generative models provide a powerful framework for representing data that avoids the expense of manual annotation typically needed by discriminative approaches.

Model Selection
