Search Results for author: Eugene Belilovsky

Found 50 papers, 29 papers with code

Simple and Scalable Strategies to Continually Pre-train Large Language Models

1 code implementation 13 Mar 2024 Adam Ibrahim, Benjamin Thérien, Kshitij Gupta, Mats L. Richter, Quentin Anthony, Timothée Lesort, Eugene Belilovsky, Irina Rish

In this work, we show that a simple and scalable combination of learning rate (LR) re-warming, LR re-decaying, and replay of previous data is sufficient to match the performance of fully re-training from scratch on all available data, as measured by the final loss and the average score on several language model (LM) evaluation benchmarks.

Continual Learning · Language Modelling
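
The recipe above combines learning-rate re-warming and re-decaying with replay of upstream data when continuing pre-training on a new corpus. A minimal sketch of the replay component is below; the dataset objects and the replay fraction are illustrative placeholders, not the paper's exact settings.

```python
import random
from torch.utils.data import Dataset

class ReplayMixtureDataset(Dataset):
    """Mix a fraction of upstream (previously seen) examples into the new
    downstream pre-training stream. Sketch only: the dataset objects and the
    replay fraction are placeholders, not the paper's configuration."""

    def __init__(self, downstream_dataset, upstream_dataset,
                 replay_fraction=0.05, seed=0):
        self.downstream = downstream_dataset
        self.upstream = upstream_dataset
        self.replay_fraction = replay_fraction
        self.rng = random.Random(seed)

    def __len__(self):
        return len(self.downstream)

    def __getitem__(self, idx):
        # With probability `replay_fraction`, serve an upstream example in
        # place of the downstream one; this is the replay mechanism.
        if self.rng.random() < self.replay_fraction:
            return self.upstream[self.rng.randrange(len(self.upstream))]
        return self.downstream[idx]
```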

Channel-Selective Normalization for Label-Shift Robust Test-Time Adaptation

no code implementations 7 Feb 2024 Pedro Vianna, Muawiz Chaudhary, Paria Mehrbod, An Tang, Guy Cloutier, Guy Wolf, Michael Eickenberg, Eugene Belilovsky

However, in many practical applications this technique is vulnerable to label distribution shifts, sometimes producing catastrophic failure.

Test-time Adaptation

Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks

no code implementations 11 Dec 2023 MohammadReza Davari, Eugene Belilovsky

These breadcrumbs are constructed by subtracting the weights from a pre-trained model before and after fine-tuning, followed by a sparsification process that eliminates weight outliers and negligible perturbations.
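In other words, a breadcrumb is a masked task vector: the fine-tuned minus pre-trained weight difference with both extreme outliers and near-zero entries removed. A rough sketch follows, assuming floating-point state dicts; the percentile thresholds are illustrative, not the paper's values.

```python
import torch

def model_breadcrumbs(pretrained_sd, finetuned_sd, low_pct=0.85, high_pct=0.99):
    """Sketch of the breadcrumb construction: per parameter tensor, keep only
    weight differences whose magnitude lies between two percentiles, dropping
    negligible perturbations (below low_pct) and outliers (above high_pct).
    Thresholds are illustrative; very large layers may need sampled quantiles."""
    breadcrumbs = {}
    for name, w_pre in pretrained_sd.items():
        delta = finetuned_sd[name] - w_pre                 # task direction
        mag = delta.abs().float().flatten()
        lo = torch.quantile(mag, low_pct)                  # negligible-perturbation cutoff
        hi = torch.quantile(mag, high_pct)                 # outlier cutoff
        mask = (delta.abs() >= lo) & (delta.abs() <= hi)
        breadcrumbs[name] = delta * mask
    return breadcrumbs
```

The resulting sparse deltas can then be combined across tasks and added back onto the pre-trained weights, which is the multi-task merging step the title refers to.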

Can We Learn Communication-Efficient Optimizers?

no code implementations 2 Dec 2023 Charles-Étienne Joseph, Benjamin Thérien, Abhinav Moudgil, Boris Knyazev, Eugene Belilovsky

Although many variants of these approaches have been proposed, they can sometimes lag behind state-of-the-art adaptive optimizers for deep learning.

Language Modelling

DragD3D: Vertex-based Editing for Realistic Mesh Deformations using 2D Diffusion Priors

no code implementations 6 Oct 2023 Tianhao Xie, Eugene Belilovsky, Sudhir Mudur, Tiberiu Popa

Direct mesh editing and deformation are key components in the geometric modeling and animation pipeline.

Continual Pre-Training of Large Language Models: How to (re)warm your model?

2 code implementations 8 Aug 2023 Kshitij Gupta, Benjamin Thérien, Adam Ibrahim, Mats L. Richter, Quentin Anthony, Eugene Belilovsky, Irina Rish, Timothée Lesort

We study the warmup phase of models pre-trained on the Pile (upstream data, 300B tokens) as we continue to pre-train on SlimPajama (downstream data, 297B tokens), following a linear warmup and cosine decay schedule.

Language Modelling
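
The linear warmup plus cosine decay schedule mentioned in the abstract can be expressed as a standard PyTorch LambdaLR multiplier; the warmup length, total steps, and minimum-LR ratio below are illustrative placeholders rather than the paper's settings.

```python
import math
import torch

def warmup_cosine_lambda(warmup_steps, total_steps, min_ratio=0.1):
    """LR multiplier: linear warmup from 0 to the peak LR, then cosine decay
    down to min_ratio * peak. All constants here are illustrative."""
    def multiplier(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return min_ratio + (1.0 - min_ratio) * 0.5 * (1.0 + math.cos(math.pi * progress))
    return multiplier

# Hypothetical usage (model and peak_lr are placeholders):
# optimizer = torch.optim.AdamW(model.parameters(), lr=peak_lr)
# scheduler = torch.optim.lr_scheduler.LambdaLR(
#     optimizer, warmup_cosine_lambda(warmup_steps=1_000, total_steps=100_000))
```

Re-warming, in this sense, simply means restarting such a schedule (a fresh warmup and decay) when pre-training continues on the downstream corpus.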

Can Forward Gradient Match Backpropagation?

1 code implementation 12 Jun 2023 Louis Fournier, Stéphane Rivaud, Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon

Forward Gradients - the idea of using directional derivatives in forward differentiation mode - have recently been shown to be usable for neural network training while avoiding problems generally associated with backpropagation gradient computation, such as locking and memorization requirements.

Memorization
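
A forward gradient replaces the backward pass with a forward-mode directional derivative along a random tangent. A minimal weight-perturbed sketch is below; this is the generic estimator, not the paper's local variants, and it assumes a recent PyTorch where torch.func accepts dict-structured parameters.

```python
import torch
from torch.func import functional_call, jvp

def forward_gradient_step(model, loss_fn, x, y, lr=1e-3):
    """One forward-gradient update: sample a random tangent v, compute the
    directional derivative (Jacobian-vector product) of the loss along v in
    forward mode, and use (d/dv loss) * v as an unbiased gradient estimate.
    No backward pass is required."""
    params = dict(model.named_parameters())
    tangents = {k: torch.randn_like(p) for k, p in params.items()}

    def loss_of(p):
        return loss_fn(functional_call(model, p, (x,)), y)

    _, dir_derivative = jvp(loss_of, (params,), (tangents,))  # forward-mode JVP

    with torch.no_grad():
        for k, p in params.items():
            p -= lr * dir_derivative * tangents[k]
```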

Re-Weighted Softmax Cross-Entropy to Control Forgetting in Federated Learning

no code implementations 11 Apr 2023 Gwen Legate, Lucas Caccia, Eugene Belilovsky

In Federated Learning, a global model is learned by aggregating model updates computed at a set of independent client nodes; to reduce communication costs, multiple gradient steps are performed at each node prior to aggregation.

Federated Learning
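
For reference, the aggregation pattern described above (several local SGD steps per client, then averaging into the global model) is plain federated averaging; the generic sketch below is not the re-weighted softmax cross-entropy proposed in the paper, and the client loaders and hyperparameters are placeholders.

```python
import copy
import torch
import torch.nn.functional as F

def fedavg_round(global_model, client_loaders, local_steps=5, lr=0.01):
    """One communication round of vanilla federated averaging: each client
    copies the global model, takes a few local SGD steps on its own data,
    and the server averages the resulting parameters element-wise."""
    client_states = []
    for loader in client_loaders:
        local_model = copy.deepcopy(global_model)
        opt = torch.optim.SGD(local_model.parameters(), lr=lr)
        batches = iter(loader)
        for _ in range(local_steps):
            x, y = next(batches)
            opt.zero_grad()
            F.cross_entropy(local_model(x), y).backward()
            opt.step()
        client_states.append(local_model.state_dict())

    # Server-side aggregation (uniform average; original dtypes preserved).
    reference = client_states[0]
    avg_state = {
        k: torch.stack([s[k].float() for s in client_states]).mean(dim=0).to(reference[k].dtype)
        for k in reference
    }
    global_model.load_state_dict(avg_state)
    return global_model
```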

Simulated Annealing in Early Layers Leads to Better Generalization

1 code implementation CVPR 2023 AmirMohammad Sarfi, Zahra Karimpour, Muawiz Chaudhary, Nasir M. Khalid, Mirco Ravanelli, Sudhir Mudur, Eugene Belilovsky

Our principal innovation in this work is to use Simulated annealing in EArly Layers (SEAL) of the network in place of re-initialization of later layers.

Few-Shot Learning · Transfer Learning

Imitation from Observation With Bootstrapped Contrastive Learning

no code implementations 13 Feb 2023 Medric Sonwa, Johanna Hansen, Eugene Belilovsky

In this paper, we adopt a challenging, but more realistic problem formulation, learning control policies that operate on a learned latent space with access only to visual demonstrations of an expert completing a task.

Contrastive Learning

Local Learning with Neuron Groups

1 code implementation 18 Jan 2023 Adeetya Patel, Michael Eickenberg, Eugene Belilovsky

Local learning is an approach to model-parallelism that removes the standard end-to-end learning setup and utilizes local objective functions to permit parallel learning amongst model components in a deep network.
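
A minimal sketch of a locally trained module is below: each block carries its own auxiliary head and loss, and detaches its output so that no gradient flows back to earlier blocks. The layer sizes and auxiliary-head design are placeholders; the paper's neuron-group partitioning within layers is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalBlock(nn.Module):
    """One module trained with a local objective via an auxiliary classifier.
    Detaching the output decouples this block from the ones that follow, so
    blocks can be updated in parallel. Sketch only."""

    def __init__(self, in_dim, out_dim, num_classes):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())
        self.aux_head = nn.Linear(out_dim, num_classes)  # local classifier

    def forward(self, x, y=None):
        h = self.body(x)
        local_loss = F.cross_entropy(self.aux_head(h), y) if y is not None else None
        return h.detach(), local_loss  # detach blocks gradients to earlier modules
```

Each block's local_loss is backpropagated only through that block's own parameters; the detached activations are what the next block consumes.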

Reliability of CKA as a Similarity Measure in Deep Learning

no code implementations 28 Oct 2022 MohammadReza Davari, Stefan Horoi, Amine Natik, Guillaume Lajoie, Guy Wolf, Eugene Belilovsky

Comparing learned neural representations in neural networks is a challenging but important problem, which has been approached in different ways.

Attribute

Tackling Online One-Class Incremental Learning by Removing Negative Contrasts

no code implementations 24 Mar 2022 Nader Asadi, Sudhir Mudur, Eugene Belilovsky

Recent work studies the supervised online continual learning setting where a learner receives a stream of data whose class distribution changes over time.

Class Incremental Learning · Contrastive Learning · +2

CLIP-Mesh: Generating textured meshes from text using pretrained image-text models

2 code implementations 24 Mar 2022 Nasir Mohammad Khalid, Tianhao Xie, Eugene Belilovsky, Tiberiu Popa

We present a technique for zero-shot generation of a 3D model using only a target text prompt.

New Insights on Reducing Abrupt Representation Change in Online Continual Learning

3 code implementations ICLR 2022 Lucas Caccia, Rahaf Aljundi, Nader Asadi, Tinne Tuytelaars, Joelle Pineau, Eugene Belilovsky

In this work, we focus on the change in representations of observed data that arises when previously unobserved classes appear in the incoming data stream, and new classes must be distinguished from previous ones.

Class Incremental Learning

Towards Scaling Difference Target Propagation by Learning Backprop Targets

1 code implementation 31 Jan 2022 Maxence Ernoult, Fabrice Normandin, Abhinav Moudgil, Sean Spinney, Eugene Belilovsky, Irina Rish, Blake Richards, Yoshua Bengio

As such, it is important to explore learning algorithms that come with strong theoretical guarantees and can match the performance of backpropagation (BP) on complex tasks.

Gradient Masked Averaging for Federated Learning

no code implementations 28 Jan 2022 Irene Tenison, Sai Aravind Sreeramadas, Vaikkunth Mugunthan, Edouard Oyallon, Irina Rish, Eugene Belilovsky

A major challenge in federated learning is the heterogeneity of data across clients, which can degrade the performance of standard FL algorithms.

Federated Learning · Out-of-Distribution Generalization

Exploring the Optimality of Tight-Frame Scattering Networks

no code implementations 29 Sep 2021 Shanel Gauthier, Benjamin Thérien, Laurent Alsène-Racicot, Muawiz Sajjad Chaudhary, Irina Rish, Eugene Belilovsky, Michael Eickenberg, Guy Wolf

The wavelet filters used in the scattering transform are typically selected to create a tight frame via a parameterized mother wavelet.

Learning Compositional Shape Priors for Few-Shot 3D Reconstruction

no code implementations 11 Jun 2021 Mateusz Michalkiewicz, Stavros Tsogkas, Sarah Parisot, Mahsa Baktashmotlagh, Anders Eriksson, Eugene Belilovsky

The impressive performance of deep convolutional neural networks in single-view 3D reconstruction suggests that these models perform non-trivial reasoning about the 3D structure of the output space.

3D Reconstruction · Few-Shot Learning · +1

Decoupled Greedy Learning of CNNs for Synchronous and Asynchronous Distributed Learning

no code implementations 11 Jun 2021 Eugene Belilovsky, Louis Leconte, Lucas Caccia, Michael Eickenberg, Edouard Oyallon

With the use of a replay buffer we show that this approach can be extended to asynchronous settings, where modules can operate and continue to update with possibly large communication delays.

Image Classification · Quantization

New Insights on Reducing Abrupt Representation Change in Online Continual Learning

3 code implementations 11 Apr 2021 Lucas Caccia, Rahaf Aljundi, Nader Asadi, Tinne Tuytelaars, Joelle Pineau, Eugene Belilovsky

In this work, we focus on the change in representations of observed data that arises when previously unobserved classes appear in the incoming data stream, and new classes must be distinguished from previous ones.

Continual Learning · Metric Learning

The Unreasonable Effectiveness of Patches in Deep Convolutional Kernels Methods

1 code implementation 19 Jan 2021 Louis Thiry, Michael Arbel, Eugene Belilovsky, Edouard Oyallon

A recent line of work showed that various forms of convolutional kernel methods can be competitive with standard supervised deep convolutional networks on datasets like CIFAR-10, obtaining accuracies in the range of 87-90% while being more amenable to theoretical analysis.

Object Recognition · Representation Learning

The Unreasonable Effectiveness of Patches in Deep Convolutional Kernels Methods.

no code implementations ICLR 2021 Louis Thiry, Michael Arbel, Eugene Belilovsky, Edouard Oyallon

A recent line of work showed that various forms of convolutional kernel methods can be competitive with standard supervised deep convolutional networks on datasets like CIFAR-10, obtaining accuracies in the range of 87-90% while being more amenable to theoretical analysis.

Object Recognition · Representation Learning

Graph Density-Aware Losses for Novel Compositions in Scene Graph Generation

1 code implementation 17 May 2020 Boris Knyazev, Harm de Vries, Cătălina Cangea, Graham W. Taylor, Aaron Courville, Eugene Belilovsky

We show that such models can suffer the most in their ability to generalize to rare compositions, evaluating two different models on the Visual Genome dataset and its more recent, improved version, GQA.

Graph Generation · Scene Graph Generation

Few-Shot Single-View 3-D Object Reconstruction with Compositional Priors

1 code implementation ECCV 2020 Mateusz Michalkiewicz, Sarah Parisot, Stavros Tsogkas, Mahsa Baktashmotlagh, Anders Eriksson, Eugene Belilovsky

In this work we demonstrate experimentally that naive baselines do not apply when the goal is to learn to reconstruct novel objects using very few examples, and that in a few-shot learning setting, the network must learn concepts that can be applied to new categories, avoiding rote memorization.

3D Reconstruction · Few-Shot Learning · +3

Online Continual Learning with Maximal Interfered Retrieval

2 code implementations NeurIPS 2019 Rahaf Aljundi, Eugene Belilovsky, Tinne Tuytelaars, Laurent Charlin, Massimo Caccia, Min Lin, Lucas Page-Caccia

Methods based on replay, either generative or from a stored memory, have been shown to be effective approaches for continual learning, matching or exceeding the state of the art in a number of standard benchmarks.

Class Incremental Learning · Retrieval

Online Learned Continual Compression with Adaptive Quantization Modules

1 code implementation ICML 2020 Lucas Caccia, Eugene Belilovsky, Massimo Caccia, Joelle Pineau

We show how to use discrete auto-encoders to effectively address this challenge and introduce Adaptive Quantization Modules (AQM) to control variation in the compression ability of the module at any given stage of learning.

Continual Learning · Quantization

Online Learned Continual Compression with Stacked Quantization Modules

no code implementations 25 Sep 2019 Lucas Caccia, Eugene Belilovsky, Massimo Caccia, Joelle Pineau

We first replace the episodic memory used in Experience Replay with SQM, leading to significant gains on standard continual learning benchmarks using a fixed memory budget.

Continual Learning · Quantization

VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering

1 code implementation 14 Aug 2019 Cătălina Cangea, Eugene Belilovsky, Pietro Liò, Aaron Courville

The goal of this dataset is to assess question-answering performance from nearly-ideal navigation paths, while considering a much more complete variety of questions than current instantiations of the EQA task.

Embodied Question Answering · Question Answering · +1

Online Continual Learning with Maximally Interfered Retrieval

1 code implementation 11 Aug 2019 Rahaf Aljundi, Lucas Caccia, Eugene Belilovsky, Massimo Caccia, Min Lin, Laurent Charlin, Tinne Tuytelaars

Methods based on replay, either generative or from a stored memory, have been shown to be effective approaches for continual learning, matching or exceeding the state of the art in a number of standard benchmarks.

Continual Learning · Retrieval

Decoupled Greedy Learning of CNNs

2 code implementations ICML 2020 Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon

It is based on a greedy relaxation of the joint training objective, recently shown to be effective in the context of Convolutional Neural Networks (CNNs) on large-scale image classification.

Image Classification

Greedy Layerwise Learning Can Scale to ImageNet

1 code implementation 29 Dec 2018 Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon

Here we use 1-hidden layer learning problems to sequentially build deep networks layer by layer, which can inherit properties from shallow networks.

Image Classification

Shallow Learning For Deep Networks

no code implementations 27 Sep 2018 Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon

Here we use 1-hidden layer learning problems to sequentially build deep networks layer by layer, which can inherit properties from shallow networks.

Compressing the Input for CNNs with the First-Order Scattering Transform

1 code implementation ECCV 2018 Edouard Oyallon, Eugene Belilovsky, Sergey Zagoruyko, Michal Valko

We study the first-order scattering transform as a candidate for reducing the signal processed by a convolutional neural network (CNN).

General Classification · Translation

Scattering Networks for Hybrid Representation Learning

1 code implementation 17 Sep 2018 Edouard Oyallon, Sergey Zagoruyko, Gabriel Huang, Nikos Komodakis, Simon Lacoste-Julien, Matthew Blaschko, Eugene Belilovsky

In particular, by working in scattering space, we achieve competitive results both for supervised and unsupervised learning tasks, while making progress towards constructing more interpretable CNNs.

Representation Learning

Scaling the Scattering Transform: Deep Hybrid Networks

2 code implementations ICCV 2017 Edouard Oyallon, Eugene Belilovsky, Sergey Zagoruyko

Combining scattering networks with a modern ResNet, we achieve a single-crop top-5 error of 11.4% on ImageNet ILSVRC2012, comparable to the ResNet-18 architecture, while utilizing only 10 layers.

Image Classification
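
A rough sketch of the hybrid idea, a fixed scattering front-end feeding a small learned network, is below. It assumes the kymatio package's Scattering2D interface and uses lazy layers to avoid computing the number of scattering channels by hand; it is not the paper's scattering+ResNet architecture.

```python
import torch
import torch.nn as nn
from kymatio.torch import Scattering2D  # assumed third-party scattering implementation

class ScatteringHybrid(nn.Module):
    """Fixed (non-learned) scattering transform followed by a small trained
    CNN head. Sketch only; depths and widths are placeholders."""

    def __init__(self, num_classes=10, J=2, image_size=32):
        super().__init__()
        self.scattering = Scattering2D(J=J, shape=(image_size, image_size))
        self.head = nn.Sequential(
            nn.LazyConv2d(64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.LazyLinear(num_classes),
        )

    def forward(self, x):
        s = self.scattering(x)      # (B, C, K, H / 2**J, W / 2**J)
        s = s.flatten(1, 2)         # merge colour and scattering-coefficient channels
        return self.head(s)
```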

Fast Non-Parametric Tests of Relative Dependency and Similarity

no code implementations 17 Nov 2016 Wacha Bounliphone, Eugene Belilovsky, Arthur Tenenhaus, Ioannis Antonoglou, Arthur Gretton, Matthew B. Blaschko

The second test, called the relative test of similarity, is used to determine which of the two samples from arbitrary distributions is significantly closer to a reference sample of interest, and the relative measure of similarity is based on the Maximum Mean Discrepancy (MMD).

Learning to Discover Sparse Graphical Models

1 code implementation ICML 2017 Eugene Belilovsky, Kyle Kastner, Gaël Varoquaux, Matthew Blaschko

Learning this function brings two benefits: it implicitly models the desired structure or sparsity properties to form suitable priors, and it can be tailored to the specific problem of edge structure discovery, rather than maximizing data likelihood.

Testing for Differences in Gaussian Graphical Models: Applications to Brain Connectivity

no code implementations NeurIPS 2016 Eugene Belilovsky, Gaël Varoquaux, Matthew B. Blaschko

We characterize the uncertainty of differences with confidence intervals obtained using a parametric distribution on parameters of a sparse estimator.

A Test of Relative Similarity For Model Selection in Generative Models

1 code implementation 14 Nov 2015 Wacha Bounliphone, Eugene Belilovsky, Matthew B. Blaschko, Ioannis Antonoglou, Arthur Gretton

Probabilistic generative models provide a powerful framework for representing data that avoids the expense of manual annotation typically needed by discriminative approaches.

Model Selection
