Search Results for author: Benjamin Thérien

Found 8 papers, 3 papers with code

Simple and Scalable Strategies to Continually Pre-train Large Language Models

1 code implementation · 13 Mar 2024 · Adam Ibrahim, Benjamin Thérien, Kshitij Gupta, Mats L. Richter, Quentin Anthony, Timothée Lesort, Eugene Belilovsky, Irina Rish

In this work, we show that a simple and scalable combination of learning rate (LR) re-warming, LR re-decaying, and replay of previous data is sufficient to match the performance of fully re-training from scratch on all available data, as measured by the final loss and the average score on several language model (LM) evaluation benchmarks.

Continual Learning, Language Modelling
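
The replay ingredient in the snippet above can be illustrated with a small data-mixing sketch. It assumes that "replay of previous data" means drawing each training example from the previous (upstream) dataset with a fixed probability and from the new (downstream) dataset otherwise. The function name `replay_mixture` and the 5% fraction in the usage lines are hypothetical choices for illustration, not the paper's code or reported settings.

```python
import random
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def replay_mixture(new_data: Sequence[T], old_data: Sequence[T],
                   replay_fraction: float, num_samples: int,
                   seed: int = 0) -> Iterator[T]:
    """Hypothetical helper: yield num_samples examples, each drawn (with
    replacement) from old_data with probability replay_fraction and from
    new_data otherwise."""
    if not 0.0 <= replay_fraction <= 1.0:
        raise ValueError("replay_fraction must be in [0, 1]")
    rng = random.Random(seed)  # fixed seed keeps the mixture reproducible
    for _ in range(num_samples):
        source = old_data if rng.random() < replay_fraction else new_data
        yield rng.choice(source)


# Usage: roughly 5% of the stream should come from the old data.
old = ["old"] * 10
new = ["new"] * 10
stream = list(replay_mixture(new, old, replay_fraction=0.05, num_samples=10_000))
print(stream.count("old") / len(stream))
```

Sampling per example keeps the expected old/new ratio fixed without materializing a merged dataset; a real pre-training pipeline would more plausibly mix at the shard or batch level.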

Can We Learn Communication-Efficient Optimizers?

no code implementations · 2 Dec 2023 · Charles-Étienne Joseph, Benjamin Thérien, Abhinav Moudgil, Boris Knyazev, Eugene Belilovsky

Although many variants of these approaches [communication-efficient variants of SGD, such as local SGD] have been proposed, they can sometimes lag behind state-of-the-art adaptive optimizers for deep learning.

Language Modelling

Continual Pre-Training of Large Language Models: How to (re)warm your model?

2 code implementations · 8 Aug 2023 · Kshitij Gupta, Benjamin Thérien, Adam Ibrahim, Mats L. Richter, Quentin Anthony, Eugene Belilovsky, Irina Rish, Timothée Lesort

We study the warmup phase of models pre-trained on the Pile (upstream data, 300B tokens) as we continue to pre-train on SlimPajama (downstream data, 297B tokens), following a linear warmup and cosine decay schedule.

Language Modelling
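
The linear warmup and cosine decay schedule named in the snippet above can be written as a function of the step count. This is a sketch under stated assumptions: warmup starts from zero, the decay ends at a separate `min_lr`, and the name `warmup_cosine_lr` and its parameters are hypothetical rather than taken from the paper's code. Under this sketch, re-warming for a new dataset amounts to restarting the step counter at zero for the new phase, so the learning rate climbs back to `max_lr` and then decays again.

```python
import math


def warmup_cosine_lr(step: int, max_lr: float, min_lr: float,
                     warmup_steps: int, total_steps: int) -> float:
    """Hypothetical schedule: linear warmup from 0 to max_lr over
    warmup_steps, then cosine decay from max_lr to min_lr at total_steps.
    Steps past total_steps stay at min_lr."""
    if total_steps <= warmup_steps:
        raise ValueError("total_steps must exceed warmup_steps")
    if step < warmup_steps:
        # Linear ramp: 0 at step 0, approaching max_lr at the end of warmup.
        return max_lr * step / warmup_steps
    # Fraction of the decay phase completed, clipped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / (total_steps - warmup_steps))
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Two phases (upstream, then downstream): resetting the step counter per
# phase makes the same schedule re-warm and re-decay.
lrs = []
for phase in ("upstream", "downstream"):
    for s in range(1_000):
        lrs.append(warmup_cosine_lr(s, 3e-4, 3e-5, 100, 1_000))
```

Resetting the counter is only one way to express re-warming; the peak and final learning rates of the second phase need not match the first, and the values above are placeholders.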

Object Re-Identification from Point Clouds

no code implementations · 17 May 2023 · Benjamin Thérien, Chengjie Huang, Adrian Chow, Krzysztof Czarnecki

To our knowledge, we are the first to study object re-identification from real point cloud observations.

3D Multi-Object Tracking, Autonomous Driving (+3 more)

A Closer Look at Robustness to L-infinity and Spatial Perturbations and their Composition

no code implementations · 5 Oct 2022 · Luke Rowe, Benjamin Thérien, Krzysztof Czarnecki, Hongyang Zhang

In adversarial machine learning, the popular $\ell_\infty$ threat model has been the focus of much previous work.

Interpretable Deep Tracking

no code implementations · 3 Oct 2022 · Benjamin Thérien, Krzysztof Czarnecki

By enumerating different tracking decisions and associated reasoning procedures, we can train individual networks to reason about the possible decisions via interchange intervention training (IIT).

Motion Forecasting, Multi-Object Tracking

Exploring the Optimality of Tight-Frame Scattering Networks

no code implementations · 29 Sep 2021 · Shanel Gauthier, Benjamin Thérien, Laurent Alsène-Racicot, Muawiz Sajjad Chaudhary, Irina Rish, Eugene Belilovsky, Michael Eickenberg, Guy Wolf

The wavelet filters used in the scattering transform are typically selected to create a tight frame via a parameterized mother wavelet.
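
The tight-frame property mentioned above can be checked concretely in a finite-dimensional toy setting. A family of vectors {v_i} is a tight frame when its frame operator S = Σ_i v_i v_iᵀ equals A·I for some bound A > 0, which is equivalent to Σ_i ⟨x, v_i⟩² = A‖x‖² for every x. The sketch below uses plain vectors rather than the parameterized wavelet filters the paper studies, and the helper names `frame_operator` and `tight_frame_bound` are hypothetical.

```python
import math
from typing import Optional, Sequence


def frame_operator(vectors: Sequence[Sequence[float]]) -> list[list[float]]:
    """S = sum_i v_i v_i^T for a list of equal-length vectors."""
    d = len(vectors[0])
    s = [[0.0] * d for _ in range(d)]
    for v in vectors:
        for r in range(d):
            for c in range(d):
                s[r][c] += v[r] * v[c]
    return s


def tight_frame_bound(vectors: Sequence[Sequence[float]],
                      tol: float = 1e-9) -> Optional[float]:
    """Return A if S is (numerically) A times the identity, else None."""
    s = frame_operator(vectors)
    a = s[0][0]
    if a <= tol:
        return None
    for r, row in enumerate(s):
        for c, value in enumerate(row):
            expected = a if r == c else 0.0
            if abs(value - expected) > tol:
                return None
    return a


# Three unit vectors 120 degrees apart in R^2 form a tight frame with A = 3/2.
angles = [math.pi / 2 + k * 2 * math.pi / 3 for k in range(3)]
mercedes = [(math.cos(t), math.sin(t)) for t in angles]
print(tight_frame_bound(mercedes))                  # approximately 1.5
print(tight_frame_bound([(1.0, 0.0), (0.0, 2.0)]))  # None: S = diag(1, 4)
```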

Cannot find the paper you are looking for? You can Submit a new open access paper.