no code implementations • 12 Feb 2025 • Dan Busbridge, Amitis Shidani, Floris Weers, Jason Ramapuram, Etai Littwin, Russ Webb
We provide a distillation scaling law that estimates distilled model performance based on a compute budget and its allocation between the student and teacher.
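A minimal sketch of what fitting such a scaling law might look like, assuming a simple additive power law in student size and distillation tokens; the functional form, variable names, and synthetic data below are illustrative assumptions, not the law proposed in the paper.

```python
# Minimal sketch of fitting a parametric scaling law to observed student losses.
# The additive power-law form and all names below are illustrative assumptions,
# not the functional form proposed in the paper.
import numpy as np
from scipy.optimize import curve_fit

def assumed_law(x, a, alpha, b, beta, c):
    """Hypothetical form: loss = a * N^-alpha + b * D^-beta + c."""
    n_params, n_tokens = x
    return a * n_params**(-alpha) + b * n_tokens**(-beta) + c

# Synthetic observations: (student parameters, distillation tokens) -> loss.
rng = np.random.default_rng(0)
n_params = rng.uniform(1e7, 1e9, size=64)
n_tokens = rng.uniform(1e9, 1e11, size=64)
loss = assumed_law((n_params, n_tokens), 4e2, 0.3, 1e2, 0.25, 1.7)
loss = loss + rng.normal(0.0, 0.005, size=64)

# Fit the assumed law, then query it for a candidate compute allocation.
popt, _ = curve_fit(assumed_law, (n_params, n_tokens), loss,
                    p0=[3e2, 0.3, 1e2, 0.25, 1.5], maxfev=50000)
pred = assumed_law((np.array([3e8]), np.array([5e10])), *popt)[0]
print("fitted exponents:", popt[1], popt[3], "| predicted loss:", pred)
```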
1 code implementation • 6 Sep 2024 • Jason Ramapuram, Federico Danieli, Eeshan Dhekane, Floris Weers, Dan Busbridge, Pierre Ablin, Tatiana Likhomanenko, Jagrit Digani, Zijin Gu, Amitis Shidani, Russ Webb
Attention is a key part of the transformer architecture.
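As background for this entry, a minimal NumPy sketch of textbook scaled dot-product (softmax) attention for a single head; this is the standard formulation, not the specific attention variant studied in the paper.

```python
# Textbook scaled dot-product (softmax) attention for a single head;
# background only, not the attention variant studied in the paper above.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n_queries, n_keys) similarity scores
    weights = softmax(scores, axis=-1)   # each query's weights sum to 1
    return weights @ V                   # weighted combination of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(attention(Q, K, V).shape)  # (4, 8)
```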
no code implementations • 8 Mar 2024 • Amitis Shidani, Devon Hjelm, Jason Ramapuram, Russ Webb, Eeshan Gunesh Dhekane, Dan Busbridge
Contrastive learning typically matches pairs of related views among a number of unrelated negative views.
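To make the described setup concrete, here is a sketch of the standard two-view InfoNCE contrastive loss, in which each anchor is matched to one related (positive) view against the other in-batch views as negatives; this is generic background, not the multi-view objective introduced in the paper.

```python
# Standard pairwise InfoNCE contrastive loss: each anchor is matched to one
# related ("positive") view, with the remaining in-batch views as negatives.
# Generic two-view background, not the paper's multi-view formulation.
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """z_a[i] and z_b[i] are embeddings of two related views of example i."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature           # (n, n) scaled cosine similarities
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # diagonal entries are the positives

rng = np.random.default_rng(0)
z_a, z_b = rng.normal(size=(32, 64)), rng.normal(size=(32, 64))
print(info_nce(z_a, z_b))
```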
no code implementations • 15 Dec 2023 • Amitis Shidani, Sattar Vakili
We consider regret minimization in a general collaborative multi-agent multi-armed bandit model, in which each agent faces a finite set of arms and may communicate with other agents through a central controller.
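A toy sketch of this kind of setting, in which each agent runs UCB on its own pulls and a central controller periodically merges per-arm statistics; the synchronisation schedule, exploration bonus, and all names here are illustrative assumptions, not the algorithm analysed in the paper.

```python
# Toy collaborative multi-armed bandit: each agent runs UCB locally and a
# central controller periodically merges per-arm statistics. The merge
# schedule and bonus are illustrative assumptions, not the paper's algorithm.
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_arms, horizon, sync_every = 4, 5, 2000, 50
true_means = rng.uniform(0, 1, size=n_arms)

counts = np.zeros((n_agents, n_arms))   # per-agent pull counts since last sync
sums = np.zeros((n_agents, n_arms))     # per-agent reward sums since last sync
shared_counts = np.ones(n_arms)         # controller's merged counts (1 to avoid /0)
shared_sums = np.zeros(n_arms)          # controller's merged reward sums

for t in range(1, horizon + 1):
    for a in range(n_agents):
        total = counts[a] + shared_counts
        means = (sums[a] + shared_sums) / total
        ucb = means + np.sqrt(2 * np.log(t * n_agents) / total)  # optimism bonus
        arm = int(np.argmax(ucb))
        reward = rng.binomial(1, true_means[arm])
        counts[a, arm] += 1
        sums[a, arm] += reward
    if t % sync_every == 0:             # agents report to the central controller
        shared_counts += counts.sum(axis=0)
        shared_sums += sums.sum(axis=0)
        counts[:] = 0                   # local statistics have been absorbed
        sums[:] = 0

print("merged mean reward:", shared_sums.sum() / shared_counts.sum(),
      "| best arm mean:", true_means.max())
```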
1 code implementation • 30 Jun 2022 • Amitis Shidani, George Deligiannidis, Arnaud Doucet
We study the ranking problem in generalized linear bandits.
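For context, the standard generalized linear bandit reward model (textbook background, not a statement of the paper's specific ranking setting) is
\[
  \mathbb{E}\!\left[ r_t \mid x_t = x \right] \;=\; \mu\!\left( x^\top \theta^\star \right),
\]
where $x \in \mathbb{R}^d$ is the chosen arm's feature vector, $\theta^\star$ is an unknown parameter, and $\mu$ is a known link function, for example the logistic link $\mu(z) = 1/(1 + e^{-z})$.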
no code implementations • 2 Mar 2022 • Eugenio Clerico, Amitis Shidani, George Deligiannidis, Arnaud Doucet
This work discusses how to derive upper bounds for the expected generalisation error of supervised learning algorithms by means of the chaining technique.
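As background on the type of bound involved, the classical information-theoretic generalisation bound that chaining refines states that, for a loss that is $\sigma$-sub-Gaussian, the output $W$ of a learning algorithm trained on a sample $S$ of $n$ points satisfies
\[
  \left| \mathbb{E}\!\left[ L_\mu(W) - L_S(W) \right] \right| \;\le\; \sqrt{\frac{2\sigma^2}{n}\, I(W; S)},
\]
where $L_S$ and $L_\mu$ are the empirical and population risks and $I(W;S)$ is the mutual information between the output and the sample. Loosely speaking, chaining replaces the single mutual-information term with a sum over successively finer quantisations of the hypothesis space; the paper's specific chained bounds are not reproduced here.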