Search Results for author: Joan Puigcerver

Found 20 papers, 11 papers with code

Sparse MoEs meet Efficient Ensembles

1 code implementation • 7 Oct 2021 • James Urquhart Allingham, Florian Wenzel, Zelda E Mariet, Basil Mustafa, Joan Puigcerver, Neil Houlsby, Ghassen Jerfel, Vincent Fortuin, Balaji Lakshminarayanan, Jasper Snoek, Dustin Tran, Carlos Riquelme Ruiz, Rodolphe Jenatton

Machine learning models based on the aggregated outputs of submodels, either at the activation or prediction levels, often exhibit strong performance compared to individual models.

Few-Shot Learning
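
The excerpt above contrasts aggregating submodels at the prediction level versus the activation level. Below is a minimal NumPy sketch of the two aggregation styles; the function names and toy setup are illustrative assumptions, not code from the paper.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def prediction_level_ensemble(logits_per_model):
    """Average the class probabilities predicted by each submodel."""
    return np.mean([softmax(z) for z in logits_per_model], axis=0)

def activation_level_ensemble(activations_per_model, shared_head):
    """Average intermediate activations, then apply one shared head."""
    return shared_head(np.mean(activations_per_model, axis=0))

# Toy usage: three "models" producing logits for 4 examples and 10 classes.
rng = np.random.default_rng(0)
logits = [rng.normal(size=(4, 10)) for _ in range(3)]
print(prediction_level_ensemble(logits).shape)  # (4, 10)
```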

Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints

1 code implementation • 9 Dec 2022 • Aran Komatsuzaki, Joan Puigcerver, James Lee-Thorp, Carlos Riquelme Ruiz, Basil Mustafa, Joshua Ainslie, Yi Tay, Mostafa Dehghani, Neil Houlsby

In this work, we propose sparse upcycling -- a simple way to reuse sunk training costs by initializing a sparsely activated Mixture-of-Experts model from a dense checkpoint.
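
A rough sketch of the initialization idea described in the excerpt: copy a dense checkpoint's MLP weights into every expert of the new MoE layer and start the router from scratch. The function and weight names are assumptions for illustration, not the paper's code.

```python
import numpy as np

def upcycle_dense_mlp(w_in, w_out, num_experts, seed=0):
    """Turn one dense MLP block into an MoE layer: each expert begins as an
    exact copy of the dense weights; the router is freshly initialized."""
    rng = np.random.default_rng(seed)
    expert_w_in = np.stack([w_in.copy() for _ in range(num_experts)])
    expert_w_out = np.stack([w_out.copy() for _ in range(num_experts)])
    router = rng.normal(scale=0.02, size=(w_in.shape[0], num_experts))
    return expert_w_in, expert_w_out, router
```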

From Sparse to Soft Mixtures of Experts

4 code implementations • 2 Aug 2023 • Joan Puigcerver, Carlos Riquelme, Basil Mustafa, Neil Houlsby

Sparse mixture of expert architectures (MoEs) scale model capacity without large increases in training or inference costs.
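
The title points from sparse routing to a fully differentiable "soft" alternative. Here is a minimal NumPy sketch of soft routing as I understand it: each expert slot receives a weighted average of all tokens, and each token output is a weighted average of slot outputs. Parameter names such as `Phi` and the slot layout are assumptions, not the reference implementation.

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def soft_moe_layer(X, Phi, experts):
    """X: (tokens, d) inputs; Phi: (d, experts * slots) slot parameters;
    experts: one callable per expert, applied to its block of slots."""
    logits = X @ Phi
    D = softmax(logits, axis=0)   # dispatch: weights over tokens for each slot
    C = softmax(logits, axis=1)   # combine: weights over slots for each token
    slots_in = D.T @ X            # every slot is a weighted mix of all tokens
    per_expert = slots_in.shape[0] // len(experts)
    slots_out = np.concatenate([
        f(slots_in[i * per_expert:(i + 1) * per_expert])
        for i, f in enumerate(experts)
    ])
    return C @ slots_out          # each token mixes the slot outputs back
```

Because every weight is a softmax rather than a hard top-k choice, the whole layer stays differentiable, which is the contrast with sparse routing that the excerpt alludes to.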

Which Model to Transfer? Finding the Needle in the Growing Haystack

no code implementations • CVPR 2022 • Cedric Renggli, André Susano Pinto, Luka Rimanic, Joan Puigcerver, Carlos Riquelme, Ce Zhang, Mario Lucic

Transfer learning has been recently popularized as a data-efficient alternative to training models from scratch, in particular for computer vision tasks where it provides a remarkably solid baseline.

Transfer Learning

Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts

no code implementations • 6 Jun 2022 • Basil Mustafa, Carlos Riquelme, Joan Puigcerver, Rodolphe Jenatton, Neil Houlsby

MoEs are a natural fit for a multimodal backbone, since expert layers can learn an appropriate partitioning of modalities.

Contrastive Learning
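
One way to probe the "partitioning of modalities" mentioned in the excerpt is to tally how often each expert is chosen for image versus text tokens. This diagnostic is my own illustration, not a procedure taken from the paper.

```python
import numpy as np

def modality_expert_usage(expert_ids, modality_ids, num_experts):
    """Fraction of image vs. text tokens routed to each expert.

    expert_ids:   (tokens,) expert chosen for each token
    modality_ids: (tokens,) 0 for image tokens, 1 for text tokens
    """
    usage = np.zeros((2, num_experts))
    for m in (0, 1):
        chosen = expert_ids[modality_ids == m]
        if chosen.size:
            ids, counts = np.unique(chosen, return_counts=True)
            usage[m, ids] = counts / counts.sum()
    return usage
```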

Sparsity-Constrained Optimal Transport

no code implementations • 30 Sep 2022 • Tianlin Liu, Joan Puigcerver, Mathieu Blondel

The smoothness of the objectives increases as $k$ increases, giving rise to a trade-off between convergence speed and sparsity of the optimal plan.
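
For context on what $k$ controls, here is a hedged sketch of the kind of cardinality-constrained transport problem the title suggests; the exact formulation (whether rows or columns are constrained, and which regularizer is smoothed) is an assumption here, not quoted from the paper.

```latex
\min_{T \ge 0,\; T\mathbf{1} = a,\; T^{\top}\mathbf{1} = b}
  \ \langle T, C \rangle
  \quad \text{subject to} \quad
  \lVert t_j \rVert_0 \le k \ \text{for every column } t_j \text{ of } T .
```

Under such a constraint, a larger $k$ permits denser transport plans; the trade-off in the excerpt says the smoothed objectives used to solve the problem become easier to optimize as $k$ grows, at the price of a less sparse plan.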

On the Adversarial Robustness of Mixture of Experts

no code implementations • 19 Oct 2022 • Joan Puigcerver, Rodolphe Jenatton, Carlos Riquelme, Pranjal Awasthi, Srinadh Bhojanapalli

We next empirically evaluate the robustness of MoEs on ImageNet using adversarial attacks and show they are indeed more robust than dense models with the same computational cost.

Adversarial Robustness • Open-Ended Question Answering
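
The evaluation in the excerpt relies on adversarial attacks. As a generic illustration only (not the attack protocol used in the paper), here is a single-step FGSM perturbation for a toy linear softmax classifier; the model choice and all names are assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fgsm(x, y, W, b, eps):
    """One-step FGSM: move inputs in the direction of the sign of the
    cross-entropy gradient of a linear softmax classifier."""
    probs = softmax(x @ W + b)                       # (n, num_classes)
    grad_x = (probs - np.eye(W.shape[1])[y]) @ W.T   # d(loss)/d(x)
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

def accuracy(x, y, W, b):
    return float(np.mean((x @ W + b).argmax(axis=-1) == y))
```

Comparing `accuracy` on clean inputs against `accuracy` on `fgsm(...)` outputs, at matched compute, is the shape of the comparison the excerpt describes between MoEs and dense models.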

Routers in Vision Mixture of Experts: An Empirical Study

no code implementations • 29 Jan 2024 • Tianlin Liu, Mathieu Blondel, Carlos Riquelme, Joan Puigcerver

Routers for sparse MoEs can be further grouped into two variants: Token Choice, which matches experts to each token, and Expert Choice, which matches tokens to each expert.

Language Modelling
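
A compact NumPy sketch of the two router families named in the excerpt: Token Choice lets every token pick its top-k experts, while Expert Choice lets every expert pick its top-capacity tokens. `router_logits[t, e]` scores token `t` for expert `e`; tie-breaking and capacity handling are simplified assumptions.

```python
import numpy as np

def token_choice(router_logits, k):
    """Token Choice: every token selects its top-k experts."""
    return np.argsort(-router_logits, axis=1)[:, :k]           # (tokens, k)

def expert_choice(router_logits, capacity):
    """Expert Choice: every expert selects its top-`capacity` tokens."""
    return np.argsort(-router_logits, axis=0)[:capacity, :].T  # (experts, capacity)

rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 4))          # 6 tokens, 4 experts
print(token_choice(logits, k=2))          # expert ids chosen per token
print(expert_choice(logits, capacity=3))  # token ids chosen per expert
```

The well-known difference between the two is that Token Choice can leave experts unevenly loaded, whereas Expert Choice fixes each expert's load but may leave some tokens unprocessed or processed by several experts.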
