Search Results for author: Carlos Riquelme

Found 24 papers, 11 papers with code

Stable Code Technical Report

no code implementations • 1 Apr 2024 • Nikhil Pinnaparaju, Reshinth Adithyan, Duy Phung, Jonathan Tow, James Baicoianu, Ashish Datta, Maksym Zhuravinskyi, Dakota Mahan, Marco Bellagente, Carlos Riquelme, Nathan Cooper

Stable Code Instruct also exhibits state-of-the-art performance on the MT-Bench coding tasks and on Multi-PL completion compared to other instruction-tuned models.

Code Completion · Language Modelling · +2

Routers in Vision Mixture of Experts: An Empirical Study

no code implementations • 29 Jan 2024 • Tianlin Liu, Mathieu Blondel, Carlos Riquelme, Joan Puigcerver

Routers for sparse MoEs fall into two variants: Token Choice, which matches experts to each token, and Expert Choice, which matches tokens to each expert (see the sketch below).

Language Modelling
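
A minimal numpy sketch of the two routing variants (hypothetical shapes and names; top-1 Token Choice and fixed-capacity Expert Choice, not the paper's implementation):

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
num_tokens, dim, num_experts, capacity = 8, 4, 2, 4

x = rng.normal(size=(num_tokens, dim))   # token representations
w = rng.normal(size=(dim, num_experts))  # router weights
logits = x @ w                           # (tokens, experts)

# Token Choice: each token picks its best-scoring expert (top-1).
token_to_expert = logits.argmax(axis=1)

# Expert Choice: each expert picks its top-`capacity` tokens.
scores = softmax(logits, axis=0)                           # normalize over tokens
expert_to_tokens = np.argsort(-scores, axis=0)[:capacity]  # (capacity, experts)

print("token -> expert:", token_to_expert)
print("expert -> tokens:\n", expert_to_tokens.T)
```

Token Choice can leave experts under- or over-subscribed; Expert Choice balances load by construction but may drop or duplicate tokens.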

Scaling Laws for Sparsely-Connected Foundation Models

no code implementations • 15 Sep 2023 • Elias Frantar, Carlos Riquelme, Neil Houlsby, Dan Alistarh, Utku Evci

We explore the impact of parameter sparsity on the scaling behavior of Transformers trained on massive datasets (i.e., "foundation models"), in both vision and language domains.

Computational Efficiency
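
For context, work in this area typically fits saturating power laws in model size N and data D, with this paper adding sparsity as a third axis. A generic dense form (standard in the scaling-laws literature, not the paper's exact fit) is:

```latex
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

where E is the irreducible loss and A, B, alpha, beta are fitted constants.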

From Sparse to Soft Mixtures of Experts

4 code implementations • 2 Aug 2023 • Joan Puigcerver, Carlos Riquelme, Basil Mustafa, Neil Houlsby

Sparse mixture-of-experts architectures (MoEs) scale model capacity without large increases in training or inference costs.
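
The "soft" variant replaces discrete routing with differentiable weighted averages: each expert slot consumes a soft mix of all tokens, and each token's output is a soft mix of all slot outputs. A minimal numpy sketch of this forward pass (illustrative shapes; random linear maps stand in for the experts):

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
tokens, dim, experts, slots_per_expert = 8, 4, 2, 2
slots = experts * slots_per_expert

x = rng.normal(size=(tokens, dim))
phi = rng.normal(size=(dim, slots))   # learnable slot parameters

logits = x @ phi                      # (tokens, slots)
dispatch = softmax(logits, axis=0)    # per slot: weights over all tokens
combine = softmax(logits, axis=1)     # per token: weights over all slots

slot_in = dispatch.T @ x              # (slots, dim): soft token mixes
expert_w = rng.normal(size=(experts, dim, dim))  # stand-in experts
slot_out = np.stack([slot_in[s] @ expert_w[s // slots_per_expert]
                     for s in range(slots)])
y = combine @ slot_out                # (tokens, dim): soft combination
```

Because every operation is dense and differentiable, there is no token dropping and no auxiliary load-balancing loss.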

On the Adversarial Robustness of Mixture of Experts

no code implementations • 19 Oct 2022 • Joan Puigcerver, Rodolphe Jenatton, Carlos Riquelme, Pranjal Awasthi, Srinadh Bhojanapalli

We empirically evaluate the robustness of MoEs on ImageNet using adversarial attacks and show that they are indeed more robust than dense models with the same computational cost.

Adversarial Robustness · Open-Ended Question Answering
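
As a reference point for "adversarial attacks", the sketch below shows FGSM, one standard single-step attack (the paper likely also uses stronger multi-step attacks), on a toy PyTorch model:

```python
import torch
import torch.nn as nn

# Toy classifier; in the paper's setting this would be an MoE or dense ViT.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 3))
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(4, 8, requires_grad=True)   # clean inputs
y = torch.randint(0, 3, (4,))               # labels
eps = 0.03                                  # perturbation budget

loss = loss_fn(model(x), y)
loss.backward()
x_adv = (x + eps * x.grad.sign()).detach()  # FGSM: one signed-gradient step

print("loss before:", loss.item(),
      "after:", loss_fn(model(x_adv), y).item())
```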

Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts

no code implementations • 6 Jun 2022 • Basil Mustafa, Carlos Riquelme, Joan Puigcerver, Rodolphe Jenatton, Neil Houlsby

MoEs are a natural fit for a multimodal backbone, since expert layers can learn an appropriate partitioning of modalities.

Contrastive Learning

Which Model to Transfer? Finding the Needle in the Growing Haystack

no code implementations • CVPR 2022 • Cedric Renggli, André Susano Pinto, Luka Rimanic, Joan Puigcerver, Carlos Riquelme, Ce Zhang, Mario Lucic

Transfer learning has recently been popularized as a data-efficient alternative to training models from scratch, in particular for computer vision tasks, where it provides a remarkably solid baseline.

Transfer Learning

Google Research Football: A Novel Reinforcement Learning Environment

1 code implementation • 25 Jul 2019 • Karol Kurach, Anton Raichuk, Piotr Stańczyk, Michał Zając, Olivier Bachem, Lasse Espeholt, Carlos Riquelme, Damien Vincent, Marcin Michalski, Olivier Bousquet, Sylvain Gelly

Recent progress in the field of reinforcement learning has been accelerated by virtual learning environments such as video games, where novel algorithms and ideas can be quickly tested in a safe and reproducible manner.

Game of Football · reinforcement-learning · +1
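
The environment ships as the `gfootball` Python package; a random-agent loop following the repository's quickstart (API as of the original release; the Gym interface may have evolved since) looks like:

```python
import gfootball.env as football_env

# A simple academy scenario; render=False for headless training.
env = football_env.create_environment(
    env_name="academy_empty_goal_close", render=False)

obs = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()          # random policy
    obs, reward, done, info = env.step(action)
    total_reward += reward
print("episode reward:", total_reward)
```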

Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates

no code implementations • NeurIPS 2019 • Hugo Penedones, Carlos Riquelme, Damien Vincent, Hartmut Maennel, Timothy Mann, Andre Barreto, Sylvain Gelly, Gergely Neu

We consider the core reinforcement-learning problem of on-policy value function approximation from a batch of trajectory data, and focus on various issues of Temporal Difference (TD) learning and Monte Carlo (MC) policy evaluation.
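
For reference, the two estimators update a value estimate V toward different targets (standard definitions; the paper's contribution is choosing between them per state based on uncertainty):

```latex
\text{MC:}\quad V(s_t) \leftarrow V(s_t) + \alpha \big( G_t - V(s_t) \big),
\qquad G_t = \sum_{k=0}^{T-t-1} \gamma^k r_{t+k+1}

\text{TD(0):}\quad V(s_t) \leftarrow V(s_t) + \alpha \big( r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \big)
```

MC targets are unbiased but high-variance; TD targets are lower-variance but biased by bootstrapping, which is what makes a per-state adaptive combination attractive.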

Practical and Consistent Estimation of f-Divergences

1 code implementation • NeurIPS 2019 • Paul K. Rubenstein, Olivier Bousquet, Josip Djolonga, Carlos Riquelme, Ilya Tolstikhin

The estimation of an f-divergence between two probability distributions based on samples is a fundamental problem in statistics and machine learning.

BIG-bench Machine Learning · Mutual Information Estimation · +1
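
The quantity being estimated is, for a convex f with f(1) = 0 (standard definition):

```latex
D_f(P \,\|\, Q) = \int f\!\left(\frac{dP}{dQ}\right) dQ,
\qquad \text{e.g. } f(t) = t \log t \;\Rightarrow\; D_f = \mathrm{KL}(P \,\|\, Q)
```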

Active Learning for Accurate Estimation of Linear Models

no code implementations • ICML 2017 • Carlos Riquelme, Mohammad Ghavamzadeh, Alessandro Lazaric

We explore the sequential decision making problem where the goal is to estimate uniformly well a number of linear models, given a shared budget of random contexts independently sampled from a known distribution.

Active Learning · Decision Making
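
One plausible formalization of "uniformly well" (an assumption on my part, not necessarily the paper's exact objective) is a minimax allocation over the m models:

```latex
\min_{\text{budget allocation}} \;\; \max_{1 \le i \le m} \;\; \mathbb{E}\!\left[ \big\| \hat{\beta}_i - \beta_i \big\|^2 \right]
```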

Human Interaction with Recommendation Systems

1 code implementation • 1 Mar 2017 • Sven Schmit, Carlos Riquelme

Based on this model, we prove that naive estimators, i.e., those that ignore this feedback loop, are not consistent.

Recommendation Systems
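
A small simulation makes the inconsistency concrete (an illustrative selection model with hypothetical parameters, not necessarily the paper's exact setup): when users combine the platform's prediction with private information before consuming, observed ratings are selected on the noise term and naive regression is biased:

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 200_000, 2.0

x = rng.normal(size=n)           # item feature
u = rng.normal(size=n)           # user's private signal
value = beta * x + u             # true value to the user

# Users consume only when prediction + private signal clears a bar,
# so the observed sample is selected on u: the feedback loop.
consumed = beta * x + u > 0.5

x_obs, y_obs = x[consumed], value[consumed]
beta_naive = (x_obs @ y_obs) / (x_obs @ x_obs)  # OLS through the origin
print("true beta:", beta, "naive estimate:", round(beta_naive, 3))
```

The naive estimate is systematically pulled away from the true coefficient because the truncation threshold on u varies with x.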

Online Active Linear Regression via Thresholding

no code implementations • 9 Feb 2016 • Carlos Riquelme, Ramesh Johari, Baosen Zhang

We consider the problem of online active learning to collect data for regression modeling.

Active Learning · regression
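
The title names the selection rule; below is a minimal sketch of norm-thresholding label collection under a budget (my reading of the rule, with illustrative parameters: large-norm contexts are queried because they are the most informative for the least-squares estimate):

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, budget_frac = 3, 5000, 0.2
beta = rng.normal(size=d)

# Calibrate the threshold so roughly budget_frac of contexts get labeled.
warmup = rng.normal(size=(1000, d))
tau = np.quantile(np.linalg.norm(warmup, axis=1), 1 - budget_frac)

X, y = [], []
for _ in range(T):
    x = rng.normal(size=d)                 # context arrives online
    if np.linalg.norm(x) >= tau:           # thresholding rule: query label?
        X.append(x)
        y.append(x @ beta + rng.normal())  # observe a noisy label
X, y = np.array(X), np.array(y)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("labels used:", len(y), "error:", np.linalg.norm(beta_hat - beta))
```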
