no code implementations • 26 May 2023 • Jannik Kossen, Mark Collier, Basil Mustafa, Xiao Wang, Xiaohua Zhai, Lucas Beyer, Andreas Steiner, Jesse Berent, Rodolphe Jenatton, Efi Kokiopoulou
With 3T, we propose a more flexible strategy that allows the image tower to benefit from both pretrained embeddings and contrastive training.
1 code implementation • 3 Mar 2023 • Guillermo Ortiz-Jimenez, Mark Collier, Anant Nawalgaria, Alexander D'Amour, Jesse Berent, Rodolphe Jenatton, Effrosyni Kokiopoulou
Leveraging privileged information (PI), or features available during training but not at test time, has recently been shown to be an effective method for addressing label noise.
no code implementations • 10 Feb 2023 • Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, Fantine Huot, Jasmijn Bastings, Mark Patrick Collier, Alexey Gritsenko, Vighnesh Birodkar, Cristina Vasconcelos, Yi Tay, Thomas Mensink, Alexander Kolesnikov, Filip Pavetić, Dustin Tran, Thomas Kipf, Mario Lučić, Xiaohua Zhai, Daniel Keysers, Jeremiah Harmsen, Neil Houlsby
The scaling of Transformers has driven breakthrough capabilities for language models.
Ranked #1 on Linear-Probe Classification on ImageNet (using extra training data)
no code implementations • 30 Jan 2023 • Mark Collier, Rodolphe Jenatton, Basil Mustafa, Neil Houlsby, Jesse Berent, Effrosyni Kokiopoulou
Heteroscedastic classifiers, which learn a multivariate Gaussian distribution over prediction logits, have been shown to perform well on image classification problems with hundreds to thousands of classes.
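A minimal sketch of the prediction step, assuming a diagonal covariance over the logits (the paper considers richer covariance structure) and simple Monte-Carlo averaging; the function name is illustrative:

```python
import numpy as np

def heteroscedastic_predict(mu, sigma, n_samples=100, rng=None):
    """Monte-Carlo predictive distribution for a heteroscedastic classifier.

    mu, sigma: per-class logit means and standard deviations (a diagonal
    covariance is a simplifying assumption here).
    """
    rng = np.random.default_rng(rng)
    # Sample logits from N(mu, diag(sigma^2)) and average the softmax.
    logits = mu + sigma * rng.standard_normal((n_samples, mu.shape[-1]))
    logits -= logits.max(axis=-1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs.mean(axis=0)

# Toy usage: three classes, with much higher logit noise on class 2.
print(heteroscedastic_predict(np.array([2.0, 1.0, 0.0]),
                              np.array([0.1, 1.5, 0.1]), rng=0))
```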
no code implementations • 19 Oct 2022 • Joan Puigcerver, Rodolphe Jenatton, Carlos Riquelme, Pranjal Awasthi, Srinadh Bhojanapalli
We next empirically evaluate the robustness of MoEs on ImageNet using adversarial attacks and show they are indeed more robust than dense models with the same computational cost.
1 code implementation • 15 Jul 2022 • Dustin Tran, Jeremiah Liu, Michael W. Dusenberry, Du Phan, Mark Collier, Jie Ren, Kehang Han, Zi Wang, Zelda Mariet, Huiyi Hu, Neil Band, Tim G. J. Rudner, Karan Singhal, Zachary Nado, Joost van Amersfoort, Andreas Kirsch, Rodolphe Jenatton, Nithum Thain, Honglin Yuan, Kelly Buchanan, Kevin Murphy, D. Sculley, Yarin Gal, Zoubin Ghahramani, Jasper Snoek, Balaji Lakshminarayanan
A recent trend in artificial intelligence is the use of pretrained models for language and vision tasks, which have achieved extraordinary performance but also puzzling failures.
1 code implementation • 6 Jun 2022 • Basil Mustafa, Carlos Riquelme, Joan Puigcerver, Rodolphe Jenatton, Neil Houlsby
MoEs are a natural fit for a multimodal backbone, since expert layers can learn an appropriate partitioning of modalities.
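A minimal sketch of sparse expert routing, assuming top-k token-level gating; all names are illustrative rather than the paper's API:

```python
import numpy as np

def moe_layer(x, w_gate, experts, k=1):
    """Minimal sparse mixture-of-experts layer with top-k token routing.

    x: (n_tokens, d) inputs; w_gate: (d, n_experts) router weights;
    experts: list of callables mapping a d-vector to a d-vector.
    """
    gate_logits = x @ w_gate                            # (n_tokens, n_experts)
    gates = np.exp(gate_logits - gate_logits.max(-1, keepdims=True))
    gates /= gates.sum(-1, keepdims=True)               # softmax router
    top_k = np.argsort(-gates, axis=-1)[:, :k]          # chosen experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for e in top_k[t]:
            out[t] += gates[t, e] * experts[e](x[t])    # gate-weighted expert output
    return out

rng = np.random.default_rng(0)
d, n_experts = 8, 4
experts = [(lambda W: (lambda v: W @ v))(rng.standard_normal((d, d)))
           for _ in range(n_experts)]
y = moe_layer(rng.standard_normal((5, d)), rng.standard_normal((d, n_experts)), experts)
```

Production MoE layers renormalize the gates over the selected experts and add load-balancing losses; this sketch omits both.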
no code implementations • 18 Feb 2022 • Mark Collier, Rodolphe Jenatton, Efi Kokiopoulou, Jesse Berent
Supervised learning datasets often have privileged information, in the form of features which are available at training time but not at test time, e.g., the ID of the annotator that provided the label.
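A minimal sketch of this train/test asymmetry, assuming a simple two-head layout (the paper studies more careful ways to exploit and then marginalize out the PI); all names and shapes are hypothetical:

```python
import numpy as np

def forward(x, pi, W_shared, W_nopi, W_pi, training=True):
    """Use privileged information (PI), e.g. an annotator-ID embedding,
    only at training time. Illustrative two-head sketch.

    x: (d,) features; pi: (p,) PI vector; W_shared: (d, h);
    W_nopi: (h, n_classes); W_pi: (h + p, n_classes).
    """
    h = np.tanh(x @ W_shared)                  # shared representation
    if training:
        # PI-aware head: can explain away annotator-specific label noise.
        return np.concatenate([h, pi]) @ W_pi
    # Test time: PI is unavailable, so predict from the features alone.
    return h @ W_nopi
```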
no code implementations • 15 Dec 2021 • Setareh Ariafar, Justin Gilmer, Zack Nado, Jasper Snoek, Rodolphe Jenatton, George E. Dahl
For example, when tuning hyperparameters for machine learning pipelines on a new problem given a limited budget, one must strike a balance between excluding potentially promising regions and keeping the search space small enough to be tractable.
no code implementations • 7 Oct 2021 • James Urquhart Allingham, Florian Wenzel, Zelda E Mariet, Basil Mustafa, Joan Puigcerver, Neil Houlsby, Ghassen Jerfel, Vincent Fortuin, Balaji Lakshminarayanan, Jasper Snoek, Dustin Tran, Carlos Riquelme Ruiz, Rodolphe Jenatton
Machine learning models based on the aggregated outputs of submodels, either at the activation or prediction level, achieve strong performance.
no code implementations • 6 Oct 2021 • Vincent Fortuin, Mark Collier, Florian Wenzel, James Allingham, Jeremiah Liu, Dustin Tran, Balaji Lakshminarayanan, Jesse Berent, Rodolphe Jenatton, Effrosyni Kokiopoulou
Uncertainty estimation in deep learning has recently emerged as a crucial area of interest to advance reliability and robustness in safety-critical applications.
1 code implementation • NeurIPS 2021 • Carlos Riquelme, Joan Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, André Susano Pinto, Daniel Keysers, Neil Houlsby
We present a Vision MoE (V-MoE), a sparse version of the Vision Transformer, that is scalable and competitive with the largest dense networks.
Ranked #1 on Few-Shot Image Classification on ImageNet - 5-shot
2 code implementations • 7 Jun 2021 • Zachary Nado, Neil Band, Mark Collier, Josip Djolonga, Michael W. Dusenberry, Sebastian Farquhar, Qixuan Feng, Angelos Filos, Marton Havasi, Rodolphe Jenatton, Ghassen Jerfel, Jeremiah Liu, Zelda Mariet, Jeremy Nixon, Shreyas Padhy, Jie Ren, Tim G. J. Rudner, Faris Sbahi, Yeming Wen, Florian Wenzel, Kevin Murphy, D. Sculley, Balaji Lakshminarayanan, Jasper Snoek, Yarin Gal, Dustin Tran
In this paper we introduce Uncertainty Baselines: high-quality implementations of standard and state-of-the-art deep learning methods on a variety of tasks.
no code implementations • CVPR 2021 • Mark Collier, Basil Mustafa, Efi Kokiopoulou, Rodolphe Jenatton, Jesse Berent
We place a latent variable with a multivariate Normal distribution on the final hidden layer of a neural network classifier.
Ranked #5 on Image Classification on WebVision-1000
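A minimal sketch of this hidden-layer latent variable, assuming a diagonal covariance and Monte-Carlo marginalization; unlike the logit-space noise sketched earlier in this list, the stochasticity here sits before the linear head. Names are illustrative:

```python
import numpy as np

def latent_hidden_predict(h_mu, h_sigma, W_out, n_samples=50, rng=None):
    """Sample h ~ N(h_mu, diag(h_sigma^2)) on the final hidden layer,
    map each sample through the linear head, and average the softmax.
    """
    rng = np.random.default_rng(rng)
    h = h_mu + h_sigma * rng.standard_normal((n_samples, h_mu.shape[-1]))
    logits = h @ W_out
    logits -= logits.max(-1, keepdims=True)    # numerical stability
    p = np.exp(logits)
    return (p / p.sum(-1, keepdims=True)).mean(0)
```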
no code implementations • 15 Dec 2020 • Piali Das, Valerio Perrone, Nikita Ivkin, Tanya Bansal, Zohar Karnin, Huibin Shen, Iaroslav Shcherbatyi, Yotam Elor, Wilton Wu, Aida Zolic, Thibaut Lienart, Alex Tang, Amr Ahmed, Jean Baptiste Faddoul, Rodolphe Jenatton, Fela Winkelmolen, Philip Gautier, Leo Dirac, Andre Perunicic, Miroslav Miladinovic, Giovanni Zappella, Cédric Archambeau, Matthias Seeger, Bhaskar Dutt, Laurence Rouesnel
AutoML systems provide a black-box solution to machine learning problems by selecting the right way of processing features, choosing an algorithm and tuning the hyperparameters of the entire pipeline.
no code implementations • 15 Dec 2020 • Valerio Perrone, Huibin Shen, Aida Zolic, Iaroslav Shcherbatyi, Amr Ahmed, Tanya Bansal, Michele Donini, Fela Winkelmolen, Rodolphe Jenatton, Jean Baptiste Faddoul, Barbara Pogorzelska, Miroslav Miladinovic, Krishnaram Kenthapadi, Matthias Seeger, Cédric Archambeau
To democratize access to machine learning systems, it is essential to automate hyperparameter tuning.
no code implementations • AABI Symposium 2021 • Zelda E Mariet, Rodolphe Jenatton, Florian Wenzel, Dustin Tran
We seek to bridge the performance gap between batch ensembles (ensembles of deep networks with shared parameters) and deep ensembles on tasks which require not only predictions, but also uncertainty estimates for these predictions.
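A minimal sketch of the parameter sharing underlying batch ensembles, where member k's weight matrix is a shared W modulated by a rank-1 factor r_k s_k^T; names are illustrative:

```python
import numpy as np

def batch_ensemble_dense(x, W, R, S):
    """BatchEnsemble-style dense layer: K members share one weight matrix W
    and differ only by rank-1 factors, so member k computes
    ((x * r_k) @ W) * s_k, equivalent to x @ (W * outer(r_k, s_k)).

    x: (d_in,), W: (d_in, d_out), R: (K, d_in), S: (K, d_out).
    Returns the K member outputs, typically averaged for prediction.
    """
    return ((x * R) @ W) * S            # (K, d_out): one row per member

rng = np.random.default_rng(0)
outs = batch_ensemble_dense(rng.standard_normal(4),
                            rng.standard_normal((4, 3)),
                            rng.standard_normal((2, 4)),
                            rng.standard_normal((2, 3)))
```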
1 code implementation • ICLR 2021 • Marton Havasi, Rodolphe Jenatton, Stanislav Fort, Jeremiah Zhe Liu, Jasper Snoek, Balaji Lakshminarayanan, Andrew M. Dai, Dustin Tran
Recent approaches to efficiently ensemble neural networks have shown that strong robustness and uncertainty performance can be achieved with a negligible increase in parameters over the original network.
2 code implementations • NeurIPS 2020 • Florian Wenzel, Jasper Snoek, Dustin Tran, Rodolphe Jenatton
Ensembles over neural network weights trained from different random initialization, known as deep ensembles, achieve state-of-the-art accuracy and calibration.
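For reference, the deep-ensemble recipe itself is trivial at prediction time; a sketch, where `models` stands in for networks trained from independent random initializations:

```python
import numpy as np

def ensemble_predict(models, x):
    """Average the class probabilities of independently trained models,
    and report the spread across members as a simple uncertainty signal.
    """
    probs = np.stack([m(x) for m in models])   # (K, n_classes)
    return probs.mean(0), probs.std(0)
```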
1 code implementation • 10 Jun 2020 • Luigi Carratino, Moustapha Cissé, Rodolphe Jenatton, Jean-Philippe Vert
We show that Mixup can be interpreted as a standard empirical risk minimization estimator subject to a combination of data transformation and random perturbation of the transformed data.
Ranked #74 on Image Classification on ObjectNet (using extra training data)
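The data transformation in question is the standard mixup operation; a minimal sketch:

```python
import numpy as np

def mixup(x, y, alpha=0.2, rng=None):
    """Standard mixup: train on convex combinations of example pairs.

    x: (n, ...) inputs; y: (n, n_classes) one-hot labels.
    """
    rng = np.random.default_rng(rng)
    lam = rng.beta(alpha, alpha)               # mixing coefficient
    perm = rng.permutation(len(x))             # random pairing of examples
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y + (1 - lam) * y[perm]
    return x_mix, y_mix
```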
no code implementations • 15 Mar 2020 • Mark Collier, Basil Mustafa, Efi Kokiopoulou, Rodolphe Jenatton, Jesse Berent
By tuning the softmax temperature, we improve accuracy, log-likelihood, and calibration both on image classification benchmarks with controlled label noise and on ImageNet-21k, which has naturally occurring label noise.
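A minimal sketch of the temperature knob itself (a generic tempered softmax, not the paper's full heteroscedastic method):

```python
import numpy as np

def tempered_softmax(logits, temperature=1.0):
    """Softmax with a tunable temperature: higher temperatures flatten the
    predictive distribution. The temperature can be tuned on held-out data
    to trade off accuracy, log-likelihood and calibration.
    """
    z = logits / temperature
    z -= z.max(-1, keepdims=True)              # numerical stability
    p = np.exp(z)
    return p / p.sum(-1, keepdims=True)
```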
no code implementations • ICML 2020 • Jakub Swiatkowski, Kevin Roth, Bastiaan S. Veeling, Linh Tran, Joshua V. Dillon, Jasper Snoek, Stephan Mandt, Tim Salimans, Rodolphe Jenatton, Sebastian Nowozin
Variational Bayesian Inference is a popular methodology for approximating posterior distributions over Bayesian neural network weights.
1 code implementation • ICML 2020 • Florian Wenzel, Kevin Roth, Bastiaan S. Veeling, Jakub Świątkowski, Linh Tran, Stephan Mandt, Jasper Snoek, Tim Salimans, Rodolphe Jenatton, Sebastian Nowozin
In this work we cast doubt on the current understanding of Bayes posteriors in popular deep neural networks: we demonstrate through careful MCMC sampling that the posterior predictive induced by the Bayes posterior yields systematically worse predictions compared to simpler methods including point estimates obtained from SGD.
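A didactic sketch of posterior tempering via a Langevin step, targeting p(theta | D)^(1/T) so that T = 1 is the exact Bayes posterior and T < 1 a "cold" posterior; this is one of several equivalent parameterizations and not the paper's exact MCMC setup:

```python
import numpy as np

def sgld_step(theta, grad_log_post, step, temperature, rng):
    """One Langevin step targeting the tempered posterior
    exp(log p(theta | D) / T): gradient scaled by 1/T, unit-scale noise.
    """
    noise = rng.standard_normal(theta.shape)
    return (theta
            + 0.5 * step * grad_log_post(theta) / temperature
            + np.sqrt(step) * noise)
```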
1 code implementation • 14 Jan 2020 • Linh Tran, Bastiaan S. Veeling, Kevin Roth, Jakub Swiatkowski, Joshua V. Dillon, Jasper Snoek, Stephan Mandt, Tim Salimans, Sebastian Nowozin, Rodolphe Jenatton
As a result, the diversity of predictions across ensemble members is lost.
no code implementations • 15 Oct 2019 • Valerio Perrone, Iaroslav Shcherbatyi, Rodolphe Jenatton, Cedric Archambeau, Matthias Seeger
We propose constrained Max-value Entropy Search (cMES), a novel information-theoretic acquisition function implementing this formulation.
no code implementations • NeurIPS 2019 • Valerio Perrone, Huibin Shen, Matthias Seeger, Cedric Archambeau, Rodolphe Jenatton
Despite its simplicity, we show that our approach considerably boosts BO by reducing the size of the search space, thus accelerating the optimization of a variety of black-box optimization problems.
no code implementations • 25 Sep 2019 • Jakub Świątkowski, Kevin Roth, Bastiaan S. Veeling, Linh Tran, Joshua V. Dillon, Jasper Snoek, Stephan Mandt, Tim Salimans, Rodolphe Jenatton, Sebastian Nowozin
Variational Bayesian Inference is a popular methodology for approximating posterior distributions in Bayesian neural networks.
no code implementations • NeurIPS 2018 • Valerio Perrone, Rodolphe Jenatton, Matthias W. Seeger, Cedric Archambeau
Bayesian optimization (BO) is a model-based approach for gradient-free black-box function optimization, such as hyperparameter optimization.
no code implementations • 8 Dec 2017 • Valerio Perrone, Rodolphe Jenatton, Matthias Seeger, Cedric Archambeau
Bayesian optimization (BO) is a model-based approach for gradient-free black-box function optimization.
no code implementations • ICML 2017 • Rodolphe Jenatton, Cedric Archambeau, Javier González, Matthias Seeger
The benefit of leveraging this structure is twofold: we explore the search space more efficiently and posterior inference scales more favorably with the number of observations than Gaussian Process-based approaches published in the literature.
no code implementations • 17 Feb 2016 • Rodolphe Jenatton, Jim Huang, Dominik Csiba, Cedric Archambeau
We consider online optimization in the 1-lookahead setting, where the objective does not decompose additively over the rounds of the online game.
no code implementations • 23 Dec 2015 • Rodolphe Jenatton, Jim Huang, Cédric Archambeau
We present an adaptive online gradient descent algorithm to solve online convex optimization problems with long-term constraints, i.e., constraints that need to be satisfied when accumulated over a finite number of rounds T, but can be violated in intermediate rounds.
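A generic primal-dual sketch of this setting, assuming a single constraint g(x) <= 0 that only has to hold on average over the T rounds; the paper's algorithm uses adaptive step sizes and attains refined violation bounds:

```python
import numpy as np

def primal_dual_step(x, lam, grad_f, g, grad_g, eta):
    """One primal-dual online gradient step for a long-term constraint.

    grad_f: gradient of the current round's loss; g, grad_g: the
    constraint function and its gradient; lam: dual variable ("price"
    on accumulated violation).
    """
    x_new = x - eta * (grad_f(x) + lam * grad_g(x))   # descend the Lagrangian
    lam_new = max(0.0, lam + eta * g(x_new))          # raise the price if violated
    return x_new, lam_new
```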
no code implementations • 19 Jul 2014 • Rémi Gribonval, Rodolphe Jenatton, Francis Bach
A popular approach within the signal processing and machine learning communities consists in modelling signals as sparse linear combinations of atoms selected from a learned dictionary.
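A minimal sketch of the sparse-coding half of this model, solving min_a 0.5||y - Da||^2 + lam||a||_1 with ISTA for a fixed dictionary D; dictionary *learning* alternates such a step with dictionary updates:

```python
import numpy as np

def ista_sparse_code(D, y, lam, n_iter=200):
    """Sparse coding of a signal y over a given dictionary D via ISTA."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2      # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - step * D.T @ (D @ a - y)        # gradient step on the quadratic
        a = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return a
```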
no code implementations • 20 Mar 2014 • Matthias Seibert, Martin Kleinsteuber, Rémi Gribonval, Rodolphe Jenatton, Francis Bach
The main goal of this paper is to provide a sample complexity estimate that controls to what extent the empirical average deviates from the cost function.
no code implementations • 13 Dec 2013 • Rémi Gribonval, Rodolphe Jenatton, Francis Bach, Martin Kleinsteuber, Matthias Seibert
Many modern tools in machine learning and signal processing, such as sparse dictionary learning, principal component analysis (PCA), non-negative matrix factorization (NMF), $K$-means clustering, etc., rely on the factorization of a matrix obtained by concatenating high-dimensional vectors from a training collection.
no code implementations • NeurIPS 2013 • Fajwel Fogel, Rodolphe Jenatton, Francis Bach, Alexandre d'Aspremont
Seriation seeks to reconstruct a linear order between variables using unsorted similarity information.
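For intuition, the classical spectral heuristic this line of work builds on orders items by the Fiedler vector of the similarity graph's Laplacian; the paper itself studies convex relaxations that are more robust to noise. A sketch, assuming a symmetric nonnegative similarity matrix:

```python
import numpy as np

def spectral_seriation(S):
    """Order items by the eigenvector of the second-smallest eigenvalue
    (the Fiedler vector) of the unnormalized graph Laplacian of S.
    """
    L = np.diag(S.sum(axis=1)) - S
    eigvals, eigvecs = np.linalg.eigh(L)
    fiedler = eigvecs[:, 1]
    return np.argsort(fiedler)
```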
no code implementations • NeurIPS 2012 • Rodolphe Jenatton, Nicolas L. Roux, Antoine Bordes, Guillaume R. Obozinski
While there is a large body of work focused on modeling these data, few have considered modeling these multiple types of relationships jointly.
no code implementations • NeurIPS 2010 • Julien Mairal, Rodolphe Jenatton, Francis R. Bach, Guillaume R. Obozinski
Our algorithm scales up to millions of groups and variables, and opens up a whole new range of applications for structured sparse models.
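A minimal sketch of the basic building block, the group-lasso proximal operator for *non-overlapping* groups; the paper's contribution is the much harder overlapping/hierarchical case, solved with network-flow algorithms:

```python
import numpy as np

def group_soft_threshold(x, groups, lam):
    """Proximal operator of the group-lasso penalty: shrink each group's
    norm by lam, zeroing out whole groups at once.

    groups: list of index arrays, one per (non-overlapping) group.
    """
    out = x.copy()
    for g in groups:
        norm = np.linalg.norm(x[g])
        out[g] = 0.0 if norm <= lam else (1 - lam / norm) * x[g]
    return out
```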
no code implementations • 8 Sep 2009 • Rodolphe Jenatton, Guillaume Obozinski, Francis Bach
We present an extension of sparse PCA, or sparse dictionary learning, where the sparsity patterns of all dictionary elements are structured and constrained to belong to a prespecified set of shapes.