Search Results for author: Neil Houlsby

Found 55 papers, 31 papers with code

Scaling Laws for Sparsely-Connected Foundation Models

no code implementations • 15 Sep 2023 • Elias Frantar, Carlos Riquelme, Neil Houlsby, Dan Alistarh, Utku Evci

We explore the impact of parameter sparsity on the scaling behavior of Transformers trained on massive datasets (i.e., "foundation models"), in both vision and language domains.

From Sparse to Soft Mixtures of Experts

3 code implementations • 2 Aug 2023 • Joan Puigcerver, Carlos Riquelme, Basil Mustafa, Neil Houlsby

Sparse mixture of expert architectures (MoEs) scale model capacity without large increases in training or inference costs.
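
The soft-routing idea can be made concrete with a minimal sketch. Shapes, the slot-parameter name `phi`, and the one-expert-per-slot simplification are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def soft_moe_layer(X, phi, experts):
    """Soft MoE sketch: slots are weighted mixes of tokens, not hard routes.

    X:       (n_tokens, d) input tokens
    phi:     (d, n_slots)  learned slot parameters (assumed name)
    experts: one callable (d,) -> (d,) per slot, for simplicity
    """
    logits = X @ phi                                          # (n_tokens, n_slots)
    dispatch = np.exp(logits - logits.max(axis=0))            # softmax over tokens,
    dispatch /= dispatch.sum(axis=0, keepdims=True)           # one distribution per slot
    slots = dispatch.T @ X                                    # (n_slots, d) soft token mixes
    outs = np.stack([f(s) for f, s in zip(experts, slots)])   # expert outputs, (n_slots, d)
    combine = np.exp(logits - logits.max(axis=1, keepdims=True))  # softmax over slots,
    combine /= combine.sum(axis=1, keepdims=True)                 # one distribution per token
    return combine @ outs                                     # (n_tokens, d)
```

Because every token contributes to every slot with a soft weight, the layer is fully differentiable and sidesteps the load-balancing and token-dropping issues of hard top-k routing.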

Scaling Open-Vocabulary Object Detection

no code implementations • NeurIPS 2023 • Matthias Minderer, Alexey Gritsenko, Neil Houlsby

However, with OWL-ST, we can scale to over 1B examples, yielding a further large improvement: with an L/14 architecture, OWL-ST improves AP on LVIS rare classes, for which the model has seen no human box annotations, from 31.2% to 44.6% (43% relative improvement).

Image Classification • Language Modelling • +2

Dual PatchNorm

7 code implementations • 2 Feb 2023 • Manoj Kumar, Mostafa Dehghani, Neil Houlsby

We propose Dual PatchNorm: two Layer Normalization layers (LayerNorms), before and after the patch embedding layer in Vision Transformers.
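
The description above translates almost directly into code; a minimal sketch follows (learned LayerNorm scale/shift omitted for brevity, and the names are assumed, not taken from the paper's code):

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each row over the feature dimension (scale/shift omitted).
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def dual_patchnorm_embed(patches, W_embed):
    """Dual PatchNorm sketch: LayerNorms before AND after patch embedding.

    patches: (n_patches, patch_dim) flattened image patches
    W_embed: (patch_dim, d_model)   patch-embedding projection (assumed name)
    """
    return layer_norm(layer_norm(patches) @ W_embed)
```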

Massively Scaling Heteroscedastic Classifiers

no code implementations • 30 Jan 2023 • Mark Collier, Rodolphe Jenatton, Basil Mustafa, Neil Houlsby, Jesse Berent, Effrosyni Kokiopoulou

Heteroscedastic classifiers, which learn a multivariate Gaussian distribution over prediction logits, have been shown to perform well on image classification problems with hundreds to thousands of classes.

Classification • Contrastive Learning • +1
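
As a hedged sketch of the basic mechanism described above (the paper's contribution is scaling this to many classes, e.g. via efficient covariance parameterisations, which are omitted here; a full Cholesky factor is shown instead):

```python
import numpy as np

def heteroscedastic_probs(mu, L, n_samples=1000, seed=0):
    """Average softmax over logits sampled from N(mu, L @ L.T).

    mu: (n_classes,)            mean logits from the network
    L:  (n_classes, n_classes)  Cholesky-style factor of the logit covariance
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_samples, mu.shape[0]))
    logits = mu + eps @ L.T                          # Monte Carlo logit samples
    z = logits - logits.max(axis=1, keepdims=True)   # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return probs.mean(axis=0)                        # predictive class probabilities
```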

Adaptive Computation with Elastic Input Sequence

1 code implementation • 30 Jan 2023 • Fuzhao Xue, Valerii Likhosherstov, Anurag Arnab, Neil Houlsby, Mostafa Dehghani, Yang You

However, most standard neural networks have a fixed function type and computation budget regardless of the sample's nature or difficulty.

Inductive Bias

CLIPPO: Image-and-Language Understanding from Pixels Only

1 code implementation • CVPR 2023 • Michael Tschannen, Basil Mustafa, Neil Houlsby

Multimodal models are becoming increasingly effective, in part due to unified components, such as the Transformer architecture.

Contrastive Learning • Image Classification • +6

Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints

1 code implementation • 9 Dec 2022 • Aran Komatsuzaki, Joan Puigcerver, James Lee-Thorp, Carlos Riquelme Ruiz, Basil Mustafa, Joshua Ainslie, Yi Tay, Mostafa Dehghani, Neil Houlsby

In this work, we propose sparse upcycling -- a simple way to reuse sunk training costs by initializing a sparsely activated Mixture-of-Experts model from a dense checkpoint.
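
The core recipe is simple enough to sketch. The checkpoint structure below is a hypothetical stand-in, not the paper's code:

```python
import copy

def upcycle(dense_blocks, n_experts):
    """Sparse-upcycling sketch: turn dense MLP blocks into MoE blocks.

    Each expert is initialised as an exact copy of the dense MLP, so the
    upcycled model starts out computing roughly the same function as the
    dense checkpoint; the freshly initialised router then lets experts
    specialise during continued training.
    """
    moe_blocks = []
    for block in dense_blocks:
        moe_blocks.append({
            "attention": block["attention"],   # non-MLP weights reused as-is
            "experts": [copy.deepcopy(block["mlp"]) for _ in range(n_experts)],
            "router": "init_from_scratch",     # placeholder: trained from scratch
        })
    return moe_blocks
```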

Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts

1 code implementation • 6 Jun 2022 • Basil Mustafa, Carlos Riquelme, Joan Puigcerver, Rodolphe Jenatton, Neil Houlsby

MoEs are a natural fit for a multimodal backbone, since expert layers can learn an appropriate partitioning of modalities.

Contrastive Learning

UL2: Unifying Language Learning Paradigms

1 code implementation • 10 May 2022 • Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Jason Wei, Xuezhi Wang, Hyung Won Chung, Siamak Shakeri, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Denny Zhou, Neil Houlsby, Donald Metzler

Our model also achieves strong results in in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization.

Information Retrieval • Long-range modeling • +4

Do better ImageNet classifiers assess perceptual similarity better?

no code implementations • 9 Mar 2022 • Manoj Kumar, Neil Houlsby, Nal Kalchbrenner, Ekin D. Cubuk

Perceptual distances between images, as measured in the space of pre-trained deep features, have outperformed prior low-level, pixel-based metrics on assessing perceptual similarity.
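
For context, a deep perceptual distance of the kind being evaluated can be sketched as follows (LPIPS-style, simplified; real metrics add learned per-channel weights, which are omitted here):

```python
import numpy as np

def deep_perceptual_distance(feats_a, feats_b, eps=1e-10):
    """Distance in pre-trained feature space, accumulated across layers.

    feats_a, feats_b: lists of (n_positions, n_channels) feature maps
    extracted from the same pre-trained network for two images.
    """
    total = 0.0
    for fa, fb in zip(feats_a, feats_b):
        fa = fa / (np.linalg.norm(fa, axis=1, keepdims=True) + eps)  # unit-normalise
        fb = fb / (np.linalg.norm(fb, axis=1, keepdims=True) + eps)  # channel vectors
        total += np.mean(np.sum((fa - fb) ** 2, axis=1))             # spatial average
    return total
```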

Sparse MoEs meet Efficient Ensembles

1 code implementation • 7 Oct 2021 • James Urquhart Allingham, Florian Wenzel, Zelda E Mariet, Basil Mustafa, Joan Puigcerver, Neil Houlsby, Ghassen Jerfel, Vincent Fortuin, Balaji Lakshminarayanan, Jasper Snoek, Dustin Tran, Carlos Riquelme Ruiz, Rodolphe Jenatton

Machine learning models based on the aggregated outputs of submodels, either at the activation or prediction levels, often exhibit strong performance compared to individual models.

Few-Shot Learning
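
The two aggregation levels mentioned above can be sketched as follows (hypothetical interface; `models` are callables mapping an input to logits):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    return np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)

def ensemble_predict(models, x, level="prediction"):
    """Aggregate submodels at the activation (logit) or prediction level."""
    logits = np.stack([m(x) for m in models])   # (n_models, n_classes)
    if level == "activation":
        return softmax(logits.mean(axis=0))     # average logits, then softmax
    return softmax(logits).mean(axis=0)         # softmax each, then average
```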

The Benchmark Lottery

no code implementations • 14 Jul 2021 • Mostafa Dehghani, Yi Tay, Alexey A. Gritsenko, Zhe Zhao, Neil Houlsby, Fernando Diaz, Donald Metzler, Oriol Vinyals

The world of empirical machine learning (ML) strongly relies on benchmarks in order to determine the relative effectiveness of different algorithms and methods.

Benchmarking • BIG-bench Machine Learning • +3

Scaling Vision Transformers

1 code implementation • CVPR 2022 • Xiaohua Zhai, Alexander Kolesnikov, Neil Houlsby, Lucas Beyer

As a result, we successfully train a ViT model with two billion parameters, which attains a new state-of-the-art on ImageNet of 90.45% top-1 accuracy.

Ranked #3 on Image Classification on VTAB-1k (using extra training data)

Few-Shot Image Classification • Few-Shot Learning

Comparing Transfer and Meta Learning Approaches on a Unified Few-Shot Classification Benchmark

1 code implementation • 6 Apr 2021 • Vincent Dumoulin, Neil Houlsby, Utku Evci, Xiaohua Zhai, Ross Goroshin, Sylvain Gelly, Hugo Larochelle

To bridge this gap, we perform a cross-family study of the best transfer and meta learners on both a large-scale meta-learning benchmark (Meta-Dataset, MD), and a transfer learning benchmark (Visual Task Adaptation Benchmark, VTAB).

Few-Shot Learning • General Classification • +1

Training general representations for remote sensing using in-domain knowledge

no code implementations • 30 Sep 2020 • Maxim Neumann, André Susano Pinto, Xiaohua Zhai, Neil Houlsby

Automatically finding good and general remote sensing representations makes it possible to perform transfer learning on a wide range of applications, improving accuracy and reducing the required number of training samples.

Representation Learning • Transfer Learning

Automatic Shortcut Removal for Self-Supervised Representation Learning

no code implementations • ICML 2020 • Matthias Minderer, Olivier Bachem, Neil Houlsby, Michael Tschannen

In self-supervised visual representation learning, a feature extractor is trained on a "pretext task" for which labels can be generated cheaply, without human annotation.

Representation Learning
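
Rotation prediction is a canonical example of such a pretext task: the labels come for free from the transformation applied, with no human annotation. A minimal sketch:

```python
import numpy as np

def rotation_pretext_batch(images):
    """Cheap pretext labels: rotate each image by 0/90/180/270 degrees and
    train the network to predict which rotation was applied.

    images: (n, h, w, c) batch with h == w (square images assumed).
    Returns 4n rotated images and their integer rotation labels.
    """
    rotated = np.concatenate([np.rot90(images, k, axes=(1, 2)) for k in range(4)])
    labels = np.repeat(np.arange(4), images.shape[0])  # label order matches concat
    return rotated, labels
```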

Self-Supervised Learning of Video-Induced Visual Invariances

no code implementations • CVPR 2020 • Michael Tschannen, Josip Djolonga, Marvin Ritter, Aravindh Mahendran, Xiaohua Zhai, Neil Houlsby, Sylvain Gelly, Mario Lucic

We propose a general framework for self-supervised learning of transferable visual representations based on Video-Induced Visual Invariances (VIVI).

Ranked #15 on Image Classification on VTAB-1k (using extra training data)

Image Classification • Self-Supervised Learning • +1

In-domain representation learning for remote sensing

1 code implementation • 15 Nov 2019 • Maxim Neumann, Andre Susano Pinto, Xiaohua Zhai, Neil Houlsby

Given the importance of remote sensing, surprisingly little attention has been paid to it by the representation learning community.

Ranked #1 on Multi-Label Image Classification on BigEarthNet (mAP (macro) metric)

Multi-Label Image Classification • Representation Learning • +1

Self-Supervised GANs via Auxiliary Rotation Loss

4 code implementations • CVPR 2019 • Ting Chen, Xiaohua Zhai, Marvin Ritter, Mario Lucic, Neil Houlsby

In this work we exploit two popular unsupervised learning techniques, adversarial training and self-supervision, and take a step towards bridging the gap between conditional and unconditional GANs.

Image Generation • Representation Learning • +1
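
A hedged sketch of how the two training signals combine on the discriminator side (the auxiliary rotation head, the loss names, and the weight `alpha` are assumptions for illustration):

```python
import numpy as np

def rotation_cross_entropy(rot_logits, rot_labels):
    """Cross-entropy of the discriminator's auxiliary 4-way rotation head."""
    z = rot_logits - rot_logits.max(axis=1, keepdims=True)      # stable log-softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(rot_labels)), rot_labels].mean()

def discriminator_loss(adv_loss, rot_logits, rot_labels, alpha=1.0):
    # Usual adversarial term plus the weighted self-supervised rotation term.
    return adv_loss + alpha * rotation_cross_entropy(rot_logits, rot_labels)
```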

Self-Supervised GAN to Counter Forgetting

no code implementations • 27 Oct 2018 • Ting Chen, Xiaohua Zhai, Neil Houlsby

To counter forgetting, we encourage the discriminator to maintain useful representations by adding a self-supervision task.

Continual Learning • General Classification

Transfer Learning with Neural AutoML

no code implementations • NeurIPS 2018 • Catherine Wong, Neil Houlsby, Yifeng Lu, Andrea Gesmundo

We extend RL-based architecture search methods to support parallel training on multiple tasks and then transfer the search strategy to new tasks.

General Classification • Image Classification • +2

Analyzing Language Learned by an Active Question Answering Agent

no code implementations • 23 Jan 2018 • Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Wojciech Gajewski, Andrea Gesmundo, Neil Houlsby, Wei Wang

We analyze the language learned by an agent trained with reinforcement learning as a component of the ActiveQA system [Buck et al., 2017].

Information Retrieval • Question Answering • +3

Ask the Right Questions: Active Question Reformulation with Reinforcement Learning

2 code implementations • ICLR 2018 • Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Wojciech Gajewski, Andrea Gesmundo, Neil Houlsby, Wei Wang

The agent probes the system with potentially many natural-language reformulations of an initial question and aggregates the returned evidence to yield the best answer.

Information Retrieval • Question Answering • +3
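
The probe-and-aggregate loop reads as a short sketch; every callable below is a hypothetical stand-in for a system component, not the paper's API:

```python
def active_qa(question, reformulate, qa_system, score):
    """AQA-style loop: reformulate, query, and keep the best-scored answer.

    reformulate: question -> list of natural-language rewrites (learned agent)
    qa_system:   rewrite  -> candidate answer with supporting evidence
    score:       candidate -> scalar quality estimate used for selection
    """
    candidates = [qa_system(q) for q in [question] + reformulate(question)]
    return max(candidates, key=score)
```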

A Filtering Approach to Stochastic Variational Inference

no code implementations • NeurIPS 2014 • Neil Houlsby, David Blei

Stochastic variational inference (SVI) uses stochastic optimization to scale up Bayesian computation to massive data.

Stochastic Optimization • Variational Inference
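
A generic SVI update, as a hedged sketch (the paper's filtering contribution is an adaptive way of setting the step size, which the fixed Robbins-Monro schedule below does not capture):

```python
import numpy as np

def svi_step(lam, data, grad_elbo, batch_size, t, rng):
    """One stochastic variational inference step on global parameters lam.

    grad_elbo(lam, minibatch) estimates the ELBO gradient on a minibatch;
    rescaling by len(data)/batch_size keeps the estimate unbiased, and a
    decaying Robbins-Monro step size ensures convergence.
    """
    idx = rng.choice(len(data), size=batch_size, replace=False)
    g = (len(data) / batch_size) * grad_elbo(lam, data[idx])
    rho = (t + 1.0) ** -0.75   # step-size exponent in (0.5, 1]
    return lam + rho * g
```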
