Search Results for author: Xiaohua Zhai

Found 30 papers, 21 papers with code

Better plain ViT baselines for ImageNet-1k

2 code implementations • 3 May 2022 • Lucas Beyer, Xiaohua Zhai, Alexander Kolesnikov

It is commonly accepted that the Vision Transformer model requires sophisticated regularization techniques to excel at ImageNet-1k scale data.

Data Augmentation • Image Classification

A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation

no code implementations • 17 Dec 2021 • Wuyang Chen, Xianzhi Du, Fan Yang, Lucas Beyer, Xiaohua Zhai, Tsung-Yi Lin, Huizhong Chen, Jing Li, Xiaodan Song, Zhangyang Wang, Denny Zhou

In this paper, we comprehensively study three architecture design choices on ViT -- spatial reduction, doubled channels, and multiscale features -- and demonstrate that a vanilla ViT architecture can fulfill this goal without handcrafting multiscale features, maintaining the original ViT design philosophy.

Image Classification • Instance Segmentation • +5

How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers

8 code implementations • 18 Jun 2021 • Andreas Steiner, Alexander Kolesnikov, Xiaohua Zhai, Ross Wightman, Jakob Uszkoreit, Lucas Beyer

Vision Transformers (ViT) have been shown to attain highly competitive performance for a wide range of vision applications, such as image classification, object detection and semantic image segmentation.

Data Augmentation • Image Classification • +4

Knowledge distillation: A good teacher is patient and consistent

3 code implementations • CVPR 2022 • Lucas Beyer, Xiaohua Zhai, Amélie Royer, Larisa Markeeva, Rohan Anil, Alexander Kolesnikov

In particular, we uncover that there are certain implicit design choices, which may drastically affect the effectiveness of distillation.

Knowledge Distillation
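The entry above only hints at the recipe: the paper's "patient and consistent" teacher matches the student's predictions to the teacher's softened outputs over a long schedule with identical input crops. The core objective is the standard temperature-scaled distillation loss; below is a minimal pure-Python sketch of that loss (function names and the temperature value are illustrative, not taken from the paper's released code):

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over a list of raw logits.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL divergence between the temperature-softened teacher and student
    # distributions, scaled by T^2 (the usual Hinton-style objective).
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return temperature ** 2 * kl
```

The loss is zero when student and teacher logits agree and grows as their softened distributions diverge; in practice it is minimized over many epochs with aggressive, shared augmentations for both networks.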

Scaling Vision Transformers

1 code implementation • CVPR 2022 • Xiaohua Zhai, Alexander Kolesnikov, Neil Houlsby, Lucas Beyer

As a result, we successfully train a ViT model with two billion parameters, which attains a new state-of-the-art on ImageNet of 90.45% top-1 accuracy.

Ranked #3 on Image Classification on VTAB-1k (using extra training data)

Few-Shot Image Classification

Comparing Transfer and Meta Learning Approaches on a Unified Few-Shot Classification Benchmark

1 code implementation • 6 Apr 2021 • Vincent Dumoulin, Neil Houlsby, Utku Evci, Xiaohua Zhai, Ross Goroshin, Sylvain Gelly, Hugo Larochelle

To bridge this gap, we perform a cross-family study of the best transfer and meta learners on both a large-scale meta-learning benchmark (Meta-Dataset, MD), and a transfer learning benchmark (Visual Task Adaptation Benchmark, VTAB).

Few-Shot Learning • General Classification • +1

Training general representations for remote sensing using in-domain knowledge

no code implementations • 30 Sep 2020 • Maxim Neumann, André Susano Pinto, Xiaohua Zhai, Neil Houlsby

Automatically finding good and general remote sensing representations allows one to perform transfer learning on a wide range of applications, improving accuracy and reducing the required number of training samples.

Representation Learning • Transfer Learning

Self-Supervised Learning of Video-Induced Visual Invariances

no code implementations • CVPR 2020 • Michael Tschannen, Josip Djolonga, Marvin Ritter, Aravindh Mahendran, Xiaohua Zhai, Neil Houlsby, Sylvain Gelly, Mario Lucic

We propose a general framework for self-supervised learning of transferable visual representations based on Video-Induced Visual Invariances (VIVI).

Ranked #14 on Image Classification on VTAB-1k (using extra training data)

Image Classification • Self-Supervised Learning • +1

In-domain representation learning for remote sensing

no code implementations • 15 Nov 2019 • Maxim Neumann, Andre Susano Pinto, Xiaohua Zhai, Neil Houlsby

Given the importance of remote sensing, surprisingly little attention has been paid to it by the representation learning community.

Image Classification • Representation Learning

The GAN Landscape: Losses, Architectures, Regularization, and Normalization

no code implementations • ICLR 2019 • Karol Kurach, Mario Lucic, Xiaohua Zhai, Marcin Michalski, Sylvain Gelly

Generative adversarial networks (GANs) are a class of deep generative models which aim to learn a target distribution in an unsupervised fashion.

Self-Supervised GANs via Auxiliary Rotation Loss

4 code implementations • CVPR 2019 • Ting Chen, Xiaohua Zhai, Marvin Ritter, Mario Lucic, Neil Houlsby

In this work we exploit two popular unsupervised learning techniques, adversarial training and self-supervision, and take a step towards bridging the gap between conditional and unconditional GANs.

Image Generation • Representation Learning
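The auxiliary rotation loss named in the title gives the discriminator a second, label-free job: predict which of four rotations (0°, 90°, 180°, 270°) was applied to an image. A toy pure-Python sketch of how such a 4-way rotation task can be constructed from a batch of 2D grids (illustrative only, not the authors' released code):

```python
def rot90(grid):
    # Rotate a 2D grid (list of lists) 90 degrees clockwise.
    return [list(row) for row in zip(*grid[::-1])]

def rotation_task(images):
    # Build the 4-way auxiliary self-supervision task: each image
    # appears under four rotations, and the rotation index (0..3)
    # serves as the label the discriminator must predict.
    samples, labels = [], []
    for img in images:
        rotated = img
        for k in range(4):
            samples.append(rotated)
            labels.append(k)
            rotated = rot90(rotated)
    return samples, labels
```

In the full method this classification head is trained alongside the usual real/fake objective, which stabilizes the discriminator's features without requiring any class labels.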

Self-Supervised GAN to Counter Forgetting

no code implementations • 27 Oct 2018 • Ting Chen, Xiaohua Zhai, Neil Houlsby

To counter forgetting, we encourage the discriminator to maintain useful representations by adding a self-supervision task.

Continual Learning • General Classification

A Large-Scale Study on Regularization and Normalization in GANs

5 code implementations • ICLR 2019 • Karol Kurach, Mario Lucic, Xiaohua Zhai, Marcin Michalski, Sylvain Gelly

Generative adversarial networks (GANs) are a class of deep generative models which aim to learn a target distribution in an unsupervised fashion.
