no code implementations • 22 May 2023 • Ibrahim Alabdulmohsin, Xiaohua Zhai, Alexander Kolesnikov, Lucas Beyer
Scaling laws have been recently employed to derive compute-optimal model size (number of parameters) for a given compute duration.
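For context, a minimal worked form of such a derivation (illustrative only; the parametric form below is the one commonly used in the scaling-law literature, not necessarily the exact fit in this paper):

```latex
% Illustrative scaling law: loss as a function of model size N and
% training data D, minimized under a fixed compute budget C \propto N D.
L(N, D) = E + A\,N^{-\alpha} + B\,D^{-\beta}
% Substituting D \propto C / N and setting dL/dN = 0 yields the
% compute-optimal model size
N^{*}(C) \propto C^{\beta / (\alpha + \beta)}
```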
1 code implementation • 8 Apr 2023 • Sathya Chitturi, Zhurun Ji, Alexander Petsch, Cheng Peng, Zhantao Chen, Rajan Plumley, Mike Dunne, Sougata Mardanya, Sugata Chowdhury, Hongwei Chen, Arun Bansil, Adrian Feiguin, Alexander Kolesnikov, Dharmalingam Prabhakaran, Stephen Hayden, Daniel Ratner, Chunjing Jia, Youssef Nashed, Joshua Turner
The observation and description of collective excitations in solids is a fundamental issue when seeking to understand the physics of a many-body system.
no code implementations • 30 Mar 2023 • Lucas Beyer, Bo Wan, Gagan Madan, Filip Pavetic, Andreas Steiner, Alexander Kolesnikov, André Susano Pinto, Emanuele Bugliarello, Xiao Wang, Qihang Yu, Liang-Chieh Chen, Xiaohua Zhai
A key finding is that a small decoder learned on top of a frozen pretrained encoder works surprisingly well.
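A minimal sketch of that setup, with hypothetical shapes and a toy linear head standing in for the paper's small autoregressive decoder; only the head is trained, while the encoder weights stay fixed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in weights for a pretrained encoder; they receive no updates.
W_ENC = rng.standard_normal((3072, 768)) * 0.02
# Only the small decoder head is trained.
W_dec = rng.standard_normal((768, 10)) * 0.02

def frozen_encoder(images):
    # Pretrained and frozen: gradients are never applied to W_ENC.
    return images.reshape(images.shape[0], -1) @ W_ENC

def decoder(features):
    # Deliberately small trainable head (the paper's decoder is a small
    # autoregressive Transformer; a linear map keeps the sketch short).
    return features @ W_dec

images = rng.standard_normal((4, 32, 32, 3))   # toy batch
logits = decoder(frozen_encoder(images))       # (4, 10)
```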
no code implementations • 27 Mar 2023 • Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, Lucas Beyer
We propose a simple pairwise sigmoid loss for image-text pre-training.
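A minimal numpy sketch of a pairwise sigmoid loss of this kind (the temperature t and bias b are learnable in the paper; they appear as constants here):

```python
import numpy as np

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return -np.logaddexp(0.0, -x)

def pairwise_sigmoid_loss(img_emb, txt_emb, t=10.0, b=-10.0):
    """img_emb, txt_emb: (N, D) L2-normalized embeddings of N matched pairs.

    Every image-text pair in the batch is scored independently with a
    binary (sigmoid) objective: label +1 for the N matched pairs on the
    diagonal, -1 for the N*(N-1) mismatched pairs.
    """
    n = img_emb.shape[0]
    logits = t * img_emb @ txt_emb.T + b   # (N, N) pair scores
    labels = 2.0 * np.eye(n) - 1.0         # +1 diagonal, -1 elsewhere
    return -np.sum(log_sigmoid(labels * logits)) / n
```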
no code implementations • 16 Feb 2023 • André Susano Pinto, Alexander Kolesnikov, Yuge Shi, Lucas Beyer, Xiaohua Zhai
Misalignment between model predictions and intended usage can be detrimental for the deployment of computer vision models.
no code implementations • 10 Feb 2023 • Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, Fantine Huot, Jasmijn Bastings, Mark Patrick Collier, Alexey Gritsenko, Vighnesh Birodkar, Cristina Vasconcelos, Yi Tay, Thomas Mensink, Alexander Kolesnikov, Filip Pavetić, Dustin Tran, Thomas Kipf, Mario Lučić, Xiaohua Zhai, Daniel Keysers, Jeremiah Harmsen, Neil Houlsby
The scaling of Transformers has driven breakthrough capabilities for language models.
Ranked #1 on Linear-Probe Classification on ImageNet (using extra training data)
3 code implementations • CVPR 2023 • Lucas Beyer, Pavel Izmailov, Alexander Kolesnikov, Mathilde Caron, Simon Kornblith, Xiaohua Zhai, Matthias Minderer, Michael Tschannen, Ibrahim Alabdulmohsin, Filip Pavetic
Vision Transformers convert images to sequences by slicing them into patches.
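A minimal numpy sketch of that image-to-sequence step, assuming the image side lengths are divisible by the patch size:

```python
import numpy as np

def patchify(image, p):
    """Slice an (H, W, C) image into a sequence of flattened p x p patches.

    Returns an (H*W/p^2, p*p*C) array with one row per patch token.
    """
    h, w, c = image.shape
    x = image.reshape(h // p, p, w // p, p, c)   # split both spatial axes
    x = x.transpose(0, 2, 1, 3, 4)               # group the patch dims
    return x.reshape(-1, p * p * c)              # one row per patch

tokens = patchify(np.zeros((224, 224, 3)), p=16)  # shape (196, 768)
```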
no code implementations • 14 Sep 2022 • Xi Chen, Xiao Wang, Soravit Changpinyo, AJ Piergiovanni, Piotr Padlewski, Daniel Salz, Sebastian Goodman, Adam Grycner, Basil Mustafa, Lucas Beyer, Alexander Kolesnikov, Joan Puigcerver, Nan Ding, Keran Rong, Hassan Akbari, Gaurav Mishra, Linting Xue, Ashish Thapliyal, James Bradbury, Weicheng Kuo, Mojtaba Seyedhosseini, Chao Jia, Burcu Karagol Ayan, Carlos Riquelme, Andreas Steiner, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut
PaLI generates text based on visual and textual inputs, and with this interface performs many vision, language, and multimodal tasks, in many languages.
Ranked #1 on Image Captioning on nocaps out-of-domain
1 code implementation • 20 May 2022 • Alexander Kolesnikov, André Susano Pinto, Lucas Beyer, Xiaohua Zhai, Jeremiah Harmsen, Neil Houlsby
We introduce UViM, a unified approach capable of modeling a wide range of computer vision tasks.
2 code implementations • 3 May 2022 • Lucas Beyer, Xiaohua Zhai, Alexander Kolesnikov
It is commonly accepted that the Vision Transformer model requires sophisticated regularization techniques to excel at ImageNet-1k scale data.
4 code implementations • CVPR 2022 • Xiaohua Zhai, Xiao Wang, Basil Mustafa, Andreas Steiner, Daniel Keysers, Alexander Kolesnikov, Lucas Beyer
This paper presents contrastive-tuning, a simple method employing contrastive training to align image and text models while still taking advantage of their pre-training.
10 code implementations • 18 Jun 2021 • Andreas Steiner, Alexander Kolesnikov, Xiaohua Zhai, Ross Wightman, Jakob Uszkoreit, Lucas Beyer
Vision Transformers (ViT) have been shown to attain highly competitive performance for a wide range of vision applications, such as image classification, object detection and semantic image segmentation.
3 code implementations • CVPR 2022 • Lucas Beyer, Xiaohua Zhai, Amélie Royer, Larisa Markeeva, Rohan Anil, Alexander Kolesnikov
In particular, we uncover certain implicit design choices that may drastically affect the effectiveness of distillation (one of them is sketched below).
Ranked #405 on Image Classification on ImageNet
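A hedged sketch of a distillation objective with one such design choice made explicit: the teacher and student score the same augmented view (the "consistent" recipe), and the loss is cross-entropy to the teacher's softened probabilities (equivalent to KL divergence up to a constant). The fixed temperature and function names are illustrative:

```python
import numpy as np

def softmax(x, tau):
    z = x / tau
    z -= z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(teacher_logits, student_logits, tau=1.0):
    # Both logit tensors must come from the *same* augmented view
    # ("consistent" teaching), not independently augmented inputs.
    p = softmax(teacher_logits, tau)                    # teacher targets
    log_q = np.log(softmax(student_logits, tau) + 1e-12)
    return -(p * log_q).sum(axis=-1).mean()             # CE to teacher
```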
1 code implementation • CVPR 2022 • Xiaohua Zhai, Alexander Kolesnikov, Neil Houlsby, Lucas Beyer
As a result, we successfully train a ViT model with two billion parameters, which attains a new state-of-the-art on ImageNet of 90.45% top-1 accuracy.
Ranked #3 on Image Classification on VTAB-1k (using extra training data)
44 code implementations • NeurIPS 2021 • Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy
Convolutional Neural Networks (CNNs) are the go-to model for computer vision.
Ranked #18 on Image Classification on OmniBenchmark
1 code implementation • 9 Apr 2021 • Jessica Yung, Rob Romijnders, Alexander Kolesnikov, Lucas Beyer, Josip Djolonga, Neil Houlsby, Sylvain Gelly, Mario Lucic, Xiaohua Zhai
Before deploying machine learning models it is critical to assess their robustness.
127 code implementations • ICLR 2021 • Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby
While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited.
Ranked #1 on Out-of-Distribution Generalization on ImageNet-W
1 code implementation • CVPR 2021 • Josip Djolonga, Jessica Yung, Michael Tschannen, Rob Romijnders, Lucas Beyer, Alexander Kolesnikov, Joan Puigcerver, Matthias Minderer, Alexander D'Amour, Dan Moldovan, Sylvain Gelly, Neil Houlsby, Xiaohua Zhai, Mario Lucic
Modern deep convolutional networks (CNNs) are often criticized for not generalizing under distributional shifts.
2 code implementations • 12 Jun 2020 • Lucas Beyer, Olivier J. Hénaff, Alexander Kolesnikov, Xiaohua Zhai, Aäron van den Oord
Yes, and no.
8 code implementations • ECCV 2020 • Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby
We conduct detailed analysis of the main components that lead to high transfer performance.
Ranked #1 on Out-of-Distribution Generalization on ImageNet-W (using extra training data)
2 code implementations • arXiv 2020 • Xiaohua Zhai, Joan Puigcerver, Alexander Kolesnikov, Pierre Ruyssen, Carlos Riquelme, Mario Lucic, Josip Djolonga, Andre Susano Pinto, Maxim Neumann, Alexey Dosovitskiy, Lucas Beyer, Olivier Bachem, Michael Tschannen, Marcin Michalski, Olivier Bousquet, Sylvain Gelly, Neil Houlsby
And, how close are we to general visual representations?
Ranked #9 on Image Classification on VTAB-1k (using extra training data)
no code implementations • 25 Sep 2019 • Xiaohua Zhai, Joan Puigcerver, Alexander Kolesnikov, Pierre Ruyssen, Carlos Riquelme, Mario Lucic, Josip Djolonga, Andre Susano Pinto, Maxim Neumann, Alexey Dosovitskiy, Lucas Beyer, Olivier Bachem, Michael Tschannen, Marcin Michalski, Olivier Bousquet, Sylvain Gelly, Neil Houlsby
Representation learning promises to unlock deep learning for the long tail of vision tasks without expensive labelled datasets.
1 code implementation • ICCV 2019 • Xiaohua Zhai, Avital Oliver, Alexander Kolesnikov, Lucas Beyer
This work tackles the problem of semi-supervised learning of image classifiers.
Ranked #12 on Semi-Supervised Image Classification on ImageNet - 10% labeled data (Top 5 Accuracy metric)
5 code implementations • CVPR 2019 • Alexander Kolesnikov, Xiaohua Zhai, Lucas Beyer
Unsupervised visual representation learning remains a largely unsolved problem in computer vision research.
Ranked #106 on Self-Supervised Image Classification on ImageNet
1 code implementation • 2 Nov 2018 • Alina Kuznetsova, Hassan Rom, Neil Alldrin, Jasper Uijlings, Ivan Krasin, Jordi Pont-Tuset, Shahab Kamali, Stefan Popov, Matteo Malloci, Alexander Kolesnikov, Tom Duerig, Vittorio Ferrari
We present Open Images V4, a dataset of 9.2M images with unified annotations for image classification, object detection and visual relationship detection.
no code implementations • 5 Jul 2018 • Alexander Kolesnikov, Alina Kuznetsova, Christoph H. Lampert, Vittorio Ferrari
We propose a new model for detecting visual relationships, such as "person riding motorcycle" or "bottle on table".
1 code implementation • 11 May 2017 • Amelie Royer, Alexander Kolesnikov, Christoph H. Lampert
We develop a probabilistic technique for colorizing grayscale natural images.
no code implementations • ICML 2017 • Alexander Kolesnikov, Christoph H. Lampert
We study probabilistic models of natural images and extend the autoregressive family of PixelCNN architectures by incorporating auxiliary variables.
Ranked #13 on Image Generation on ImageNet 64x64 (Bits per dim metric)
9 code implementations • CVPR 2017 • Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, Christoph H. Lampert
A major open problem on the road to artificial intelligence is the development of incrementally learning systems that learn about more and more concepts over time from a stream of data.
Ranked #2 on Incremental Learning on ImageNet100 - 10 steps (# M Params metric)
no code implementations • 18 May 2016 • Alexander Kolesnikov, Christoph H. Lampert
Weakly-supervised object localization methods tend to fail for object classes that consistently co-occur with the same background elements, e.g., trains on tracks.
2 code implementations • 19 Mar 2016 • Alexander Kolesnikov, Christoph H. Lampert
We introduce a new loss function for the weakly-supervised training of semantic image segmentation models based on three guiding principles: to seed with weak localization cues, to expand objects based on the information about which classes can occur in an image, and to constrain the segmentations to coincide with object boundaries.
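As a concrete illustration, a numpy sketch of the seeding term alone (hypothetical tensor layout; the expand and constrain terms are omitted):

```python
import numpy as np

def seed_loss(log_probs, cues):
    """Seeding term of a seed/expand/constrain-style objective (sketch).

    log_probs: (H, W, K) per-pixel log class probabilities.
    cues: (H, W) weak localization cues, holding a class index in [0, K)
          at seeded pixels and -1 where no cue is available.
    Only seeded pixels contribute; the expand and constrain terms handle
    the remaining pixels and the object boundaries.
    """
    mask = cues >= 0
    h, w = np.nonzero(mask)
    return -log_probs[h, w, cues[mask]].mean()
```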
no code implementations • 28 Apr 2015 • Alexander Kolesnikov, Christoph H. Lampert
In this work, we present a Gaussian process (GP) based technique for simultaneously identifying which images of a training set have unreliable annotation and learning a segmentation model in which the negative effect of these images is suppressed.
no code implementations • 27 Mar 2014 • Alexander Kolesnikov, Matthieu Guillaumin, Vittorio Ferrari, Christoph H. Lampert
It is inspired by existing closed-form expressions for the maximum likelihood parameters of a generative graphical model with tree topology.
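The closed-form expressions referred to are the classical ones for tree-structured models, where the maximum likelihood distribution factorizes over the nodes and edges of the tree T with the empirical marginals plugged in directly:

```latex
% Closed-form maximum likelihood solution for a tree-structured model
% with edge set T, where \hat{p} denotes empirical marginals.
p^{\mathrm{ML}}(x) \;=\; \prod_{i} \hat{p}(x_i)
  \prod_{(i,j) \in T} \frac{\hat{p}(x_i, x_j)}{\hat{p}(x_i)\,\hat{p}(x_j)}
```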