Search Results for author: Alexander Kolesnikov

Found 35 papers, 29 papers with code

Closed-Form Training of Conditional Random Fields for Large Scale Image Segmentation

no code implementations • 27 Mar 2014 • Alexander Kolesnikov, Matthieu Guillaumin, Vittorio Ferrari, Christoph H. Lampert

It is inspired by existing closed-form expressions for the maximum likelihood parameters of a generative graphical model with tree topology.

Image Segmentation Segmentation +1

Paper
Add Code

Identifying Reliable Annotations for Large Scale Image Segmentation

no code implementations • 28 Apr 2015 • Alexander Kolesnikov, Christoph H. Lampert

In this work, we present a Gaussian process (GP) based technique for simultaneously identifying which images of a training set have unreliable annotation and learning a segmentation model in which the negative effect of these images is suppressed.

Image Segmentation Segmentation +1

Paper
Add Code

Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation

2 code implementations • 19 Mar 2016 • Alexander Kolesnikov, Christoph H. Lampert

We introduce a new loss function for the weakly-supervised training of semantic image segmentation models based on three guiding principles: to seed with weak localization cues, to expand objects based on the information about which classes can occur in an image, and to constrain the segmentations to coincide with object boundaries.

Image Segmentation Segmentation +1

241

Paper
Code

Improving Weakly-Supervised Object Localization By Micro-Annotation

no code implementations • 18 May 2016 • Alexander Kolesnikov, Christoph H. Lampert

Weakly-supervised object localization methods tend to fail for object classes that consistently co-occur with the same background elements, e. g. trains on tracks.

Object Semantic Segmentation +1

Paper
Add Code

iCaRL: Incremental Classifier and Representation Learning

9 code implementations • CVPR 2017 • Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, Christoph H. Lampert

A major open problem on the road to artificial intelligence is the development of incrementally learning systems that learn about more and more concepts over time from a stream of data.

Ranked #2 on Incremental Learning on ImageNet100 - 10 steps (# M Params metric)

Class Incremental Learning Incremental Learning +1

1,657

Paper
Code

PixelCNN Models with Auxiliary Variables for Natural Image Modeling

no code implementations • ICML 2017 • Alexander Kolesnikov, Christoph H. Lampert

We study probabilistic models of natural images and extend the autoregressive family of PixelCNN architectures by incorporating auxiliary variables.

Ranked #13 on Image Generation on ImageNet 64x64 (Bits per dim metric)

Image Generation

Paper
Add Code

Probabilistic Image Colorization

1 code implementation • 11 May 2017 • Amelie Royer, Alexander Kolesnikov, Christoph H. Lampert

We develop a probabilistic technique for colorizing grayscale natural images.

Colorization Image Colorization

Paper
Code

Detecting Visual Relationships Using Box Attention

no code implementations • 5 Jul 2018 • Alexander Kolesnikov, Alina Kuznetsova, Christoph H. Lampert, Vittorio Ferrari

We propose a new model for detecting visual relationships, such as "person riding motorcycle" or "bottle on table".

object-detection Object Detection

Paper
Add Code

The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale

1 code implementation • 2 Nov 2018 • Alina Kuznetsova, Hassan Rom, Neil Alldrin, Jasper Uijlings, Ivan Krasin, Jordi Pont-Tuset, Shahab Kamali, Stefan Popov, Matteo Malloci, Alexander Kolesnikov, Tom Duerig, Vittorio Ferrari

We present Open Images V4, a dataset of 9. 2M images with unified annotations for image classification, object detection and visual relationship detection.

General Classification Image Classification +5

Paper
Code

Revisiting Self-Supervised Visual Representation Learning

6 code implementations • CVPR 2019 • Alexander Kolesnikov, Xiaohua Zhai, Lucas Beyer

Unsupervised visual representation learning remains a largely unsolved problem in computer vision research.

Ranked #119 on Self-Supervised Image Classification on ImageNet

Representation Learning Self-Supervised Image Classification +1

394

Paper
Code

S4L: Self-Supervised Semi-Supervised Learning

1 code implementation • ICCV 2019 • Xiaohua Zhai, Avital Oliver, Alexander Kolesnikov, Lucas Beyer

This work tackles the problem of semi-supervised learning of image classifiers.

Ranked #15 on Semi-Supervised Image Classification on ImageNet - 10% labeled data (Top 5 Accuracy metric)

General Classification Representation Learning +1

Paper
Code

The Visual Task Adaptation Benchmark

no code implementations • 25 Sep 2019 • Xiaohua Zhai, Joan Puigcerver, Alexander Kolesnikov, Pierre Ruyssen, Carlos Riquelme, Mario Lucic, Josip Djolonga, Andre Susano Pinto, Maxim Neumann, Alexey Dosovitskiy, Lucas Beyer, Olivier Bachem, Michael Tschannen, Marcin Michalski, Olivier Bousquet, Sylvain Gelly, Neil Houlsby

Representation learning promises to unlock deep learning for the long tail of vision tasks without expansive labelled datasets.

Representation Learning

Paper
Add Code

A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark

2 code implementations • arXiv 2020 • Xiaohua Zhai, Joan Puigcerver, Alexander Kolesnikov, Pierre Ruyssen, Carlos Riquelme, Mario Lucic, Josip Djolonga, Andre Susano Pinto, Maxim Neumann, Alexey Dosovitskiy, Lucas Beyer, Olivier Bachem, Michael Tschannen, Marcin Michalski, Olivier Bousquet, Sylvain Gelly, Neil Houlsby

And, how close are we to general visual representations?

Ranked #10 on Image Classification on VTAB-1k (using extra training data)

Image Classification Representation Learning

3,225

Paper
Code

Big Transfer (BiT): General Visual Representation Learning

8 code implementations • ECCV 2020 • Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby

We conduct detailed analysis of the main components that lead to high transfer performance.

Ranked #1 on Out-of-Distribution Generalization on ImageNet-W (using extra training data)

Few-Shot Learning Fine-Grained Image Classification +2

1,490

Paper
Code

Are we done with ImageNet?

2 code implementations • 12 Jun 2020 • Lucas Beyer, Olivier J. Hénaff, Alexander Kolesnikov, Xiaohua Zhai, Aäron van den Oord

Yes, and no.

180

Paper
Code

On Robustness and Transferability of Convolutional Neural Networks

1 code implementation • CVPR 2021 • Josip Djolonga, Jessica Yung, Michael Tschannen, Rob Romijnders, Lucas Beyer, Alexander Kolesnikov, Joan Puigcerver, Matthias Minderer, Alexander D'Amour, Dan Moldovan, Sylvain Gelly, Neil Houlsby, Xiaohua Zhai, Mario Lucic

Modern deep convolutional networks (CNNs) are often criticized for not generalizing under distributional shifts.

Image Classification Transfer Learning

Paper
Code

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

143 code implementations • ICLR 2021 • Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby

While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited.

Ranked #1 on Image Classification on CIFAR-10

Classification Document Image Classification +6

124,353

Paper
Code

SI-Score: An image dataset for fine-grained analysis of robustness to object location, rotation and size

1 code implementation • 9 Apr 2021 • Jessica Yung, Rob Romijnders, Alexander Kolesnikov, Lucas Beyer, Josip Djolonga, Neil Houlsby, Sylvain Gelly, Mario Lucic, Xiaohua Zhai

Before deploying machine learning models it is critical to assess their robustness.

BIG-bench Machine Learning

Paper
Code

MLP-Mixer: An all-MLP Architecture for Vision

46 code implementations • NeurIPS 2021 • Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy

Convolutional Neural Networks (CNNs) are the go-to model for computer vision.

Ranked #17 on Image Classification on OmniBenchmark

Image Classification

47,331

Paper
Code

Scaling Vision Transformers

1 code implementation • CVPR 2022 • Xiaohua Zhai, Alexander Kolesnikov, Neil Houlsby, Lucas Beyer

As a result, we successfully train a ViT model with two billion parameters, which attains a new state-of-the-art on ImageNet of 90. 45% top-1 accuracy.

Ranked #3 on Image Classification on VTAB-1k (using extra training data)

Few-Shot Image Classification Few-Shot Learning

1,535

Paper
Code

Knowledge distillation: A good teacher is patient and consistent

3 code implementations • CVPR 2022 • Lucas Beyer, Xiaohua Zhai, Amélie Royer, Larisa Markeeva, Rohan Anil, Alexander Kolesnikov

In particular, we uncover that there are certain implicit design choices, which may drastically affect the effectiveness of distillation.

Ranked #450 on Image Classification on ImageNet

Image Classification Knowledge Distillation

1,535

Paper
Code

How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers

15 code implementations • 18 Jun 2021 • Andreas Steiner, Alexander Kolesnikov, Xiaohua Zhai, Ross Wightman, Jakob Uszkoreit, Lucas Beyer

Vision Transformers (ViT) have been shown to attain highly competitive performance for a wide range of vision applications, such as image classification, object detection and semantic image segmentation.

Data Augmentation Image Classification +5

29,624

Paper
Code

LiT: Zero-Shot Transfer with Locked-image text Tuning

4 code implementations • CVPR 2022 • Xiaohua Zhai, Xiao Wang, Basil Mustafa, Andreas Steiner, Daniel Keysers, Alexander Kolesnikov, Lucas Beyer

This paper presents contrastive-tuning, a simple method employing contrastive training to align image and text models while still taking advantage of their pre-training.

Ranked #1 on Zero-Shot Transfer Image Classification on ImageNet ReaL

Image Classification Retrieval +2

9,201

Paper
Code

Better plain ViT baselines for ImageNet-1k

5 code implementations • 3 May 2022 • Lucas Beyer, Xiaohua Zhai, Alexander Kolesnikov

It is commonly accepted that the Vision Transformer model requires sophisticated regularization techniques to excel at ImageNet-1k scale data.

Data Augmentation Image Classification

1,535

Paper
Code

UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes

1 code implementation • 20 May 2022 • Alexander Kolesnikov, André Susano Pinto, Lucas Beyer, Xiaohua Zhai, Jeremiah Harmsen, Neil Houlsby

We introduce UViM, a unified approach capable of modeling a wide range of computer vision tasks.

Colorization Depth Estimation +4

1,535

Paper
Code

PaLI: A Jointly-Scaled Multilingual Language-Image Model

1 code implementation • 14 Sep 2022 • Xi Chen, Xiao Wang, Soravit Changpinyo, AJ Piergiovanni, Piotr Padlewski, Daniel Salz, Sebastian Goodman, Adam Grycner, Basil Mustafa, Lucas Beyer, Alexander Kolesnikov, Joan Puigcerver, Nan Ding, Keran Rong, Hassan Akbari, Gaurav Mishra, Linting Xue, Ashish Thapliyal, James Bradbury, Weicheng Kuo, Mojtaba Seyedhosseini, Chao Jia, Burcu Karagol Ayan, Carlos Riquelme, Andreas Steiner, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut

PaLI generates text based on visual and textual inputs, and with this interface performs many vision, language, and multimodal tasks, in many languages.

Ranked #1 on Zero-Shot Transfer Image Classification on ImageNet-S

Few-Shot Image Classification Image Captioning +5

1,535

Paper
Code

FlexiViT: One Model for All Patch Sizes

4 code implementations • CVPR 2023 • Lucas Beyer, Pavel Izmailov, Alexander Kolesnikov, Mathilde Caron, Simon Kornblith, Xiaohua Zhai, Matthias Minderer, Michael Tschannen, Ibrahim Alabdulmohsin, Filip Pavetic

Vision Transformers convert images to sequences by slicing them into patches.

Panoptic Segmentation Retrieval +1

29,624

Paper
Code

Scaling Vision Transformers to 22 Billion Parameters

1 code implementation • 10 Feb 2023 • Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, Fantine Huot, Jasmijn Bastings, Mark Patrick Collier, Alexey Gritsenko, Vighnesh Birodkar, Cristina Vasconcelos, Yi Tay, Thomas Mensink, Alexander Kolesnikov, Filip Pavetić, Dustin Tran, Thomas Kipf, Mario Lučić, Xiaohua Zhai, Daniel Keysers, Jeremiah Harmsen, Neil Houlsby

The scaling of Transformers has driven breakthrough capabilities for language models.

Ranked #1 on Zero-Shot Transfer Image Classification on ObjectNet

Action Classification Fairness +3

192

Paper
Code

Tuning computer vision models with task rewards

1 code implementation • 16 Feb 2023 • André Susano Pinto, Alexander Kolesnikov, Yuge Shi, Lucas Beyer, Xiaohua Zhai

Misalignment between model predictions and intended usage can be detrimental for the deployment of computer vision models.

Colorization Image Captioning +5

1,535

Paper
Code

Sigmoid Loss for Language Image Pre-Training

8 code implementations • ICCV 2023 • Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, Lucas Beyer

We propose a simple pairwise Sigmoid loss for Language-Image Pre-training (SigLIP).

Contrastive Learning Disentanglement

124,353

Paper
Code

A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision

1 code implementation • 30 Mar 2023 • Lucas Beyer, Bo Wan, Gagan Madan, Filip Pavetic, Andreas Steiner, Alexander Kolesnikov, André Susano Pinto, Emanuele Bugliarello, Xiao Wang, Qihang Yu, Liang-Chieh Chen, Xiaohua Zhai

A key finding is that a small decoder learned on top of a frozen pretrained encoder works surprisingly well.

Multi-Task Learning Optical Character Recognition +3

1,535

Paper
Code

Capturing dynamical correlations using implicit neural representations

1 code implementation • 8 Apr 2023 • Sathya Chitturi, Zhurun Ji, Alexander Petsch, Cheng Peng, Zhantao Chen, Rajan Plumley, Mike Dunne, Sougata Mardanya, Sugata Chowdhury, Hongwei Chen, Arun Bansil, Adrian Feiguin, Alexander Kolesnikov, Dharmalingam Prabhakaran, Stephen Hayden, Daniel Ratner, Chunjing Jia, Youssef Nashed, Joshua Turner

The observation and description of collective excitations in solids is a fundamental issue when seeking to understand the physics of a many-body system.

Paper
Code

Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design

1 code implementation • NeurIPS 2023 • Ibrahim Alabdulmohsin, Xiaohua Zhai, Alexander Kolesnikov, Lucas Beyer

Scaling laws have been recently employed to derive compute-optimal model size (number of parameters) for a given compute duration.

Image Classification Visual Question Answering (VQA)

1,535

Paper
Code

PaLI-X: On Scaling up a Multilingual Vision and Language Model

2 code implementations • 29 May 2023 • Xi Chen, Josip Djolonga, Piotr Padlewski, Basil Mustafa, Soravit Changpinyo, Jialin Wu, Carlos Riquelme Ruiz, Sebastian Goodman, Xiao Wang, Yi Tay, Siamak Shakeri, Mostafa Dehghani, Daniel Salz, Mario Lucic, Michael Tschannen, Arsha Nagrani, Hexiang Hu, Mandar Joshi, Bo Pang, Ceslee Montgomery, Paulina Pietrzyk, Marvin Ritter, AJ Piergiovanni, Matthias Minderer, Filip Pavetic, Austin Waters, Gang Li, Ibrahim Alabdulmohsin, Lucas Beyer, Julien Amelot, Kenton Lee, Andreas Peter Steiner, Yang Li, Daniel Keysers, Anurag Arnab, Yuanzhong Xu, Keran Rong, Alexander Kolesnikov, Mojtaba Seyedhosseini, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut

We present the training recipe and results of scaling up PaLI-X, a multilingual vision and language model, both in terms of size of the components and the breadth of its training task mixture.

Ranked #1 on Fine-Grained Image Recognition on OVEN

Chart Question Answering document understanding +9

Paper
Code

PaLI-3 Vision Language Models: Smaller, Faster, Stronger

1 code implementation • 13 Oct 2023 • Xi Chen, Xiao Wang, Lucas Beyer, Alexander Kolesnikov, Jialin Wu, Paul Voigtlaender, Basil Mustafa, Sebastian Goodman, Ibrahim Alabdulmohsin, Piotr Padlewski, Daniel Salz, Xi Xiong, Daniel Vlasic, Filip Pavetic, Keran Rong, Tianli Yu, Daniel Keysers, Xiaohua Zhai, Radu Soricut

This paper presents PaLI-3, a smaller, faster, and stronger vision language model (VLM) that compares favorably to similar models that are 10x larger.

Ranked #2 on Temporal/Casual QA on NExT-QA (using extra training data)

Chart Question Answering Image Classification +4

115

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.