no code implementations • 19 Dec 2024 • Alexander Kolesnikov, André Susano Pinto, Michael Tschannen
Normalizing generative flows have long been a promising class of generative models for natural images.
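As background (this is the standard flow training setup, not a claim about this paper's specific recipe): normalizing flows are typically trained by maximizing the exact log-likelihood given by the change-of-variables formula,

```latex
\log p_X(x) = \log p_Z\big(f_\theta(x)\big)
            + \log \left| \det \frac{\partial f_\theta(x)}{\partial x} \right|,
```

where $f_\theta$ is an invertible network mapping an image $x$ to a latent with a simple base density $p_Z$.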
no code implementations • 4 Dec 2024 • Andreas Steiner, André Susano Pinto, Michael Tschannen, Daniel Keysers, Xiao Wang, Yonatan Bitton, Alexey Gritsenko, Matthias Minderer, Anthony Sherbondy, Shangbang Long, Siyang Qin, Reeve Ingle, Emanuele Bugliarello, Sahar Kazemzadeh, Thomas Mesnard, Ibrahim Alabdulmohsin, Lucas Beyer, Xiaohua Zhai
PaliGemma 2 is an upgrade of the PaliGemma open Vision-Language Model (VLM) based on the Gemma 2 family of language models.
no code implementations • 29 Nov 2024 • Michael Tschannen, André Susano Pinto, Alexander Kolesnikov
We propose JetFormer, an autoregressive decoder-only transformer trained to directly maximize the likelihood of raw data without relying on any separately pretrained components; it can understand and generate both text and images.
1 code implementation • 10 Jul 2024 • Lucas Beyer, Andreas Steiner, André Susano Pinto, Alexander Kolesnikov, Xiao Wang, Daniel Salz, Maxim Neumann, Ibrahim Alabdulmohsin, Michael Tschannen, Emanuele Bugliarello, Thomas Unterthiner, Daniel Keysers, Skanda Koppula, Fangyu Liu, Adam Grycner, Alexey Gritsenko, Neil Houlsby, Manoj Kumar, Keran Rong, Julian Eisenschlos, Rishabh Kabra, Matthias Bauer, Matko Bošnjak, Xi Chen, Matthias Minderer, Paul Voigtlaender, Ioana Bica, Ivana Balazevic, Joan Puigcerver, Pinelopi Papalampidi, Olivier Henaff, Xi Xiong, Radu Soricut, Jeremiah Harmsen, Xiaohua Zhai
PaliGemma is an open Vision-Language Model (VLM) that is based on the SigLIP-So400m vision encoder and the Gemma-2B language model.
1 code implementation • 28 Mar 2024 • Bo Wan, Michael Tschannen, Yongqin Xian, Filip Pavetic, Ibrahim Alabdulmohsin, Xiao Wang, André Susano Pinto, Andreas Steiner, Lucas Beyer, Xiaohua Zhai
In this paper, we propose a simple visual pretraining method with location-aware captioners (LocCa).
no code implementations • 3 Jan 2024 • Aleksandar Stanić, Sergi Caelles, Michael Tschannen
Recently, these models have achieved strong performance on tasks such as compositional visual question answering, visual grounding, and video temporal reasoning.
no code implementations • CVPR 2024 • Xi Chen, Josip Djolonga, Piotr Padlewski, Basil Mustafa, Soravit Changpinyo, Jialin Wu, Carlos Riquelme Ruiz, Sebastian Goodman, Xiao Wang, Yi Tay, Siamak Shakeri, Mostafa Dehghani, Daniel Salz, Mario Lucic, Michael Tschannen, Arsha Nagrani, Hexiang Hu, Mandar Joshi, Bo Pang, Ceslee Montgomery, Paulina Pietrzyk, Marvin Ritter, AJ Piergiovanni, Matthias Minderer, Filip Pavetic, Austin Waters, Gang Li, Ibrahim Alabdulmohsin, Lucas Beyer, Julien Amelot, Kenton Lee, Andreas Peter Steiner, Yang Li, Daniel Keysers, Anurag Arnab, Yuanzhong Xu, Keran Rong, Alexander Kolesnikov, Mojtaba Seyedhosseini, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut
We explore the boundaries of scaling up a multilingual vision and language model both in terms of size of the components and the breadth of its training task mixture.
1 code implementation • 4 Dec 2023 • Michael Tschannen, Cian Eastwood, Fabian Mentzer
We introduce Generative Infinite-Vocabulary Transformers (GIVT) which generate vector sequences with real-valued entries, instead of discrete tokens from a finite vocabulary.
Ranked #35 on Image Generation on ImageNet 512x512
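A minimal sketch of the continuous-token idea, real-valued token vectors modeled with a Gaussian-mixture output head and trained via negative log-likelihood; the hidden size, token dimension, and mixture count below are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn
import torch.distributions as D

class GMMHead(nn.Module):
    """Maps transformer features to a GMM over d-dimensional real-valued tokens."""
    def __init__(self, hidden_dim=512, token_dim=16, num_mixtures=8):
        super().__init__()
        self.token_dim, self.num_mixtures = token_dim, num_mixtures
        # mixture logits + per-component means and log-scales
        self.proj = nn.Linear(hidden_dim, num_mixtures * (1 + 2 * token_dim))

    def forward(self, h):                       # h: (batch, seq, hidden_dim)
        p = self.proj(h)
        logits, rest = p.split([self.num_mixtures,
                                self.num_mixtures * 2 * self.token_dim], dim=-1)
        means, log_scales = rest.reshape(*h.shape[:-1], self.num_mixtures,
                                         2 * self.token_dim).chunk(2, dim=-1)
        comp = D.Independent(D.Normal(means, log_scales.exp()), 1)
        return D.MixtureSameFamily(D.Categorical(logits=logits), comp)

# training step (sketch): negative log-likelihood on real-valued target tokens
head = GMMHead()
features = torch.randn(2, 10, 512)              # decoder outputs (illustrative)
targets = torch.randn(2, 10, 16)                # ground-truth continuous tokens
loss = -head(features).log_prob(targets).mean()
```

The point of the sketch is that the softmax over a finite vocabulary is replaced by a density over real vectors, so no discrete tokenizer or codebook is needed.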
3 code implementations • 27 Sep 2023 • Fabian Mentzer, David Minnen, Eirikur Agustsson, Michael Tschannen
Each dimension is quantized to a small set of fixed values, leading to an (implicit) codebook given by the product of these sets.
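A minimal sketch of that quantizer: bound each latent dimension and round it to one of a few fixed levels, so the implicit codebook is the Cartesian product of the per-dimension level sets. The level counts are illustrative, and a straight-through gradient estimator would be added for training.

```python
import numpy as np

def fsq_quantize(z, levels=(8, 5, 5, 5)):
    """Quantize each dimension of z to a fixed number of levels.

    z: array of shape (..., d) with d == len(levels).
    Returns points on the implicit codebook of size prod(levels).
    """
    L = np.asarray(levels, dtype=np.float64)
    # bound each dimension to roughly [-(L-1)/2, (L-1)/2], then round
    z_bounded = np.tanh(z) * (L - 1) / 2
    return np.round(z_bounded)

z = np.random.randn(4, 4)
codes = fsq_quantize(z)
print(codes, "implicit codebook size:", int(np.prod((8, 5, 5, 5))))
```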
2 code implementations • NeurIPS 2023 • Michael Tschannen, Manoj Kumar, Andreas Steiner, Xiaohua Zhai, Neil Houlsby, Lucas Beyer
We further analyze the effect of the model architecture and scale, as well as the pretraining data on the representation quality, and find that captioning exhibits the same or better scaling behavior along these axes.
2 code implementations • 29 May 2023 • Xi Chen, Josip Djolonga, Piotr Padlewski, Basil Mustafa, Soravit Changpinyo, Jialin Wu, Carlos Riquelme Ruiz, Sebastian Goodman, Xiao Wang, Yi Tay, Siamak Shakeri, Mostafa Dehghani, Daniel Salz, Mario Lucic, Michael Tschannen, Arsha Nagrani, Hexiang Hu, Mandar Joshi, Bo Pang, Ceslee Montgomery, Paulina Pietrzyk, Marvin Ritter, AJ Piergiovanni, Matthias Minderer, Filip Pavetic, Austin Waters, Gang Li, Ibrahim Alabdulmohsin, Lucas Beyer, Julien Amelot, Kenton Lee, Andreas Peter Steiner, Yang Li, Daniel Keysers, Anurag Arnab, Yuanzhong Xu, Keran Rong, Alexander Kolesnikov, Mojtaba Seyedhosseini, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut
We present the training recipe and results of scaling up PaLI-X, a multilingual vision and language model, both in terms of size of the components and the breadth of its training task mixture.
Ranked #1 on Fine-Grained Image Recognition on OVEN
no code implementations • ICCV 2023 • Fabian Mentzer, Eirikur Agustsson, Michael Tschannen
We show how bidirectional transformers trained for masked token prediction can be applied to neural image compression to achieve state-of-the-art results.
1 code implementation • 10 Feb 2023 • Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, Fantine Huot, Jasmijn Bastings, Mark Patrick Collier, Alexey Gritsenko, Vighnesh Birodkar, Cristina Vasconcelos, Yi Tay, Thomas Mensink, Alexander Kolesnikov, Filip Pavetić, Dustin Tran, Thomas Kipf, Mario Lučić, Xiaohua Zhai, Daniel Keysers, Jeremiah Harmsen, Neil Houlsby
The scaling of Transformers has driven breakthrough capabilities for language models.
Ranked #1 on Zero-Shot Transfer Image Classification on ObjectNet
1 code implementation • CVPR 2023 • Michael Tschannen, Basil Mustafa, Neil Houlsby
Multimodal models are becoming increasingly effective, in part due to unified components, such as the Transformer architecture.
4 code implementations • CVPR 2023 • Lucas Beyer, Pavel Izmailov, Alexander Kolesnikov, Mathilde Caron, Simon Kornblith, Xiaohua Zhai, Matthias Minderer, Michael Tschannen, Ibrahim Alabdulmohsin, Filip Pavetic
Vision Transformers convert images to sequences by slicing them into patches.
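A minimal sketch of that patchification step (patch and image sizes are illustrative):

```python
import numpy as np

def patchify(image, patch_size=16):
    """Slice an image (H, W, C) into a sequence of flattened patches."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)            # (gh, gw, p, p, c)
    return patches.reshape(gh * gw, patch_size * patch_size * c)

tokens = patchify(np.zeros((224, 224, 3)), patch_size=16)
print(tokens.shape)   # (196, 768): 14x14 patches, each flattened to 16*16*3 values
```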
no code implementations • 29 Mar 2022 • Anna Volokitin, Stefan Brugger, Ali Benlalah, Sebastian Martin, Brian Amberg, Michael Tschannen
Recent advances in deep generative models have led to the development of neural face video compression codecs that use an order of magnitude less bandwidth than engineered codecs.
no code implementations • 1 Jan 2021 • Samaneh Azadi, Michael Tschannen, Eric Tzeng, Sylvain Gelly, Trevor Darrell, Mario Lucic
Coupling the high-fidelity generation capabilities of label-conditional image synthesis methods with the flexibility of unconditional generative models, we propose a semantic bottleneck GAN model for unconditional synthesis of complex scenes.
no code implementations • 6 Oct 2020 • Rob Romijnders, Aravindh Mahendran, Michael Tschannen, Josip Djolonga, Marvin Ritter, Neil Houlsby, Mario Lucic
We propose a method to learn image representations from uncurated videos.
1 code implementation • CVPR 2021 • Josip Djolonga, Jessica Yung, Michael Tschannen, Rob Romijnders, Lucas Beyer, Alexander Kolesnikov, Joan Puigcerver, Matthias Minderer, Alexander D'Amour, Dan Moldovan, Sylvain Gelly, Neil Houlsby, Xiaohua Zhai, Mario Lucic
Modern deep convolutional neural networks (CNNs) are often criticized for not generalizing under distributional shifts.
4 code implementations • NeurIPS 2020 • Fabian Mentzer, George Toderici, Michael Tschannen, Eirikur Agustsson
We extensively study how to combine Generative Adversarial Networks and learned compression to obtain a state-of-the-art generative lossy compression system.
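A rough, hedged sketch of how such a combination is typically set up, a rate term, a distortion term, and an adversarial term on the reconstructions; the weights and the non-saturating GAN formulation below are generic assumptions rather than the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def generator_loss(x, x_hat, rate_bpp, disc_logits_fake,
                   lambda_rate=1.0, beta_adv=0.1):
    """Rate + distortion + adversarial loss for the encoder/decoder (sketch)."""
    distortion = F.mse_loss(x_hat, x)
    # non-saturating GAN loss: push reconstructions to be classified as real
    adversarial = F.binary_cross_entropy_with_logits(
        disc_logits_fake, torch.ones_like(disc_logits_fake))
    return lambda_rate * rate_bpp.mean() + distortion + beta_adv * adversarial

def discriminator_loss(disc_logits_real, disc_logits_fake):
    """Standard discriminator loss separating originals from reconstructions."""
    real = F.binary_cross_entropy_with_logits(
        disc_logits_real, torch.ones_like(disc_logits_real))
    fake = F.binary_cross_entropy_with_logits(
        disc_logits_fake, torch.zeros_like(disc_logits_fake))
    return real + fake
```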
no code implementations • ICLR Workshop LLD 2019 • Francesco Locatello, Michael Tschannen, Stefan Bauer, Gunnar Rätsch, Bernhard Schölkopf, Olivier Bachem
Recently, Locatello et al. (2019) demonstrated that unsupervised disentanglement learning without inductive biases is theoretically impossible and that existing inductive biases and unsupervised methods do not suffice to consistently learn disentangled representations.
1 code implementation • CVPR 2020 • Fabian Mentzer, Luc van Gool, Michael Tschannen
We leverage the powerful lossy image compression algorithm BPG to build a lossless image compression system.
no code implementations • ICML 2020 • Matthias Minderer, Olivier Bachem, Neil Houlsby, Michael Tschannen
In self-supervised visual representation learning, a feature extractor is trained on a "pretext task" for which labels can be generated cheaply, without human annotation.
3 code implementations • ICML 2020 • Francesco Locatello, Ben Poole, Gunnar Rätsch, Bernhard Schölkopf, Olivier Bachem, Michael Tschannen
Third, we perform a large-scale empirical study and show that such pairs of observations are sufficient to reliably learn disentangled representations on several benchmark data sets.
no code implementations • CVPR 2020 • Michael Tschannen, Josip Djolonga, Marvin Ritter, Aravindh Mahendran, Xiaohua Zhai, Neil Houlsby, Sylvain Gelly, Mario Lucic
We propose a general framework for self-supervised learning of transferable visual representations based on Video-Induced Visual Invariances (VIVI).
Ranked #12 on Image Classification on VTAB-1k (using extra training data)
2 code implementations • 26 Nov 2019 • Samaneh Azadi, Michael Tschannen, Eric Tzeng, Sylvain Gelly, Trevor Darrell, Mario Lucic
For the former, we use an unconditional progressive segmentation generation network that captures the distribution of realistic semantic scene layouts.
Ranked #1 on Image Generation on Cityscapes-5K 256x512
2 code implementations • arXiv 2020 • Xiaohua Zhai, Joan Puigcerver, Alexander Kolesnikov, Pierre Ruyssen, Carlos Riquelme, Mario Lucic, Josip Djolonga, Andre Susano Pinto, Maxim Neumann, Alexey Dosovitskiy, Lucas Beyer, Olivier Bachem, Michael Tschannen, Marcin Michalski, Olivier Bousquet, Sylvain Gelly, Neil Houlsby
And, how close are we to general visual representations?
Ranked #7 on Image Classification on VTAB-1k (using extra training data)
no code implementations • 25 Sep 2019 • Xiaohua Zhai, Joan Puigcerver, Alexander Kolesnikov, Pierre Ruyssen, Carlos Riquelme, Mario Lucic, Josip Djolonga, Andre Susano Pinto, Maxim Neumann, Alexey Dosovitskiy, Lucas Beyer, Olivier Bachem, Michael Tschannen, Marcin Michalski, Olivier Bousquet, Sylvain Gelly, Neil Houlsby
Representation learning promises to unlock deep learning for the long tail of vision tasks without expensive labelled datasets.
2 code implementations • ICLR 2020 • Michael Tschannen, Josip Djolonga, Paul K. Rubenstein, Sylvain Gelly, Mario Lucic
Many recent methods for unsupervised or self-supervised representation learning train feature extractors by maximizing an estimate of the mutual information (MI) between different views of the data.
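A minimal sketch of one widely used estimator of this kind, the InfoNCE bound computed on paired views within a batch (the temperature and feature size are illustrative):

```python
import torch
import torch.nn.functional as F

def infonce_loss(z1, z2, temperature=0.1):
    """InfoNCE: matching view pairs (z1[i], z2[i]) are positives,
    all other pairs in the batch act as negatives."""
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature          # (batch, batch) similarity matrix
    labels = torch.arange(z1.size(0))           # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

z1, z2 = torch.randn(32, 128), torch.randn(32, 128)   # features of two views
print(infonce_loss(z1, z2).item())
```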
no code implementations • 3 May 2019 • Francesco Locatello, Michael Tschannen, Stefan Bauer, Gunnar Rätsch, Bernhard Schölkopf, Olivier Bachem
Recently, Locatello et al. (2019) demonstrated that unsupervised disentanglement learning without inductive biases is theoretically impossible and that existing inductive biases and unsupervised methods do not suffice to consistently learn disentangled representations.
1 code implementation • 6 Mar 2019 • Mario Lucic, Michael Tschannen, Marvin Ritter, Xiaohua Zhai, Olivier Bachem, Sylvain Gelly
Deep generative models are becoming a cornerstone of modern machine learning.
Ranked #11 on Conditional Image Generation on ImageNet 128x128
no code implementations • 12 Dec 2018 • Michael Tschannen, Olivier Bachem, Mario Lucic
Finally, we provide an analysis of autoencoder-based representation learning through the lens of rate-distortion theory and identify a clear tradeoff between the amount of prior knowledge available about the downstream task and how useful the representation is for that task.
3 code implementations • CVPR 2019 • Fabian Mentzer, Eirikur Agustsson, Michael Tschannen, Radu Timofte, Luc van Gool
We propose the first practical learned lossless image compression system, L3C, and show that it outperforms the popular engineered codecs, PNG, WebP and JPEG 2000.
Ranked #3 on Image Compression on ImageNet32
1 code implementation • NeurIPS 2018 • Michael Tschannen, Eirikur Agustsson, Mario Lucic
We propose and study the problem of distribution-preserving lossy compression.
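A hedged sketch of the problem setup in generic notation (not copied from the paper): minimize distortion at a given rate while constraining the reconstructions to follow the data distribution,

```latex
\min_{E,\,D}\; \mathbb{E}\big[d\big(X, D(E(X))\big)\big]
\quad \text{s.t.}\quad D(E(X)) \sim P_X, \qquad H(E(X)) \le R,
```

with encoder E, (possibly stochastic) decoder D, distortion measure d, and rate budget R.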
2 code implementations • ICML 2018 • Tommaso Furlanello, Zachary C. Lipton, Michael Tschannen, Laurent Itti, Anima Anandkumar
Knowledge distillation (KD) consists of transferring knowledge from one machine learning model (the teacher) to another (the student).
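A minimal sketch of a standard distillation loss of this kind, temperature-softened teacher predictions matched by the student via KL divergence plus the usual label loss; the temperature and weighting are illustrative, and this generic recipe is not necessarily the exact loss used in the paper.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend of hard-label cross-entropy and soft-label KL to the teacher."""
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    distill = F.kl_div(log_student, soft_targets, reduction="batchmean") * T * T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1 - alpha) * hard

s, t = torch.randn(8, 10), torch.randn(8, 10)   # student and teacher logits
y = torch.randint(0, 10, (8,))
print(kd_loss(s, t, y).item())
```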
1 code implementation • ICCV 2019 • Eirikur Agustsson, Michael Tschannen, Fabian Mentzer, Radu Timofte, Luc van Gool
We present a learned image compression system based on GANs, operating at extremely low bitrates.
1 code implementation • ICLR 2018 • Robert Torfason, Fabian Mentzer, Eirikur Agustsson, Michael Tschannen, Radu Timofte, Luc van Gool
Motivated by recent work on deep neural network (DNN)-based image compression methods showing potential improvements in image quality, savings in storage, and bandwidth reduction, we propose to perform image understanding tasks such as classification and segmentation directly on the compressed representations produced by these compression methods.
1 code implementation • CVPR 2018 • Fabian Mentzer, Eirikur Agustsson, Michael Tschannen, Radu Timofte, Luc van Gool
During training, the auto-encoder makes use of the context model to estimate the entropy of its representation, and the context model is concurrently updated to learn the dependencies between the symbols in the latent representation.
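A minimal sketch of the entropy-estimation step: the cross-entropy of the latent symbols under the context model's predictions gives a differentiable bit-rate estimate. The tensor layout is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def rate_estimate_bits(symbol_logits, symbols):
    """Cross-entropy of the quantized symbols under the context model, in bits;
    this upper-bounds the bitrate an entropy coder would need.

    symbol_logits: (batch, num_symbols, vocab) predictions from the context model
    symbols:       (batch, num_symbols) integer symbol indices
    """
    log_probs = F.log_softmax(symbol_logits, dim=-1)
    nll_nats = F.nll_loss(log_probs.transpose(1, 2), symbols, reduction="sum")
    return nll_nats / torch.log(torch.tensor(2.0))   # nats -> bits

logits = torch.randn(1, 64, 32)      # illustrative context-model outputs
syms = torch.randint(0, 32, (1, 64))
print(rate_estimate_bits(logits, syms).item(), "bits")
```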
1 code implementation • ICML 2018 • Michael Tschannen, Aran Khanna, Anima Anandkumar
A large fraction of the arithmetic operations required to evaluate deep neural networks (DNNs) consists of matrix multiplications, in both convolution and fully connected layers.
1 code implementation • 17 Oct 2017 • Martin Zihlmann, Dmytro Perekrestenko, Michael Tschannen
We propose two deep neural network architectures for classification of arbitrary-length electrocardiogram (ECG) recordings and evaluate them on the atrial fibrillation (AF) classification data set provided by the PhysioNet/CinC Challenge 2017.
no code implementations • NeurIPS 2017 • Francesco Locatello, Michael Tschannen, Gunnar Rätsch, Martin Jaggi
Greedy optimization methods such as Matching Pursuit (MP) and Frank-Wolfe (FW) algorithms regained popularity in recent years due to their simplicity, effectiveness and theoretical guarantees.
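A minimal sketch of matching pursuit over a finite dictionary, illustrating the greedy template these methods share (dictionary size and iteration count are illustrative):

```python
import numpy as np

def matching_pursuit(y, D, num_iters=10):
    """Greedy sparse approximation of y over the unit-norm columns (atoms) of D."""
    residual = y.copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(num_iters):
        # pick the atom most correlated with the current residual
        scores = D.T @ residual
        j = np.argmax(np.abs(scores))
        coeffs[j] += scores[j]
        residual = residual - scores[j] * D[:, j]
    return coeffs, residual

rng = np.random.default_rng(0)
D = rng.standard_normal((50, 200))
D /= np.linalg.norm(D, axis=0)        # normalize atoms
y = rng.standard_normal(50)
coeffs, r = matching_pursuit(y, D)
print(np.linalg.norm(r))              # residual norm shrinks with iterations
```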
no code implementations • NeurIPS 2017 • Eirikur Agustsson, Fabian Mentzer, Michael Tschannen, Lukas Cavigelli, Radu Timofte, Luca Benini, Luc van Gool
We present a new approach to learn compressible representations in deep architectures with an end-to-end training strategy.
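A hedged sketch of one way to make quantization end-to-end trainable, soft assignments of each scalar to a set of centers that can be annealed toward hard nearest-center quantization; the centers and temperature here are illustrative assumptions rather than the paper's exact scheme.

```python
import torch

def soft_quantize(z, centers, sigma=1.0):
    """Differentiable soft assignment of each scalar in z to a set of centers.

    As sigma grows, the softmax sharpens and the output approaches hard
    nearest-center quantization (soft-to-hard annealing).
    """
    d2 = (z.unsqueeze(-1) - centers) ** 2          # pairwise squared distances
    weights = torch.softmax(-sigma * d2, dim=-1)
    return (weights * centers).sum(dim=-1)

centers = torch.linspace(-2.0, 2.0, steps=5)       # illustrative learned centers
z = torch.randn(3, 4)
print(soft_quantize(z, centers, sigma=10.0))       # close to nearest-center values
```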
no code implementations • 21 Feb 2017 • Francesco Locatello, Rajiv Khanna, Michael Tschannen, Martin Jaggi
Two of the most fundamental prototypes of greedy optimization are the matching pursuit and Frank-Wolfe algorithms.
no code implementations • 11 Dec 2016 • Michael Tschannen, Helmut Bölcskei
The clustering conditions we obtain for SSC-OMP and SSC-MP are similar to those for SSC and for the thresholding-based subspace clustering (TSC) algorithm due to Heckel and Bölcskei.
no code implementations • 4 Dec 2016 • Michael Tschannen, Helmut Bölcskei
We consider the problem of clustering noisy finite-length observations of stationary ergodic random processes according to their generative models without prior knowledge of the model statistics and the number of generative models.
no code implementations • 26 Sep 2016 • Michael Tschannen, Lukas Cavigelli, Fabian Mentzer, Thomas Wiatowski, Luca Benini
We propose a highly structured neural network architecture for semantic segmentation with an extremely small model size, suitable for low-power embedded and mobile platforms.
no code implementations • 26 May 2016 • Thomas Wiatowski, Michael Tschannen, Aleksandar Stanić, Philipp Grohs, Helmut Bölcskei
First steps towards a mathematical theory of deep convolutional neural networks for feature extraction were made, for the continuous-time case, in Mallat (2012) and Wiatowski and Bölcskei (2015).
no code implementations • 12 Feb 2016 • Rajiv Khanna, Michael Tschannen, Martin Jaggi
Efficiently representing real world data in a succinct and parsimonious manner is of central importance in many fields.
no code implementations • 25 Jul 2015 • Reinhard Heckel, Michael Tschannen, Helmut Bölcskei
Subspace clustering refers to the problem of clustering unlabeled high-dimensional data points into a union of low-dimensional linear subspaces, whose number, orientations, and dimensions are all unknown.
no code implementations • 20 Apr 2015 • Michael Tschannen, Helmut Bölcskei
We consider the problem of clustering noisy finite-length observations of stationary ergodic random processes according to their nonparametric generative models without prior knowledge of the model statistics and the number of generative models.
no code implementations • 27 Apr 2014 • Reinhard Heckel, Michael Tschannen, Helmut Bölcskei
Subspace clustering refers to the problem of clustering unlabeled high-dimensional data points into a union of low-dimensional linear subspaces, assumed unknown.