2 code implementations • 12 May 2022 • Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, Neil Houlsby
Combining simple architectures with large-scale pre-training has led to massive improvements in image classification.
Ranked #1 on One-Shot Object Detection on COCO
1 code implementation • Mehdi S. M. Sajjadi, Henning Meyer, Etienne Pot, Urs Bergmann, Klaus Greff, Noha Radwan, Suhani Vora, Mario Lucic, Daniel Duckworth, Alexey Dosovitskiy, Jakob Uszkoreit, Thomas Funkhouser, Andrea Tagliasacchi
In this work, we propose the Scene Representation Transformer (SRT), a method which processes posed or unposed RGB images of a new area, infers a "set-latent scene representation", and synthesises novel views, all in a single feed-forward pass.
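The pipeline the snippet describes is concrete enough to sketch: encode the input views into a set of latent tokens, then decode a colour for each query ray by attending over that set. Below is a minimal, hedged sketch; the module sizes, the patchifying CNN, and the 6-number ray parameterisation are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Minimal sketch of an SRT-style pipeline (illustrative sizes, not the paper's code):
# a CNN encodes the input views into a *set* of latent tokens; a transformer
# decoder attends over that set to predict a colour for each query ray.
class TinySRT(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.cnn = nn.Conv2d(3, dim, 8, stride=8)       # patchify each view
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
        self.ray_embed = nn.Linear(6, dim)              # ray origin + direction
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
        self.to_rgb = nn.Linear(dim, 3)

    def forward(self, views, rays):
        # views: (B, V, 3, H, W) input images; rays: (B, R, 6) query rays
        B, V = views.shape[:2]
        tokens = self.cnn(views.flatten(0, 1))          # (B*V, dim, h, w)
        tokens = tokens.flatten(2).transpose(1, 2)      # (B*V, h*w, dim)
        tokens = tokens.reshape(B, -1, tokens.shape[-1])  # one token set per scene
        scene_set = self.encoder(tokens)                # "set-latent scene representation"
        queries = self.ray_embed(rays)
        return self.to_rgb(self.decoder(queries, scene_set))  # (B, R, 3) colours

rgb = TinySRT()(torch.randn(2, 3, 3, 64, 64), torch.randn(2, 100, 6))
```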
Object-centric representations are a promising path toward more systematic generalization by providing flexible abstractions upon which compositional world models can be built.
Finally, we study the effect of (pretraining) dataset scale on intermediate features and transfer learning, and conclude with a discussion on connections to new architectures such as the MLP-Mixer.
45 code implementations • Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy
Convolutional Neural Networks (CNNs) are the go-to model for computer vision.
Ranked #18 on Image Classification on OmniBenchmark
Neural networks require large amounts of memory and compute to process high-resolution images, even when only a small part of the image is actually informative for the task at hand.
We address this problem by introducing a global, set-based contrastive loss: instead of contrasting individual slot representations against one another, we aggregate the representations and contrast the joined sets against one another.
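A minimal sketch of that idea follows, assuming mean-pooling as the set aggregation and InfoNCE as the contrastive objective; both are illustrative stand-ins, not necessarily the paper's exact choices.

```python
import torch
import torch.nn.functional as F

def set_contrastive_loss(slots_a, slots_b, temperature=0.1):
    """Hedged sketch of a set-based contrastive loss as the snippet describes it:
    slot vectors are aggregated (here: mean-pooled) into one vector per view,
    and the pooled *sets* are contrasted across the batch with InfoNCE.
    slots_a, slots_b: (B, num_slots, dim) slot representations of two views."""
    za = F.normalize(slots_a.mean(dim=1), dim=-1)   # aggregate the set
    zb = F.normalize(slots_b.mean(dim=1), dim=-1)
    logits = za @ zb.t() / temperature              # (B, B) set-to-set similarities
    targets = torch.arange(za.shape[0])             # matching sets are the positives
    return F.cross_entropy(logits, targets)

loss = set_contrastive_loss(torch.randn(8, 6, 64), torch.randn(8, 6, 64))
```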
134 code implementations • Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby
While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited.
Ranked #1 on Out-of-Distribution Generalization on ImageNet-W
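The paper's central idea, treating an image as a sequence of 16x16 patch tokens fed to a standard Transformer encoder, fits in a few lines. The sketch below uses illustrative sizes; the released models are larger and add many training details.

```python
import torch
import torch.nn as nn

# Minimal sketch of ViT's patch embedding: a strided convolution both splits
# the image into patches and linearly projects them; a class token and learned
# position embeddings are prepended, then a plain Transformer encoder runs.
class PatchEmbed(nn.Module):
    def __init__(self, img=224, patch=16, dim=768):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        n = (img // patch) ** 2                       # 196 patches for 224/16
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n + 1, dim))

    def forward(self, x):                             # x: (B, 3, 224, 224)
        t = self.proj(x).flatten(2).transpose(1, 2)   # (B, 196, dim)
        t = torch.cat([self.cls.expand(len(x), -1, -1), t], dim=1)
        return t + self.pos                           # ready for a standard encoder

tokens = PatchEmbed()(torch.randn(2, 3, 224, 224))
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(768, nhead=12, batch_first=True), num_layers=2)
features = encoder(tokens)                            # class token at index 0
```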
We present a learning-based method for synthesizing novel views of complex scenes using only unstructured collections of in-the-wild photographs.
Learning object-centric representations of complex scenes is a promising step towards enabling efficient abstract reasoning from low-level perceptual features.
At test time a model trained this way can be conditioned to generate outputs corresponding to any loss from the training distribution of losses.
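A hedged sketch of that training scheme: each step samples the coefficients of a parametric loss family, feeds them to the model as extra conditioning input, and minimises the correspondingly weighted loss. The tiny model and the two-term L1/L2 family here are placeholder assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn

# Loss-conditional training, sketched: the model sees the loss coefficients
# `lam` as part of its input, so one network covers a whole family of losses.
model = nn.Sequential(nn.Linear(10 + 2, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def loss_terms(pred, target):
    return torch.stack([(pred - target).abs().mean(),    # L1 term
                        (pred - target).pow(2).mean()])   # L2 term

for step in range(100):
    x = torch.randn(32, 10)
    lam = torch.rand(2)                                  # sample a loss from the family
    pred = model(torch.cat([x, lam.expand(32, -1)], dim=1))
    loss = (lam * loss_terms(pred, x)).sum()             # lam-weighted objective
    opt.zero_grad(); loss.backward(); opt.step()

# At test time, conditioning on any `lam` from the training distribution
# selects the corresponding trade-off without retraining.
```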
Motivated by the astonishing capabilities of natural intelligent agents and inspired by theories from psychology, this paper explores the idea that perception gets coupled to 3D properties of the world via interaction with the environment.
2 code implementations • Xiaohua Zhai, Joan Puigcerver, Alexander Kolesnikov, Pierre Ruyssen, Carlos Riquelme, Mario Lucic, Josip Djolonga, Andre Susano Pinto, Maxim Neumann, Alexey Dosovitskiy, Lucas Beyer, Olivier Bachem, Michael Tschannen, Marcin Michalski, Olivier Bousquet, Sylvain Gelly, Neil Houlsby
And, how close are we to general visual representations?
Ranked #10 on Image Classification on VTAB-1k (using extra training data)
Natural intelligent agents learn to perceive the three dimensional structure of the world without training on large datasets and are unlikely to have the precise equations of projective geometry hard-wired in the brain.
no code implementations • 25 Sep 2019 • Xiaohua Zhai, Joan Puigcerver, Alexander Kolesnikov, Pierre Ruyssen, Carlos Riquelme, Mario Lucic, Josip Djolonga, Andre Susano Pinto, Maxim Neumann, Alexey Dosovitskiy, Lucas Beyer, Olivier Bachem, Michael Tschannen, Marcin Michalski, Olivier Bousquet, Sylvain Gelly, Neil Houlsby
Representation learning promises to unlock deep learning for the long tail of vision tasks without expensive labelled datasets.
In the present work, we introduce a method for training a neural network policy in simulation and transferring it to a state-of-the-art legged system, thereby leveraging fast, automated, and cost-effective data generation schemes.
In dynamic environments, learned controllers are supposed to take motion into account when selecting the action to be taken.
We show that the correlation of offline evaluation with driving quality can be significantly improved by selecting an appropriate validation dataset and suitable offline metrics.
9 code implementations • 18 Jul 2018 • Peter Anderson, Angel Chang, Devendra Singh Chaplot, Alexey Dosovitskiy, Saurabh Gupta, Vladlen Koltun, Jana Kosecka, Jitendra Malik, Roozbeh Mottaghi, Manolis Savva, Amir R. Zamir
Skillful mobile operation in three-dimensional environments is a primary topic of study in Artificial Intelligence.
Our understanding of reinforcement learning (RL) has been shaped by theoretical and empirical results that were obtained decades ago using tabular representations and linear function approximators.
Simulation can help end-to-end driving systems by providing a cheap, safe, and diverse training environment.
The finding that very large networks can be trained efficiently and reliably has led to a paradigm shift in computer vision from engineered solutions to learning formulations.
We present MINOS, a simulator designed to support the development of multisensory models for goal-directed navigation in complex indoor environments.
However, driving policies trained via imitation learning cannot be controlled at test time.
We propose a deep network architecture and training procedures that allow us to stylize arbitrary-length videos in a consistent and stable way, and nearly in real time.
We present a deep convolutional decoder architecture that can generate volumetric 3D outputs in a compute- and memory-efficient manner by using an octree representation.
Ranked #3 on 3D Reconstruction on Data3D−R2N2
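Why an octree saves memory for volumetric outputs: cells that are uniformly empty or filled stop subdividing, so only surface regions reach fine resolution. The toy below builds an octree from a dense grid purely for illustration; the paper's decoder instead *predicts* such cell states level by level.

```python
import numpy as np

# Illustrative octree construction (not the paper's decoder): each cell is
# "filled", "empty", or subdivided into eight children; uniform regions
# collapse after a couple of levels, so memory scales with the surface.
def build_octree(vox, depth=0, max_depth=4):
    if vox.all():
        return "filled"
    if not vox.any():
        return "empty"
    if depth == max_depth:
        return "mixed-leaf"
    h = vox.shape[0] // 2
    return [build_octree(vox[x:x+h, y:y+h, z:z+h], depth + 1, max_depth)
            for x in (0, h) for y in (0, h) for z in (0, h)]

grid = np.zeros((32, 32, 32), dtype=bool)
grid[8:24, 8:24, 8:24] = True            # a solid cube in the middle
tree = build_octree(grid)                # interior/exterior collapse quickly
```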
In this paper we formulate structure from motion as a learning problem.
Particularly on small displacements and real-world data, FlowNet cannot compete with variational methods.
A contact map is a compact representation of the three-dimensional structure of a protein via the pairwise contacts between the amino acids constituting the protein.
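The representation itself is a few lines of numpy: given 3D residue coordinates, residues i and j are "in contact" when their distance falls below a threshold, commonly around 8 Angstroms. The random coordinates here are a stand-in for real atom positions.

```python
import numpy as np

# Contact map from 3-D residue coordinates (random stand-ins for C-alpha atoms):
# a boolean matrix marking every residue pair closer than the threshold.
coords = np.random.rand(120, 3) * 40.0                   # (num_residues, 3)
dists = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
contact_map = dists < 8.0                                # (120, 120) boolean matrix
```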
PPGNs are composed of 1) a generator network G that is capable of drawing a wide range of image types and 2) a replaceable "condition" network C that tells the generator what to draw.
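The G/C split suggests a simple sampling loop: ascend the class log-probability, computed by C on G's output, with respect to G's latent code. The toy modules and the plain gradient step below are hedged stand-ins for the paper's networks and priors.

```python
import torch
import torch.nn as nn

# Hedged sketch of PPGN-style sampling: a fixed generator G draws images from
# a latent code h, a replaceable condition network C scores classes, and the
# gradient of C's class log-probability through G "tells G what to draw".
G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 3 * 32 * 32), nn.Tanh())
C = nn.Sequential(nn.Linear(3 * 32 * 32, 128), nn.ReLU(), nn.Linear(128, 10))

h = torch.zeros(1, 64, requires_grad=True)
target_class = 3
for step in range(200):
    image = G(h)
    log_prob = C(image).log_softmax(dim=-1)[0, target_class]
    grad, = torch.autograd.grad(log_prob, h)
    h = (h + 0.1 * grad).detach().requires_grad_(True)   # ascend class probability

sample = G(h).reshape(3, 32, 32)                         # image of class 3, per C
```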
Understanding the inner workings of such computational brains is fascinating basic science in its own right - similar to why we study the human brain - and will enable researchers to further improve DNNs.
By combining a flow and disparity estimation network and training it jointly, we demonstrate the first scene flow estimation with a convolutional network.
We present a convolutional network capable of inferring a 3D representation of a previously unseen object given a single image of this object.
Optical flow estimation has not been among the tasks where CNNs were successful.
Most modern convolutional neural networks (CNNs) used for object recognition are built using the same principles: Alternating convolution and max-pooling layers followed by a small number of fully connected layers.
Ranked #120 on Image Classification on CIFAR-10
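The conventional pattern the snippet describes fits in a few lines; the sketch below is illustrative, and the paper itself goes on to show that the max-pooling layers can be replaced by strided convolutions, yielding an all-convolutional net.

```python
import torch
import torch.nn as nn

# The standard recipe in miniature: alternating convolution and max-pooling,
# followed by a small fully connected head (sizes chosen for 32x32 inputs).
vanilla_cnn = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(64 * 8 * 8, 10),
)
logits = vanilla_cnn(torch.randn(4, 3, 32, 32))          # (4, 10)
```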
Current methods for training convolutional neural networks depend on large amounts of labeled samples for supervised training.
Ranked #84 on Image Classification on STL-10
We train generative 'up-convolutional' neural networks which are able to generate images of objects given object style, viewpoint, and color.
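In the spirit of that description, a generator can embed the style, viewpoint, and colour codes, merge them, and expand the result into an image with transposed ('up') convolutions. All dimensions below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Hedged sketch of an 'up-convolutional' generator: condition codes in,
# image out, with transposed convolutions doing the spatial expansion.
class UpConvGenerator(nn.Module):
    def __init__(self, n_styles=100):
        super().__init__()
        self.embed = nn.Sequential(
            nn.Linear(n_styles + 2 + 3, 256), nn.ReLU(),  # style one-hot + (azimuth, elevation) + RGB colour
            nn.Linear(256, 128 * 8 * 8), nn.ReLU())
        self.upconv = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),  # 8 -> 16
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh())    # 16 -> 32

    def forward(self, style, viewpoint, colour):
        code = torch.cat([style, viewpoint, colour], dim=1)
        x = self.embed(code).reshape(-1, 128, 8, 8)
        return self.upconv(x)                             # (B, 3, 32, 32)

img = UpConvGenerator()(torch.zeros(1, 100), torch.zeros(1, 2), torch.zeros(1, 3))
```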
While such generic features cannot compete with class specific features from supervised training on a classification task, we show that they are advantageous on geometric matching problems, where they also outperform the SIFT descriptor.
We then extend these trivial one-element classes by applying a variety of transformations to the initial 'seed' patches.
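A hedged sketch of that surrogate-class construction: one 'seed' patch is expanded into a class by random transformations. The flip/shift/colour-scale family below is a simplified stand-in; the paper uses a richer set of transformations.

```python
import numpy as np

# Build a surrogate class from a single seed patch: every transformed copy
# shares the same label, giving a free supervised training signal.
def make_surrogate_class(seed, n=16, rng=np.random.default_rng(0)):
    # seed: (H, W, 3) image patch in [0, 1]; returns n transformed copies
    out = []
    for _ in range(n):
        patch = seed[:, ::-1] if rng.random() < 0.5 else seed             # horizontal flip
        patch = np.roll(patch, tuple(rng.integers(-4, 5, size=2)), axis=(0, 1))  # shift
        patch = np.clip(patch * rng.uniform(0.7, 1.3, size=3), 0, 1)      # colour scale
        out.append(patch)
    return np.stack(out)

surrogate = make_surrogate_class(np.random.rand(32, 32, 3))   # (16, 32, 32, 3)
```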
Here we extend the problem class to linearly constrained optimization of a Lipschitz continuous function that is the sum of a convex function and a function that is concave and increasing on the non-negative orthant (and possibly neither convex nor concave on the whole space).
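In symbols, the problem class that sentence describes can be written as follows; the notation, including placing the feasible set inside the non-negative orthant, is our reading of the sentence rather than a quotation from the paper.

```latex
% Formalisation (notation ours) of the problem class described above:
% minimise a Lipschitz continuous f = g + h over a linearly constrained
% region in the non-negative orthant, with g convex and h concave and
% increasing there (h need be neither convex nor concave globally).
\begin{align*}
  \min_{x \in \mathbb{R}^n_{\ge 0}} \quad & f(x) = g(x) + h(x)
  \qquad \text{subject to} \quad Ax \le b
\end{align*}
```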