Search Results for author: Alexey Dosovitskiy

Found 50 papers, 34 papers with code

Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations

no code implementations 25 Nov 2021 Mehdi S. M. Sajjadi, Henning Meyer, Etienne Pot, Urs Bergmann, Klaus Greff, Noha Radwan, Suhani Vora, Mario Lucic, Daniel Duckworth, Alexey Dosovitskiy, Jakob Uszkoreit, Thomas Funkhouser, Andrea Tagliasacchi

In this work, we propose the Scene Representation Transformer (SRT), a method which processes posed or unposed RGB images of a new area, infers a "set-latent scene representation", and synthesises novel views, all in a single feed-forward pass.

Novel View Synthesis Semantic Segmentation

Conditional Object-Centric Learning from Video

no code implementations ICLR 2022 Thomas Kipf, Gamaleldin F. Elsayed, Aravindh Mahendran, Austin Stone, Sara Sabour, Georg Heigold, Rico Jonschkowski, Alexey Dosovitskiy, Klaus Greff

Object-centric representations are a promising path toward more systematic generalization by providing flexible abstractions upon which compositional world models can be built.

Instance Segmentation Optical Flow Estimation +2

Do Vision Transformers See Like Convolutional Neural Networks?

3 code implementations NeurIPS 2021 Maithra Raghu, Thomas Unterthiner, Simon Kornblith, Chiyuan Zhang, Alexey Dosovitskiy

Finally, we study the effect of (pretraining) dataset scale on intermediate features and transfer learning, and conclude with a discussion on connections to new architectures such as the MLP-Mixer.

Classification Image Classification +1

Differentiable Patch Selection for Image Recognition

no code implementations CVPR 2021 Jean-Baptiste Cordonnier, Aravindh Mahendran, Alexey Dosovitskiy, Dirk Weissenborn, Jakob Uszkoreit, Thomas Unterthiner

Neural networks require large amounts of memory and compute to process high-resolution images, even when only a small part of the image is actually informative for the task at hand.

Traffic Sign Recognition

Learning Object-Centric Video Models by Contrasting Sets

no code implementations 20 Nov 2020 Sindy Löwe, Klaus Greff, Rico Jonschkowski, Alexey Dosovitskiy, Thomas Kipf

We address this problem by introducing a global, set-based contrastive loss: instead of contrasting individual slot representations against one another, we aggregate the representations and contrast the joined sets against one another.

Future prediction Self-Supervised Learning

NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections

1 code implementation CVPR 2021 Ricardo Martin-Brualla, Noha Radwan, Mehdi S. M. Sajjadi, Jonathan T. Barron, Alexey Dosovitskiy, Daniel Duckworth

We present a learning-based method for synthesizing novel views of complex scenes using only unstructured collections of in-the-wild photographs.

Object-Centric Learning with Slot Attention

3 code implementations NeurIPS 2020 Francesco Locatello, Dirk Weissenborn, Thomas Unterthiner, Aravindh Mahendran, Georg Heigold, Jakob Uszkoreit, Alexey Dosovitskiy, Thomas Kipf

Learning object-centric representations of complex scenes is a promising step towards enabling efficient abstract reasoning from low-level perceptual features.

Object Discovery

You Only Train Once: Loss-Conditional Training of Deep Networks

no code implementations ICLR 2020 Alexey Dosovitskiy, Josip Djolonga

At test time a model trained this way can be conditioned to generate outputs corresponding to any loss from the training distribution of losses.

Image Compression Style Transfer

Learning Depth With Very Sparse Supervision

no code implementations 2 Mar 2020 Antonio Loquercio, Alexey Dosovitskiy, Davide Scaramuzza

Motivated by the astonishing capabilities of natural intelligent agents and inspired by theories from psychology, this paper explores the idea that perception gets coupled to 3D properties of the world via interaction with the environment.

Depth Estimation

Global-Local Network for Learning Depth with Very Sparse Supervision

no code implementations 25 Sep 2019 Antonio Loquercio, Alexey Dosovitskiy, Davide Scaramuzza

Natural intelligent agents learn to perceive the three dimensional structure of the world without training on large datasets and are unlikely to have the precise equations of projective geometry hard-wired in the brain.

Depth Estimation

Benchmarking Classic and Learned Navigation in Complex 3D Environments

1 code implementation 30 Jan 2019 Dmytro Mishkin, Alexey Dosovitskiy, Vladlen Koltun

However, this new line of work is largely disconnected from well-established classic navigation approaches.

Learning agile and dynamic motor skills for legged robots

2 code implementations 24 Jan 2019 Jemin Hwangbo, Joonho Lee, Alexey Dosovitskiy, Dario Bellicoso, Vassilios Tsounis, Vladlen Koltun, Marco Hutter

In the present work, we introduce a method for training a neural network policy in simulation and transferring it to a state-of-the-art legged system, thereby leveraging fast, automated, and cost-effective data generation schemes.

Legged Robots reinforcement-learning

Motion Perception in Reinforcement Learning with Dynamic Objects

no code implementations 10 Jan 2019 Artemij Amiranashvili, Alexey Dosovitskiy, Vladlen Koltun, Thomas Brox

In dynamic environments, learned controllers are supposed to take motion into account when selecting the action to be taken.

Continuous Control Frame +1

Unsupervised Learning of Shape and Pose with Differentiable Point Clouds

3 code implementations NeurIPS 2018 Eldar Insafutdinov, Alexey Dosovitskiy

We address the problem of learning accurate 3D shape and camera pose from a collection of unlabeled category-specific images.

3D Pose Estimation

On Offline Evaluation of Vision-based Driving Models

1 code implementation ECCV 2018 Felipe Codevilla, Antonio M. López, Vladlen Koltun, Alexey Dosovitskiy

We show that the correlation of offline evaluation with driving quality can be significantly improved by selecting an appropriate validation dataset and suitable offline metrics.

Autonomous Driving

On Evaluation of Embodied Navigation Agents

9 code implementations 18 Jul 2018 Peter Anderson, Angel Chang, Devendra Singh Chaplot, Alexey Dosovitskiy, Saurabh Gupta, Vladlen Koltun, Jana Kosecka, Jitendra Malik, Roozbeh Mottaghi, Manolis Savva, Amir R. Zamir

Skillful mobile operation in three-dimensional environments is a primary topic of study in Artificial Intelligence.

TD or not TD: Analyzing the Role of Temporal Differencing in Deep Reinforcement Learning

1 code implementation ICLR 2018 Artemij Amiranashvili, Alexey Dosovitskiy, Vladlen Koltun, Thomas Brox

Our understanding of reinforcement learning (RL) has been shaped by theoretical and empirical results that were obtained decades ago using tabular representations and linear function approximators.

reinforcement-learning

Driving Policy Transfer via Modularity and Abstraction

no code implementations 25 Apr 2018 Matthias Müller, Alexey Dosovitskiy, Bernard Ghanem, Vladlen Koltun

Simulation can help end-to-end driving systems by providing a cheap, safe, and diverse training environment.

Autonomous Driving

Semi-parametric Topological Memory for Navigation

1 code implementation ICLR 2018 Nikolay Savinov, Alexey Dosovitskiy, Vladlen Koltun

We introduce a new memory architecture for navigation in previously unseen environments, inspired by landmark-based navigation in animals.

What Makes Good Synthetic Training Data for Learning Disparity and Optical Flow Estimation?

1 code implementation 19 Jan 2018 Nikolaus Mayer, Eddy Ilg, Philipp Fischer, Caner Hazirbas, Daniel Cremers, Alexey Dosovitskiy, Thomas Brox

The finding that very large networks can be trained efficiently and reliably has led to a paradigm shift in computer vision from engineered solutions to learning formulations.

Optical Flow Estimation

MINOS: Multimodal Indoor Simulator for Navigation in Complex Environments

2 code implementations 11 Dec 2017 Manolis Savva, Angel X. Chang, Alexey Dosovitskiy, Thomas Funkhouser, Vladlen Koltun

We present MINOS, a simulator designed to support the development of multisensory models for goal-directed navigation in complex indoor environments.

reinforcement-learning

Artistic style transfer for videos and spherical images

4 code implementations 13 Aug 2017 Manuel Ruder, Alexey Dosovitskiy, Thomas Brox

We propose a deep network architecture and training procedures that allow us to stylize arbitrary-length videos in a consistent and stable way, and nearly in real time.

Style Transfer

Octree Generating Networks: Efficient Convolutional Architectures for High-resolution 3D Outputs

1 code implementation ICCV 2017 Maxim Tatarchenko, Alexey Dosovitskiy, Thomas Brox

We present a deep convolutional decoder architecture that can generate volumetric 3D outputs in a compute- and memory-efficient manner by using an octree representation.

3D Reconstruction

Protein contact prediction from amino acid co-evolution using convolutional networks for graph-valued images

no code implementations NeurIPS 2016 Vladimir Golkov, Marcin J. Skwark, Antonij Golkov, Alexey Dosovitskiy, Thomas Brox, Jens Meiler, Daniel Cremers

A contact map is a compact representation of the three-dimensional structure of a protein via the pairwise contacts between the amino acids constituting the protein.

Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space

1 code implementation CVPR 2017 Anh Nguyen, Jeff Clune, Yoshua Bengio, Alexey Dosovitskiy, Jason Yosinski

PPGNs are composed of 1) a generator network G that is capable of drawing a wide range of image types and 2) a replaceable "condition" network C that tells the generator what to draw.

Image Captioning Image Inpainting

Learning to Act by Predicting the Future

2 code implementations 6 Nov 2016 Alexey Dosovitskiy, Vladlen Koltun

A model trained using the presented approach won the Full Deathmatch track of the Visual Doom AI Competition, which was held in previously unseen environments.

Synthesizing the preferred inputs for neurons in neural networks via deep generator networks

5 code implementations NeurIPS 2016 Anh Nguyen, Alexey Dosovitskiy, Jason Yosinski, Thomas Brox, Jeff Clune

Understanding the inner workings of such computational brains is both fascinating basic science that is interesting in its own right - similar to why we study the human brain - and will enable researchers to further improve DNNs.

Artistic style transfer for videos

4 code implementations 28 Apr 2016 Manuel Ruder, Alexey Dosovitskiy, Thomas Brox

We present an approach that transfers the style from one image (for example, a painting) to a whole video sequence.

Style Transfer

Multi-view 3D Models from Single Images with a Convolutional Network

no code implementations 20 Nov 2015 Maxim Tatarchenko, Alexey Dosovitskiy, Thomas Brox

We present a convolutional network capable of inferring a 3D representation of a previously unseen object given a single image of this object.

Inverting Visual Representations with Convolutional Networks

2 code implementations CVPR 2016 Alexey Dosovitskiy, Thomas Brox

Inverting a deep network trained on ImageNet provides several insights into the properties of the feature representation learned by the network.

Learning to Generate Chairs With Convolutional Neural Networks

1 code implementation CVPR 2015 Alexey Dosovitskiy, Jost Tobias Springenberg, Thomas Brox

We train a generative convolutional neural network which is able to generate images of objects given object type, viewpoint, and color.

Striving for Simplicity: The All Convolutional Net

34 code implementations 21 Dec 2014 Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, Martin Riedmiller

Most modern convolutional neural networks (CNNs) used for object recognition are built using the same principles: Alternating convolution and max-pooling layers followed by a small number of fully connected layers.

Image Classification Object Recognition

Learning to Generate Chairs, Tables and Cars with Convolutional Networks

1 code implementation 21 Nov 2014 Alexey Dosovitskiy, Jost Tobias Springenberg, Maxim Tatarchenko, Thomas Brox

We train generative 'up-convolutional' neural networks which are able to generate images of objects given object style, viewpoint, and color.

Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks

1 code implementation 26 Jun 2014 Alexey Dosovitskiy, Philipp Fischer, Jost Tobias Springenberg, Martin Riedmiller, Thomas Brox

While such generic features cannot compete with class specific features from supervised training on a classification task, we show that they are advantageous on geometric matching problems, where they also outperform the SIFT descriptor.

General Classification Geometric Matching

Unsupervised feature learning by augmenting single images

no code implementations 18 Dec 2013 Alexey Dosovitskiy, Jost Tobias Springenberg, Thomas Brox

We then extend these trivial one-element classes by applying a variety of transformations to the initial 'seed' patches.

Data Augmentation Object Recognition

An Iterated L1 Algorithm for Non-smooth Non-convex Optimization in Computer Vision

no code implementations CVPR 2013 Peter Ochs, Alexey Dosovitskiy, Thomas Brox, Thomas Pock

Here we extend the problem class to linearly constrained optimization of a Lipschitz continuous function, which is the sum of a convex function and a function being concave and increasing on the non-negative orthant (possibly non-convex and nonconcave on the whole space).

Image Denoising Optical Flow Estimation
