no code implementations • 1 Oct 2024 • Junlin Han, Jianyuan Wang, Andrea Vedaldi, Philip Torr, Filippos Kokkinos
We employ a fine-tuned multi-view image diffusion model and a video diffusion model to generate a pool of candidate views, enabling a rich representation of the target 3D object.
no code implementations • 22 Aug 2024 • Wenjing Bian, ZiRui Wang, Andrea Vedaldi
Image-based 3D object detection is widely employed in applications such as autonomous vehicles and robotics, yet current systems struggle with generalisation due to complex problem setup and limited training data.
no code implementations • 19 Aug 2024 • Yash Bhalgat, Vadim Tschernezki, Iro Laina, João F. Henriques, Andrea Vedaldi, Andrew Zisserman
Egocentric videos present unique challenges for 3D scene understanding due to rapid camera motion, frequent object occlusions, and limited object visibility.
no code implementations • 8 Aug 2024 • Ruining Li, Chuanxia Zheng, Christian Rupprecht, Andrea Vedaldi
We present Puppet-Master, an interactive video generative model that can serve as a motion prior for part-level dynamics.
no code implementations • 26 Jul 2024 • Aleksandar Shtedritski, Christian Rupprecht, Andrea Vedaldi
The reduction works by matching images of the object to non-photorealistic renders of the template, which emulates the process of collecting manual annotations for this task.
no code implementations • 2 Jul 2024 • Raphael Bensadoun, Tom Monnier, Yanir Kleiman, Filippos Kokkinos, Yawar Siddiqui, Mahendra Kariya, Omri Harosh, Roman Shapovalov, Benjamin Graham, Emilien Garreau, Animesh Karnewar, Ang Cao, Idan Azuri, Iurii Makarov, Eric-Tuan Le, Antoine Toisoul, David Novotny, Oran Gafni, Natalia Neverova, Andrea Vedaldi
We introduce Meta 3D Gen (3DGen), a new state-of-the-art, fast pipeline for text-to-3D asset generation.
no code implementations • 2 Jul 2024 • Raphael Bensadoun, Yanir Kleiman, Idan Azuri, Omri Harosh, Andrea Vedaldi, Natalia Neverova, Oran Gafni
The recent availability and adaptability of text-to-image models have sparked a new era in many related domains that benefit from the learned text priors as well as high-quality and fast generation capabilities, one of which is texture generation for 3D objects.
no code implementations • 2 Jul 2024 • Yawar Siddiqui, Tom Monnier, Filippos Kokkinos, Mahendra Kariya, Yanir Kleiman, Emilien Garreau, Oran Gafni, Natalia Neverova, Andrea Vedaldi, Roman Shapovalov, David Novotny
We present Meta 3D AssetGen (AssetGen), a significant advancement in text-to-3D generation which produces faithful, high-quality meshes with texture and material control.
1 code implementation • 6 Jun 2024 • Stanislaw Szymanowicz, Eldar Insafutdinov, Chuanxia Zheng, Dylan Campbell, João F. Henriques, Christian Rupprecht, Andrea Vedaldi
In this paper, we propose Flash3D, a method for scene reconstruction and novel view synthesis from a single image which is both very generalisable and efficient.
no code implementations • 30 Apr 2024 • Paul Engstler, Andrea Vedaldi, Iro Laina, Christian Rupprecht
These works often depend on pre-trained monocular depth estimators to lift the generated images into 3D, fusing them with the existing scene representation.
1 code implementation • 30 Apr 2024 • Ang Cao, Justin Johnson, Andrea Vedaldi, David Novotny
Contemporary 3D research, particularly in reconstruction and generation, heavily relies on 2D images for inputs or supervision.
no code implementations • 29 Apr 2024 • Minghao Chen, Iro Laina, Andrea Vedaldi
A common approach to this problem is to use a 2D image generator or editor to guide the 3D editing process, obviating the need for 3D data.
no code implementations • 22 Mar 2024 • Ruining Li, Chuanxia Zheng, Christian Rupprecht, Andrea Vedaldi
We introduce DragAPart, a method that, given an image and a set of drags as input, generates a new image of the same object that responds to the action of the drags.
no code implementations • 16 Mar 2024 • Yash Bhalgat, Iro Laina, João F. Henriques, Andrew Zisserman, Andrea Vedaldi
To address this, we introduce Nested Neural Feature Fields (N2F2), a novel approach that employs hierarchical supervision to learn a single feature field, wherein different dimensions within the same high-dimensional feature encode scene properties at varying granularities.
1 code implementation • 15 Feb 2024 • Abdullah Hamdi, Luke Melas-Kyriazi, Jinjie Mai, Guocheng Qian, Ruoshi Liu, Carl Vondrick, Bernard Ghanem, Andrea Vedaldi
With the aid of a frequency-modulated loss, GES achieves competitive performance in novel-view synthesis benchmarks while requiring less than half the memory storage of Gaussian Splatting and increasing the rendering speed by up to 39%.
no code implementations • 13 Feb 2024 • Luke Melas-Kyriazi, Iro Laina, Christian Rupprecht, Natalia Neverova, Andrea Vedaldi, Oran Gafni, Filippos Kokkinos
A mitigation is to fine-tune the 2D generator to be multi-view aware, which can help distillation or can be combined with reconstruction networks to output 3D objects directly.
no code implementations • CVPR 2024 • Zizhang Li, Dor Litvak, Ruining Li, Yunzhi Zhang, Tomas Jakab, Christian Rupprecht, Shangzhe Wu, Andrea Vedaldi, Jiajun Wu
We show that prior category-specific attempts fail to generalize to rare species with limited training images.
no code implementations • CVPR 2024 • Abdullah Hamdi, Luke Melas-Kyriazi, Jinjie Mai, Guocheng Qian, Ruoshi Liu, Carl Vondrick, Bernard Ghanem, Andrea Vedaldi
With the aid of a frequency-modulated loss, GES achieves competitive performance in novel-view synthesis benchmarks while requiring less than half the memory storage of Gaussian Splatting and increasing the rendering speed by up to 39%.
1 code implementation • CVPR 2024 • Stanislaw Szymanowicz, Christian Rupprecht, Andrea Vedaldi
Splatter Image is based on Gaussian Splatting, which allows fast and high-quality reconstruction of 3D scenes from multiple images.
no code implementations • 14 Dec 2023 • Animesh Karnewar, Roman Shapovalov, Tom Monnier, Andrea Vedaldi, Niloy J. Mitra, David Novotny
Encoding information from 2D views of an object into a 3D representation is crucial for generalized 3D feature extraction.
no code implementations • CVPR 2024 • Minghao Chen, Junyu Xie, Iro Laina, Andrea Vedaldi
In particular, we hypothesise that editing can be greatly simplified by first encoding 3D objects in a suitable latent space.
1 code implementation • CVPR 2024 • Chuanxia Zheng, Andrea Vedaldi
Similar to Zero-1-to-3, we start from a pre-trained 2D image generator for generalization, and fine-tune it for NVS.
1 code implementation • 4 Dec 2023 • Niki Amini-Naieni, Tomas Jakab, Andrea Vedaldi, Ronald Clark
To address this, we introduce the concept of a meta-calibrator that performs uncertainty calibration for NeRFs with a single forward pass without the need for holding out any images from the target scene.
no code implementations • ICCV 2023 • Animesh Karnewar, Niloy J. Mitra, Andrea Vedaldi, David Novotny
Diffusion-based image generators can now produce high-quality and diverse samples, but their success has yet to fully translate to 3D generation: existing diffusion methods can either generate low-resolution but 3D consistent outputs, or detailed 2D views of 3D objects but with potential structural defects and lacking view consistency or realism.
1 code implementation • ICCV 2023 • Chuanxia Zheng, Andrea Vedaldi
Vector Quantisation (VQ) is experiencing a comeback in machine learning, where it is increasingly used in representation learning.
1 code implementation • ICCV 2023 • Roman Shapovalov, Yanir Kleiman, Ignacio Rocco, David Novotny, Andrea Vedaldi, Changan Chen, Filippos Kokkinos, Ben Graham, Natalia Neverova
We introduce Replay, a collection of multi-view, multi-modal videos of humans interacting socially.
1 code implementation • 14 Jul 2023 • Nikita Karaev, Ignacio Rocco, Benjamin Graham, Natalia Neverova, Andrea Vedaldi, Christian Rupprecht
We introduce CoTracker, a transformer-based model that tracks a large number of 2D points in long video sequences.
Ranked #2 on Point Tracking on TAP-Vid-Kinetics-First
no code implementations • 15 Jun 2023 • Laurynas Karazija, Iro Laina, Andrea Vedaldi, Christian Rupprecht
Open-vocabulary segmentation is the task of segmenting anything that can be named in an image.
1 code implementation • NeurIPS 2023 • Vadim Tschernezki, Ahmad Darkhalil, Zhifan Zhu, David Fouhey, Iro Laina, Diane Larlus, Dima Damen, Andrea Vedaldi
Compared to other neural rendering datasets, EPIC Fields is better tailored to video understanding because it is paired with labelled action segments and the recent VISOR segment annotations.
1 code implementation • ICCV 2023 • Stanislaw Szymanowicz, Christian Rupprecht, Andrea Vedaldi
We fit a diffusion model to a large number of viewsets for a given category of objects.
1 code implementation • NeurIPS 2023 • Yash Bhalgat, Iro Laina, João F. Henriques, Andrew Zisserman, Andrea Vedaldi
Our approach outperforms the state-of-the-art on challenging scenes from the ScanNet, Hypersim, and Replica datasets, as well as on our newly created Messy Rooms dataset, demonstrating the effectiveness and scalability of our slow-fast clustering method.
1 code implementation • CVPR 2023 • Nikita Karaev, Ignacio Rocco, Benjamin Graham, Natalia Neverova, Andrea Vedaldi, Christian Rupprecht
The network learns to pool information from neighboring frames to improve the temporal consistency of its predictions.
no code implementations • 20 Apr 2023 • Tomas Jakab, Ruining Li, Shangzhe Wu, Christian Rupprecht, Andrea Vedaldi
We propose a framework that uses an image generator, such as Stable Diffusion, to generate synthetic training data that are sufficiently clean and do not require further manual curation, enabling the learning of such a reconstruction network from scratch.
no code implementations • ICCV 2023 • Aleksandar Shtedritski, Christian Rupprecht, Andrea Vedaldi
Large-scale Vision-Language Models, such as CLIP, learn powerful image-text representations that have found numerous applications, from zero-shot classification to text-to-image generation.
no code implementations • CVPR 2023 • Yaoyao Liu, Bernt Schiele, Andrea Vedaldi, Christian Rupprecht
Incremental object detection (IOD) aims to train an object detector in phases, each with annotations for new object categories.
Class-Incremental Object Detection, Knowledge Distillation
1 code implementation • 6 Apr 2023 • Minghao Chen, Iro Laina, Andrea Vedaldi
We thoroughly evaluate our approach on three benchmarks and provide several qualitative examples and a comparative analysis of the two strategies that demonstrate the superiority of backward guidance compared to forward guidance, as well as prior work.
no code implementations • CVPR 2023 • Animesh Karnewar, Andrea Vedaldi, David Novotny, Niloy Mitra
We show that our diffusion models are scalable, train robustly, and are competitive in terms of sample quality and fidelity to existing approaches for 3D generative modeling.
1 code implementation • 21 Mar 2023 • Ignacio Rocco, Iurii Makarov, Filippos Kokkinos, David Novotny, Benjamin Graham, Natalia Neverova, Andrea Vedaldi
We present a method for fast 3D reconstruction and real-time rendering of dynamic humans from monocular videos with accompanying parametric body fits.
2 code implementations • 21 Feb 2023 • Luke Melas-Kyriazi, Christian Rupprecht, Andrea Vedaldi
Reconstructing the 3D shape of an object from a single RGB image is a long-standing and highly challenging problem in computer vision.
3 code implementations • 21 Feb 2023 • Luke Melas-Kyriazi, Christian Rupprecht, Iro Laina, Andrea Vedaldi
We consider the problem of reconstructing a full 360° photographic model of an object from a single image of it.
no code implementations • 26 Jan 2023 • Uriel Singer, Shelly Sheynin, Adam Polyak, Oron Ashual, Iurii Makarov, Filippos Kokkinos, Naman Goyal, Andrea Vedaldi, Devi Parikh, Justin Johnson, Yaniv Taigman
We present MAV3D (Make-A-Video3D), a method for generating three-dimensional dynamic scenes from text descriptions.
no code implementations • CVPR 2023 • Changan Chen, Alexander Richard, Roman Shapovalov, Vamsi Krishna Ithapu, Natalia Neverova, Kristen Grauman, Andrea Vedaldi
We introduce the novel-view acoustic synthesis (NVAS) task: given the sight and sound observed at a source viewpoint, can we synthesize the sound of that scene from an unseen target viewpoint?
no code implementations • CVPR 2023 • Luke Melas-Kyriazi, Christian Rupprecht, Andrea Vedaldi
Reconstructing the 3D shape of an object from a single RGB image is a long-standing problem in computer vision.
no code implementations • CVPR 2023 • Luke Melas-Kyriazi, Iro Laina, Christian Rupprecht, Andrea Vedaldi
We consider the problem of reconstructing a full 360° photographic model of an object from a single image of it.
1 code implementation • 6 Dec 2022 • Mohamed El Banani, Ignacio Rocco, David Novotny, Andrea Vedaldi, Natalia Neverova, Justin Johnson, Benjamin Graham
To address this, we propose a self-supervised approach for correspondence estimation that learns from multiview consistency in short RGB-D video sequences.
no code implementations • CVPR 2023 • Shangzhe Wu, Ruining Li, Tomas Jakab, Christian Rupprecht, Andrea Vedaldi
We consider the problem of predicting the 3D shape, articulation, viewpoint, texture, and lighting of an articulated animal like a horse given a single test image as input.
no code implementations • CVPR 2023 • Samarth Sinha, Roman Shapovalov, Jeremy Reizenstein, Ignacio Rocco, Natalia Neverova, Andrea Vedaldi, David Novotny
Obtaining photorealistic reconstructions of objects from sparse views is inherently ambiguous and can only be achieved by learning suitable reconstruction priors.
no code implementations • 21 Oct 2022 • Laurynas Karazija, Subhabrata Choudhury, Iro Laina, Christian Rupprecht, Andrea Vedaldi
We propose a new approach to learn to segment multiple image objects without manual supervision.
no code implementations • 7 Sep 2022 • Iro Laina, Yuki M. Asano, Andrea Vedaldi
Self-supervised visual representation learning has recently attracted significant research interest.
no code implementations • 7 Sep 2022 • Vadim Tschernezki, Iro Laina, Diane Larlus, Andrea Vedaldi
We present Neural Feature Fusion Fields (N3F), a method that improves dense 2D image feature extractors when the latter are applied to the analysis of multiple images reconstructible as a 3D scene.
no code implementations • 13 Jun 2022 • Eldar Insafutdinov, Dylan Campbell, João F. Henriques, Andrea Vedaldi
We present a method for the accurate 3D reconstruction of partly-symmetric objects.
no code implementations • 16 May 2022 • Subhabrata Choudhury, Laurynas Karazija, Iro Laina, Andrea Vedaldi, Christian Rupprecht
Motion, measured via optical flow, provides a powerful cue to discover and learn objects in images and videos.
Ranked #4 on Unsupervised Object Segmentation on SegTrack-v2
1 code implementation • CVPR 2022 • Luke Melas-Kyriazi, Christian Rupprecht, Iro Laina, Andrea Vedaldi
We find that these eigenvectors already decompose an image into meaningful segments, and can be readily used to localize objects in a scene.
no code implementations • 3 May 2022 • Andrew Brown, Cheng-Yang Fu, Omkar Parkhi, Tamara L. Berg, Andrea Vedaldi
We consider the targeted image editing problem: blending a region in a source image with a driver image that specifies the desired change.
1 code implementation • CVPR 2022 • Sagar Vaze, Kai Han, Andrea Vedaldi, Andrew Zisserman
Here, the unlabelled images may come from labelled classes or from novel ones.
Ranked #1 on Open-World Semi-Supervised Learning on CIFAR-10 (Seen accuracy (50% Labeled) metric)
Fine-Grained Visual Recognition, Open-World Semi-Supervised Learning
no code implementations • CVPR 2022 • David Novotny, Ignacio Rocco, Samarth Sinha, Alexandre Carlier, Gael Kerchenbaum, Roman Shapovalov, Nikita Smetanin, Natalia Neverova, Benjamin Graham, Andrea Vedaldi
Compared to weaker deformation models, this significantly reduces the reconstruction ambiguity and, for dynamic objects, allows Keypoint Transporter to obtain reconstructions of quality superior or at least comparable to prior approaches while being much faster and reliant on a pre-trained monocular depth estimator network.
1 code implementation • CVPR 2022 • Gengshan Yang, Minh Vo, Natalia Neverova, Deva Ramanan, Andrea Vedaldi, Hanbyul Joo
Our key insight is to merge three schools of thought: (1) classic deformable shape models that make use of articulated bones and blend skinning, (2) volumetric neural radiance fields (NeRFs) that are amenable to gradient-based optimization, and (3) canonical embeddings that generate correspondences between pixels and an articulated model.
no code implementations • 8 Dec 2021 • Honglie Chen, Weidi Xie, Triantafyllos Afouras, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman
Finally, we set the first benchmark for general audio-visual synchronisation with over 160 diverse classes in the new VGG-Sound Sync video dataset.
1 code implementation • NeurIPS 2021 • Subhabrata Choudhury, Iro Laina, Christian Rupprecht, Andrea Vedaldi
First, we construct a proxy task through a set of objectives that encourages the model to learn a meaningful decomposition of the image into its parts.
Ranked #1 on Unsupervised Keypoint Estimation on CUB
1 code implementation • 5 Nov 2021 • Subhabrata Choudhury, Iro Laina, Christian Rupprecht, Andrea Vedaldi
We then train a fine-grained textual similarity model that matches image descriptions with documents on a sentence-level basis.
no code implementations • 19 Oct 2021 • Vadim Tschernezki, Diane Larlus, Andrea Vedaldi
Given a raw video sequence taken from a freely-moving camera, we study the problem of decomposing the observed 3D scene into a static background and a dynamic foreground containing the objects that move in the video sequence.
2 code implementations • ICLR 2022 • Sagar Vaze, Kai Han, Andrea Vedaldi, Andrew Zisserman
In this paper, we first demonstrate that the ability of a classifier to make the 'none-of-above' decision is highly correlated with its accuracy on the closed-set classes.
Ranked #10 on Out-of-Distribution Detection on CIFAR-100 vs CIFAR-10
no code implementations • ICLR 2022 • Iro Laina, Yuki M Asano, Andrea Vedaldi
Self-supervised visual representation learning has attracted significant research interest.
Ranked #90 on Image Classification on ObjectNet (using extra training data)
1 code implementation • NeurIPS Workshop ImageNet_PPF 2021 • Yuki M. Asano, Christian Rupprecht, Andrew Zisserman, Andrea Vedaldi
On the other hand, state-of-the-art pretraining is nowadays obtained with unsupervised methods, meaning that labelled datasets such as ImageNet may not be necessary, or perhaps not even optimal, for model pretraining.
1 code implementation • 16 Sep 2021 • Robert McCraith, Eldar Insafutdinov, Lukas Neumann, Andrea Vedaldi
We present a system for automatically converting 2D mask object predictions and raw LiDAR point clouds into full 3D bounding boxes of objects.
no code implementations • 16 Sep 2021 • Robert McCraith, Lukas Neumann, Andrea Vedaldi
Vision is one of the primary sensing modalities in autonomous driving.
no code implementations • ICCV 2021 • Roman Shapovalov, David Novotny, Benjamin Graham, Patrick Labatut, Andrea Vedaldi
The method learns, in an end-to-end fashion, a soft partition of a given category-specific 3D template mesh into rigid parts together with a monocular reconstruction network that predicts the part motions such that they reproject correctly onto 2D DensePose-like surface annotations of the object.
no code implementations • 19 Aug 2021 • Matan Atzmon, David Novotny, Andrea Vedaldi, Yaron Lipman
Implicit neural representation is a recent approach to learn shape collections as zero level-sets of neural networks, where each shape is represented by a latent code.
no code implementations • 22 Jul 2021 • Shangzhe Wu, Tomas Jakab, Christian Rupprecht, Andrea Vedaldi
In this paper, we present DOVE, a method that learns textured 3D models of deformable object categories from monocular videos available online, without keypoint, viewpoint or template shape supervision.
1 code implementation • 29 Jun 2021 • Kai Han, Sylvestre-Alvise Rebuffi, Sébastien Ehrhardt, Andrea Vedaldi, Andrew Zisserman
We present a new approach called AutoNovel to address this problem by combining three ideas: (1) we suggest that the common approach of bootstrapping an image representation using the labelled data only introduces an unwanted bias, and that this can be avoided by using self-supervised learning to train the representation from scratch on the union of labelled and unlabelled data; (2) we use ranking statistics to transfer the model's knowledge of the labelled classes to the problem of clustering the unlabelled images; and, (3) we train the data representation by optimizing a joint objective function on the labelled and unlabelled subsets of the data, improving both the supervised classification of the labelled data, and the clustering of the unlabelled data.
Ranked #1 on Novel Class Discovery on SVHN
no code implementations • CVPR 2021 • Lukas Neumann, Andrea Vedaldi
Predicting future pedestrian trajectory is a crucial component of autonomous driving systems, as recognizing critical situations based only on current pedestrian position may come too late for any meaningful corrective action (e.g. braking) to take place.
no code implementations • CVPR 2021 • Marvin Eisenberger, David Novotny, Gael Kerchenbaum, Patrick Labatut, Natalia Neverova, Daniel Cremers, Andrea Vedaldi
We present NeuroMorph, a new neural network architecture that takes as input two 3D shapes and produces in one go, i.e. in a single feed forward pass, a smooth interpolation and point-to-point correspondences between them.
no code implementations • CVPR 2021 • Natalia Neverova, Artsiom Sanakoyeu, Patrick Labatut, David Novotny, Andrea Vedaldi
Recent work has shown that it is possible to learn a unified dense pose predictor for several categories of related objects.
1 code implementation • 15 Jun 2021 • Xu Ji, Razvan Pascanu, Devon Hjelm, Balaji Lakshminarayanan, Andrea Vedaldi
Intuitively, one would expect accuracy of a trained neural network's prediction on test samples to correlate with how densely the samples are surrounded by seen training samples in representation space.
2 code implementations • NeurIPS 2021 • Mandela Patrick, Dylan Campbell, Yuki M. Asano, Ishan Misra, Florian Metze, Christoph Feichtenhofer, Andrea Vedaldi, João F. Henriques
In video transformers, the time dimension is often treated in the same way as the two spatial dimensions.
Ranked #15 on Action Recognition on EPIC-KITCHENS-100 (using extra training data)
1 code implementation • ICLR 2022 • Luke Melas-Kyriazi, Christian Rupprecht, Iro Laina, Andrea Vedaldi
Recent research has shown that numerous human-interpretable directions exist in the latent space of GANs.
no code implementations • 5 May 2021 • Dan Xu, Andrea Vedaldi, Joao F. Henriques
We build on the idea of view synthesis, which uses classical camera geometry to re-render a source image from a different point-of-view, specified by a predicted relative pose and depth map.
no code implementations • CVPR 2022 • Triantafyllos Afouras, Yuki M. Asano, Francois Fagan, Andrea Vedaldi, Florian Metze
We tackle the problem of learning object detectors without supervision.
1 code implementation • CVPR 2021 • Honglie Chen, Weidi Xie, Triantafyllos Afouras, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman
We show that our algorithm achieves state-of-the-art performance on the popular Flickr SoundNet dataset.
no code implementations • CVPR 2021 • Philipp Henzler, Jeremy Reizenstein, Patrick Labatut, Roman Shapovalov, Tobias Ritschel, Andrea Vedaldi, David Novotny
Our goal is to learn a deep network that, given a small number of images of an object of a given category, reconstructs it in 3D.
1 code implementation • ICCV 2021 • Mandela Patrick, Yuki M. Asano, Bernie Huang, Ishan Misra, Florian Metze, Joao Henriques, Andrea Vedaldi
First, for space, we show that spatial augmentations such as cropping do work well for videos too, but that previous implementations, due to the high processing and memory cost, could not do this at a scale sufficient for it to work well.
1 code implementation • NeurIPS 2020 • Natalia Neverova, David Novotny, Vasil Khalidov, Marc Szafraniec, Patrick Labatut, Andrea Vedaldi
In this work, we focus on the task of learning and representing dense correspondences in deformable object categories.
no code implementations • NeurIPS 2020 • Benjamin Biggs, Sébastien Ehrhardt, Hanbyul Joo, Benjamin Graham, Andrea Vedaldi, David Novotny
We consider the problem of obtaining dense 3D reconstructions of humans from single and partially occluded views.
Ranked #3 on Multi-Hypotheses 3D Human Pose Estimation on AH36M
no code implementations • NeurIPS 2020 • Iro Laina, Ruth C. Fong, Andrea Vedaldi
The increasing impact of black box models, and particularly of unsupervised ones, comes with an increasing interest in tools to understand and interpret them.
no code implementations • ICLR 2021 • Mandela Patrick, Po-Yao Huang, Yuki Asano, Florian Metze, Alexander Hauptmann, João Henriques, Andrea Vedaldi
The dominant paradigm for learning video-text representations -- noise contrastive learning -- increases the similarity of the representations of pairs of samples that are known to be related, such as text and video from the same sample, and pushes away the representations of all other pairs.
no code implementations • 28 Sep 2020 • Mandela Patrick, Yuki Asano, Polina Kuznetsova, Ruth Fong, Joao F. Henriques, Geoffrey Zweig, Andrea Vedaldi
In this paper, we show that, for videos, the answer is more complex, and that better results can be obtained by accounting for the interplay between invariance, distinctiveness, multiple modalities and time.
no code implementations • 16 Sep 2020 • Robert McCraith, Lukas Neumann, Andrea Vedaldi
In recent years, many methods have demonstrated the ability of neural networks to learn depth and pose changes in a sequence of images, using only self-supervision as the training signal.
1 code implementation • NeurIPS 2020 • David Novotny, Roman Shapovalov, Andrea Vedaldi
We propose the Canonical 3D Deformer Map, a new representation of the 3D shape of common object categories that can be learned from a collection of 2D images of independent objects.
1 code implementation • NeurIPS 2020 • Sebastien Ehrhardt, Oliver Groth, Aron Monszpart, Martin Engelcke, Ingmar Posner, Niloy Mitra, Andrea Vedaldi
We present RELATE, a model that learns to generate physically plausible scenes and videos of multiple interacting objects.
1 code implementation • NeurIPS 2020 • Yuki M. Asano, Mandela Patrick, Christian Rupprecht, Andrea Vedaldi
A large part of the current success of deep learning lies in the effectiveness of data -- more precisely: labelled data.
1 code implementation • 22 Jun 2020 • Xu Ji, Joao Henriques, Tinne Tuytelaars, Andrea Vedaldi
Replay in neural networks involves training on sequential data with memorized samples, which counteracts forgetting of previous behavior caused by non-stationarity.
1 code implementation • 17 Jun 2020 • Sylvestre-Alvise Rebuffi, Sebastien Ehrhardt, Kai Han, Andrea Vedaldi, Andrew Zisserman
We present LSD-C, a novel method to identify clusters in an unlabeled dataset.
3 code implementations • 29 Apr 2020 • Honglie Chen, Weidi Xie, Andrea Vedaldi, Andrew Zisserman
Our goal is to collect a large-scale audio-visual dataset with low label noise from videos in the wild using computer vision techniques.
no code implementations • 13 Apr 2020 • Robert McCraith, Lukas Neumann, Andrew Zisserman, Andrea Vedaldi
Recent advances in self-supervised learning have demonstrated that it is possible to learn accurate monocular depth reconstruction from raw video data, without using any 3D ground truth for supervision.
1 code implementation • 7 Apr 2020 • Hanbyul Joo, Natalia Neverova, Andrea Vedaldi
Remarkably, the resulting annotations are sufficient to train from scratch 3D pose regressor networks that outperform the current state-of-the-art on in-the-wild benchmarks such as 3DPW.
Ranked #23 on 3D Human Pose Estimation on MPI-INF-3DHP (PA-MPJPE metric)
1 code implementation • CVPR 2020 • Sylvestre-Alvise Rebuffi, Ruth Fong, Xu Ji, Andrea Vedaldi
Saliency methods seek to explain the predictions of a model by producing an importance map across each input sample.
1 code implementation • 19 Mar 2020 • Oliver Groth, Chia-Man Hung, Andrea Vedaldi, Ingmar Posner
Visuomotor control (VMC) is an effective means of achieving basic manipulation tasks such as pushing or pick-and-place from raw images.
1 code implementation • 18 Mar 2020 • Hugo Touvron, Andrea Vedaldi, Matthijs Douze, Hervé Jégou
An EfficientNet-L2 pre-trained with weak supervision on 300M unlabeled images and further optimized with FixRes achieves 88.5% top-1 accuracy (top-5: 98.7%), which establishes the new state of the art for ImageNet with a single crop.
Ranked #9 on Image Classification on ImageNet ReaL (using extra training data)
1 code implementation • ICCV 2021 • Mandela Patrick, Yuki M. Asano, Polina Kuznetsova, Ruth Fong, João F. Henriques, Geoffrey Zweig, Andrea Vedaldi
In the image domain, excellent representations can be learned by inducing invariance to content-preserving transformations via noise contrastive learning.
1 code implementation • CVPR 2020 • Artsiom Sanakoyeu, Vasil Khalidov, Maureen S. McCarthy, Andrea Vedaldi, Natalia Neverova
Recent contributions have demonstrated that it is possible to recognize the pose of humans densely and accurately given a large dataset of poses annotated in detail.
1 code implementation • ICLR 2020 • Kai Han, Sylvestre-Alvise Rebuffi, Sebastien Ehrhardt, Andrea Vedaldi, Andrew Zisserman
In this work we address this problem by combining three ideas: (1) we suggest that the common approach of bootstrapping an image representation using the labeled data only introduces an unwanted bias, and that this can be avoided by using self-supervised learning to train the representation from scratch on the union of labelled and unlabelled data; (2) we use rank statistics to transfer the model's knowledge of the labelled classes to the problem of clustering the unlabelled images; and, (3) we train the data representation by optimizing a joint objective function on the labelled and unlabelled subsets of the data, improving both the supervised classification of the labelled data, and the clustering of the unlabelled data.
no code implementations • NeurIPS 2019 • Natalia Neverova, David Novotny, Andrea Vedaldi
We show that these models, by understanding uncertainty better, can solve the original DensePose task more accurately, thus setting the new state-of-the-art accuracy in this benchmark.
1 code implementation • CVPR 2020 • Shangzhe Wu, Christian Rupprecht, Andrea Vedaldi
We propose a method to learn 3D deformable object categories from raw single-view images, without external supervision.
5 code implementations • ICLR 2020 • Yuki Markus Asano, Christian Rupprecht, Andrea Vedaldi
Combining clustering and representation learning is one of the most promising approaches for unsupervised learning of deep neural networks.
Ranked #8 on Contrastive Learning on imagenet-1k
no code implementations • 23 Oct 2019 • Ruth Fong, Andrea Vedaldi
Deep networks for visual recognition are known to leverage "easy to recognise" portions of objects such as faces and distinctive texture patterns.
no code implementations • 19 Oct 2019 • Sylvestre-Alvise Rebuffi, Ruth Fong, Xu Ji, Hakan Bilen, Andrea Vedaldi
In this paper, we are instead interested in the locations of an image that contribute to the model's training.
2 code implementations • ICCV 2019 • Ruth Fong, Mandela Patrick, Andrea Vedaldi
In this paper, we discuss some of the shortcomings of existing approaches to perturbation analysis and address them by introducing the concept of extremal perturbations, which are theoretically grounded and interpretable.
2 code implementations • ICCV 2019 • David Novotny, Nikhila Ravi, Benjamin Graham, Natalia Neverova, Andrea Vedaldi
We propose C3DPO, a method for extracting 3D models of deformable objects from 2D keypoint annotations in unconstrained images.
1 code implementation • ICCV 2019 • Kai Han, Andrea Vedaldi, Andrew Zisserman
The second contribution is a method to estimate the number of classes in the unlabelled data.
1 code implementation • ICCV 2019 • James Thewlis, Samuel Albanie, Hakan Bilen, Andrea Vedaldi
Equivariance to random image transformations is an effective method to learn landmarks of object categories, such as the eyes and the nose in faces, without manual supervision.
Ranked #1 on Unsupervised Facial Landmark Detection on 300W
no code implementations • 14 Aug 2019 • Honglie Chen, Weidi Xie, Andrea Vedaldi, Andrew Zisserman
We propose AutoCorrect, a method to automatically learn object-annotation alignments from a dataset with annotations affected by geometric noise.
no code implementations • CVPR 2020 • Tomas Jakab, Ankush Gupta, Hakan Bilen, Andrea Vedaldi
We propose KeypointGAN, a new method for recognizing the pose of objects from a single image that learns from only unlabelled videos and a weak empirical prior on the object poses.
3 code implementations • NeurIPS 2019 • Hugo Touvron, Andrea Vedaldi, Matthijs Douze, Hervé Jégou
Conversely, when training a ResNeXt-101 32x48d pre-trained in weakly-supervised fashion on 940 million public images at resolution 224x224 and further optimizing for test resolution 320x320, we obtain a test top-1 accuracy of 86.4% (top-5: 98.0%) (single-crop).
Ranked #2 on Fine-Grained Image Classification on Birdsnap (using extra training data)
no code implementations • CVPR 2019 • Natalia Neverova, James Thewlis, Riza Alp Güler, Iasonas Kokkinos, Andrea Vedaldi
DensePose supersedes traditional landmark detectors by densely mapping image pixels to body surface coordinates.
no code implementations • 4 Jun 2019 • Shangzhe Wu, Christian Rupprecht, Andrea Vedaldi
Specifically, given a single image of the object seen from an arbitrary viewpoint, our model predicts a symmetric canonical view, the corresponding 3D shape and a viewpoint transformation, and trains with the goal of reconstructing the input view, resembling an auto-encoder.
no code implementations • 26 May 2019 • Sébastien Ehrhardt, Aron Monszpart, Niloy J. Mitra, Andrea Vedaldi
We are interested in learning models of intuitive physics similar to the ones that animals use for navigation, manipulation and planning.
1 code implementation • 21 May 2019 • Sylvestre-Alvise Rebuffi, Sebastien Ehrhardt, Kai Han, Andrea Vedaldi, Andrew Zisserman
The first is a simple but effective one: we leverage the power of transfer learning among different tasks and self-supervision to initialize a good representation of the data without making use of any label.
no code implementations • ICLR 2019 • Fabian Fuchs, Oliver Groth, Adam Kosiorek, Alex Bewley, Markus Wulfmeier, Andrea Vedaldi, Ingmar Posner
Using an adversarial stethoscope, the network is successfully de-biased, leading to a performance increase from 66% to 88%.
2 code implementations • ICLR 2020 • Yuki M. Asano, Christian Rupprecht, Andrea Vedaldi
We look critically at popular self-supervision techniques for learning deep convolutional neural networks without manual labels.
3 code implementations • 14 Feb 2019 • Maxim Berman, Hervé Jégou, Andrea Vedaldi, Iasonas Kokkinos, Matthijs Douze
When fed to a linear classifier, the learned embeddings provide state-of-the-art classification accuracy.
Ranked #1 on Image Retrieval on INRIA Holidays
no code implementations • NeurIPS 2018 • James Thewlis, Hakan Bilen, Andrea Vedaldi
We propose a new approach to model and learn, without manual supervision, the symmetries of natural objects, such as faces or flowers, given only images as input.
9 code implementations • NeurIPS 2018 • Jie Hu, Li Shen, Samuel Albanie, Gang Sun, Andrea Vedaldi
We also propose a parametric gather-excite operator pair which yields further performance gains, relate it to the recently-introduced Squeeze-and-Excitation Networks, and analyse the effects of these changes to the CNN feature activation statistics.
no code implementations • 23 Sep 2018 • Ankush Gupta, Andrea Vedaldi, Andrew Zisserman
This work presents a method for visual text recognition without using any paired supervisory data.
no code implementations • ECCV 2018 • Maria Klodt, Andrea Vedaldi
First, since such self-supervised approaches are based on the brightness constancy assumption, which is valid only for a subset of pixels, we propose a probabilistic learning formulation where the network predicts distributions over variables rather than specific values.
no code implementations • 16 Aug 2018 • Samuel Albanie, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman
We make the following contributions: (i) we develop a strong teacher network for facial emotion recognition that achieves the state of the art on a standard benchmark; (ii) we use the teacher to train a student, tabula rasa, to learn representations (embeddings) for speech emotion recognition without access to labelled audio data; and (iii) we show that the speech emotion embedding can be used for speech emotion recognition on external benchmark datasets.
Ranked #3 on Facial Expression Recognition (FER) on FERPlus
no code implementations • ECCV 2018 • David Novotny, Samuel Albanie, Diane Larlus, Andrea Vedaldi
Object detection and instance segmentation are dominated by region-based methods such as Mask RCNN.
no code implementations • 21 Jul 2018 • Ankush Gupta, Andrea Vedaldi, Andrew Zisserman
End-to-end trained Recurrent Neural Networks (RNNs) have been successfully applied to numerous problems that require processing sequences, such as image captioning, machine translation, and text recognition.
1 code implementation • 20 Jul 2018 • Karel Lenc, Andrea Vedaldi
The new protocol is better for assessment on a large number of images and reduces the dependency of the results on unwanted distractors such as the number of detected features and the feature magnification factor.
6 code implementations • ICCV 2019 • Xu Ji, João F. Henriques, Andrea Vedaldi
The method is not specialised to computer vision and operates on any paired dataset samples; in our experiments we use random transforms to obtain a pair from each image.
Ranked #1 on Unsupervised MNIST on MNIST
no code implementations • 15 Jul 2018 • Aravindh Mahendran, James Thewlis, Andrea Vedaldi
We propose a novel method for learning convolutional neural image representations without manual supervision.
2 code implementations • NeurIPS 2018 • Tomas Jakab, Ankush Gupta, Hakan Bilen, Andrea Vedaldi
We propose a method for learning landmark detectors for visual objects (such as the eyes and the nose in a face) without any manual supervision.
no code implementations • 14 Jun 2018 • Fabian B. Fuchs, Oliver Groth, Adam R. Kosiorek, Alex Bewley, Markus Wulfmeier, Andrea Vedaldi, Ingmar Posner
Conversely, training on an easy dataset where visual cues are positively correlated with stability, the baseline model learns a bias leading to poor performance on a harder dataset.
no code implementations • CVPR 2018 • João F. Henriques, Andrea Vedaldi
The module contains an allocentric spatial memory that can be accessed associatively by feeding to it the current sensory input, resulting in localization, and then updated using an LSTM or similar mechanism.
5 code implementations • ICLR 2019 • Luca Bertinetto, João F. Henriques, Philip H. S. Torr, Andrea Vedaldi
The main idea is to teach a deep network to use standard machine learning tools, such as ridge regression, as part of its own internal model, enabling it to quickly adapt to novel data.
6 code implementations • ICLR 2019 • João F. Henriques, Sebastien Ehrhardt, Samuel Albanie, Andrea Vedaldi
Instead, we propose to keep a single estimate of the gradient projected by the inverse Hessian matrix, and update it once per iteration.
1 code implementation • 21 May 2018 • João F. Henriques, Sebastien Ehrhardt, Samuel Albanie, Andrea Vedaldi
We propose a fast second-order method that can be used as a drop-in replacement for current deep learning solvers.
no code implementations • 14 May 2018 • Sebastien Ehrhardt, Aron Monszpart, Niloy Mitra, Andrea Vedaldi
While learning models of intuitive physics is an increasingly active area of research, current approaches still fall short of natural intelligences in one important regard: they require external supervision, such as explicit access to physical states, at training and sometimes even at test times.
1 code implementation • ECCV 2018 • Oliver Groth, Fabian B. Fuchs, Ingmar Posner, Andrea Vedaldi
Physical intuition is pivotal for intelligent agents to perform complex tasks.
no code implementations • CVPR 2018 • David Novotny, Samuel Albanie, Diane Larlus, Andrea Vedaldi
Self-supervision can dramatically cut back the amount of manually-labelled data required to train deep neural networks.
3 code implementations • CVPR 2018 • Sylvestre-Alvise Rebuffi, Hakan Bilen, Andrea Vedaldi
A practical limitation of deep neural networks is their high degree of specialization to a single task and visual domain.
no code implementations • ECCV 2018 • Jack Valmadre, Luca Bertinetto, João F. Henriques, Ran Tao, Andrea Vedaldi, Arnold Smeulders, Philip Torr, Efstratios Gavves
We introduce the OxUvA dataset and benchmark for evaluating single-object tracking algorithms.
1 code implementation • CVPR 2018 • Ruth Fong, Andrea Vedaldi
By studying such embeddings, we are able to show that (1) in most cases, multiple filters are required to code for a concept; (2) filters are often not concept-specific and help encode multiple concepts; and (3) compared to single filter activations, filter embeddings are able to better characterize the meaning of a representation and its relationship to other concepts.
no code implementations • 22 Dec 2017 • Sebastien Ehrhardt, Aron Monszpart, Niloy Mitra, Andrea Vedaldi
In order to be able to leverage the approximation capabilities of artificial intelligence techniques in such physics related contexts, researchers have handcrafted the relevant states, and then used neural networks to learn the state transitions using simulation runs as training data.
14 code implementations • CVPR 2018 • Dmitry Ulyanov, Andrea Vedaldi, Victor Lempitsky
In this paper, we show that, on the contrary, the structure of a generator network is sufficient to capture a great deal of low-level image statistics prior to any learning.
Ranked #4 on Feature Upsampling on ImageNet
no code implementations • 26 Nov 2017 • Jameson Merkow, Robert Lufkin, Kim Nguyen, Stefano Soatto, Zhuowen Tu, Andrea Vedaldi
Thus, DeepRadiologyNet enables significant reduction in the workload of human radiologists by automatically filtering studies and reporting on the high-confidence ones at an operating point well below the literal error rate for US Board Certified radiologists, estimated at 0.82%.
no code implementations • NeurIPS 2017 • James Thewlis, Hakan Bilen, Andrea Vedaldi
One of the key challenges of visual perception is to extract abstract models of 3D objects and object categories from visual measurements, which are affected by complex nuisance factors such as viewpoint, occlusion, motion, and deformations.
Ranked #3 on Unsupervised Facial Landmark Detection on AFLW-MTFL
no code implementations • 6 Jun 2017 • Sébastien Ehrhardt, Aron Monszpart, Andrea Vedaldi, Niloy Mitra
While the basic laws of Newtonian mechanics are well understood, explaining a physical scenario still requires manually modeling the problem with suitable equations and associated parameters.
2 code implementations • NeurIPS 2017 • Sylvestre-Alvise Rebuffi, Hakan Bilen, Andrea Vedaldi
There is a growing interest in learning data representations that work well for many different types of problems and data.
no code implementations • ICCV 2017 • David Novotny, Diane Larlus, Andrea Vedaldi
Traditional approaches for learning 3D object categories use either synthetic data or manual supervision.
1 code implementation • ICCV 2017 • James Thewlis, Hakan Bilen, Andrea Vedaldi
Learning automatically the structure of object categories remains an important open problem in computer vision.
Ranked #2 on Unsupervised Facial Landmark Detection on AFLW-MTFL
no code implementations • CVPR 2017 • Jack Valmadre, Luca Bertinetto, João F. Henriques, Andrea Vedaldi, Philip H. S. Torr
The Correlation Filter is an algorithm that trains a linear template to discriminate between images and their translations.
Ranked #3 on Visual Object Tracking on OTB-50
no code implementations • CVPR 2017 • Vassileios Balntas, Karel Lenc, Andrea Vedaldi, Krystian Mikolajczyk
In this paper, we propose a novel benchmark for evaluating local image descriptors.
no code implementations • CVPR 2017 • David Novotny, Diane Larlus, Andrea Vedaldi
Despite significant progress of deep learning in recent years, state-of-the-art semantic matching methods still rely on legacy features such as SIFT or HoG.
6 code implementations • ICCV 2017 • Ruth Fong, Andrea Vedaldi
As machine learning algorithms are increasingly applied to high impact yet high risk tasks, such as medical diagnosis or autonomous driving, it is critical that researchers can explain how such algorithms arrived at their predictions.
1 code implementation • 7 Apr 2017 • Dmitry Ulyanov, Andrea Vedaldi, Victor Lempitsky
Unlike previous hybrids of autoencoders and adversarial networks, the adversarial game in our approach is set up directly between the encoder and the generator, and no external mappings are trained in the process of learning.
no code implementations • 1 Mar 2017 • Sebastien Ehrhardt, Aron Monszpart, Niloy J. Mitra, Andrea Vedaldi
Evolution has resulted in highly developed abilities in many natural intelligences to quickly and accurately predict mechanical phenomena.
no code implementations • 25 Jan 2017 • Hakan Bilen, Andrea Vedaldi
With the advent of large labelled datasets and high-capacity models, the performance of machine vision systems has been improving rapidly.
Ranked #14 on Continual Learning on visual domain decathlon (10 tasks)
1 code implementation • CVPR 2017 • Dmitry Ulyanov, Andrea Vedaldi, Victor Lempitsky
The recent work of Gatys et al., who characterized the style of an image by the statistics of convolutional neural network filters, ignited a renewed interest in the texture generation and image stylization problems.
3 code implementations • 2 Dec 2016 • Hakan Bilen, Basura Fernando, Efstratios Gavves, Andrea Vedaldi
This is a powerful idea because it allows any video to be converted to an image, so that existing CNN models pre-trained for the analysis of still images can be immediately extended to videos.
no code implementations • 7 Oct 2016 • Samuel Albanie, Andrea Vedaldi
As a starting point, we consider the problem of relating facial expressions to objectively measurable events occurring in videos.
no code implementations • ICML 2017 • João F. Henriques, Andrea Vedaldi
Convolutional Neural Networks (CNNs) are extremely efficient, since they exploit the inherent translation-invariance of natural images.
1 code implementation • 12 Sep 2016 • James Thewlis, Shuai Zheng, Philip H. S. Torr, Andrea Vedaldi
Deep Matching (DM) is a popular high-quality method for quasi-dense image matching.
22 code implementations • 27 Jul 2016 • Dmitry Ulyanov, Andrea Vedaldi, Victor Lempitsky
In this paper we revisit the fast stylization method introduced in Ulyanov et al.
no code implementations • 5 Jul 2016 • David Novotny, Diane Larlus, Andrea Vedaldi
While recent research in image understanding has often focused on recognizing more types of objects, understanding more about the objects is just as important.
10 code implementations • 30 Jun 2016 • Luca Bertinetto, Jack Valmadre, João F. Henriques, Andrea Vedaldi, Philip H. S. Torr
The problem of arbitrary object tracking has traditionally been tackled by learning a model of the object's appearance exclusively online, using as sole training data the video itself.
Ranked #3 on Visual Object Tracking on OTB-50