Search Results for author: Andrea Vedaldi

Found 197 papers, 87 papers with code

Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation

no code implementations 1 Oct 2024 Junlin Han, Jianyuan Wang, Andrea Vedaldi, Philip Torr, Filippos Kokkinos

We employ a fine-tuned multi-view image diffusion model and a video diffusion model to generate a pool of candidate views, enabling a rich representation of the target 3D object.

CatFree3D: Category-agnostic 3D Object Detection with Diffusion

no code implementations 22 Aug 2024 Wenjing Bian, ZiRui Wang, Andrea Vedaldi

Image-based 3D object detection is widely employed in applications such as autonomous vehicles and robotics, yet current systems struggle with generalisation due to complex problem setup and limited training data.

3D Object Detection Autonomous Vehicles +4

3D-Aware Instance Segmentation and Tracking in Egocentric Videos

no code implementations 19 Aug 2024 Yash Bhalgat, Vadim Tschernezki, Iro Laina, João F. Henriques, Andrea Vedaldi, Andrew Zisserman

Egocentric videos present unique challenges for 3D scene understanding due to rapid camera motion, frequent object occlusions, and limited object visibility.

3D Object Reconstruction Instance Segmentation +5

Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics

no code implementations 8 Aug 2024 Ruining Li, Chuanxia Zheng, Christian Rupprecht, Andrea Vedaldi

We present Puppet-Master, an interactive video generative model that can serve as a motion prior for part-level dynamics.

Video Generation

SHIC: Shape-Image Correspondences with no Keypoint Supervision

no code implementations 26 Jul 2024 Aleksandar Shtedritski, Christian Rupprecht, Andrea Vedaldi

The reduction works by matching images of the object to non-photorealistic renders of the template, which emulates the process of collecting manual annotations for this task.

Keypoint Detection Object

Meta 3D TextureGen: Fast and Consistent Texture Generation for 3D Objects

no code implementations 2 Jul 2024 Raphael Bensadoun, Yanir Kleiman, Idan Azuri, Omri Harosh, Andrea Vedaldi, Natalia Neverova, Oran Gafni

The recent availability and adaptability of text-to-image models has sparked a new era in many related domains that benefit from the learned text priors as well as high-quality and fast generation capabilities, one of which is texture generation for 3D objects.

4k Texture Synthesis

Meta 3D AssetGen: Text-to-Mesh Generation with High-Quality Geometry, Texture, and PBR Materials

no code implementations 2 Jul 2024 Yawar Siddiqui, Tom Monnier, Filippos Kokkinos, Mahendra Kariya, Yanir Kleiman, Emilien Garreau, Oran Gafni, Natalia Neverova, Andrea Vedaldi, Roman Shapovalov, David Novotny

We present Meta 3D AssetGen (AssetGen), a significant advancement in text-to-3D generation which produces faithful, high-quality meshes with texture and material control.

3D Generation Text to 3D

Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image

1 code implementation 6 Jun 2024 Stanislaw Szymanowicz, Eldar Insafutdinov, Chuanxia Zheng, Dylan Campbell, João F. Henriques, Christian Rupprecht, Andrea Vedaldi

In this paper, we propose Flash3D, a method for scene reconstruction and novel view synthesis from a single image which is both very generalisable and efficient.

3D Scene Reconstruction Monocular Depth Estimation +1

Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting

no code implementations 30 Apr 2024 Paul Engstler, Andrea Vedaldi, Iro Laina, Christian Rupprecht

These works often depend on pre-trained monocular depth estimators to lift the generated images into 3D, fusing them with the existing scene representation.

Benchmarking Depth Completion +2

Lightplane: Highly-Scalable Components for Neural 3D Fields

1 code implementation 30 Apr 2024 Ang Cao, Justin Johnson, Andrea Vedaldi, David Novotny

Contemporary 3D research, particularly in reconstruction and generation, heavily relies on 2D images for inputs or supervision.

3D Reconstruction

DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing

no code implementations 29 Apr 2024 Minghao Chen, Iro Laina, Andrea Vedaldi

A common approach to this problem is to use a 2D image generator or editor to guide the 3D editing process, obviating the need for 3D data.

3D geometry

DragAPart: Learning a Part-Level Motion Prior for Articulated Objects

no code implementations 22 Mar 2024 Ruining Li, Chuanxia Zheng, Christian Rupprecht, Andrea Vedaldi

We introduce DragAPart, a method that, given an image and a set of drags as input, generates a new image of the same object that responds to the action of the drags.

N2F2: Hierarchical Scene Understanding with Nested Neural Feature Fields

no code implementations 16 Mar 2024 Yash Bhalgat, Iro Laina, João F. Henriques, Andrew Zisserman, Andrea Vedaldi

To address this, we introduce Nested Neural Feature Fields (N2F2), a novel approach that employs hierarchical supervision to learn a single feature field, wherein different dimensions within the same high-dimensional feature encode scene properties at varying granularities.

Scene Understanding

GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering

1 code implementation 15 Feb 2024 Abdullah Hamdi, Luke Melas-Kyriazi, Jinjie Mai, Guocheng Qian, Ruoshi Liu, Carl Vondrick, Bernard Ghanem, Andrea Vedaldi

With the aid of a frequency-modulated loss, GES achieves competitive performance in novel-view synthesis benchmarks while requiring less than half the memory storage of Gaussian Splatting and increasing the rendering speed by up to 39%.

3D Reconstruction Novel View Synthesis

IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation

no code implementations 13 Feb 2024 Luke Melas-Kyriazi, Iro Laina, Christian Rupprecht, Natalia Neverova, Andrea Vedaldi, Oran Gafni, Filippos Kokkinos

A mitigation is to fine-tune the 2D generator to be multi-view aware, which can help distillation or can be combined with reconstruction networks to output 3D objects directly.

3D Generation 3D Reconstruction +1

Learning the 3D Fauna of the Web

no code implementations CVPR 2024 Zizhang Li, Dor Litvak, Ruining Li, Yunzhi Zhang, Tomas Jakab, Christian Rupprecht, Shangzhe Wu, Andrea Vedaldi, Jiajun Wu

We show that prior category-specific attempts fail to generalize to rare species with limited training images.

GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering

no code implementations CVPR 2024 Abdullah Hamdi, Luke Melas-Kyriazi, Jinjie Mai, Guocheng Qian, Ruoshi Liu, Carl Vondrick, Bernard Ghanem, Andrea Vedaldi

With the aid of a frequency-modulated loss, GES achieves competitive performance in novel-view synthesis benchmarks while requiring less than half the memory storage of Gaussian Splatting and increasing the rendering speed by up to 39%.

3D Reconstruction Novel View Synthesis

Splatter Image: Ultra-Fast Single-View 3D Reconstruction

1 code implementation CVPR 2024 Stanislaw Szymanowicz, Christian Rupprecht, Andrea Vedaldi

Splatter Image is based on Gaussian Splatting, which allows fast and high-quality reconstruction of 3D scenes from multiple images.

3D Object Reconstruction 3D Reconstruction +2

SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds

no code implementations CVPR 2024 Minghao Chen, Junyu Xie, Iro Laina, Andrea Vedaldi

In particular, we hypothesise that editing can be greatly simplified by first encoding 3D objects in a suitable latent space.

Free3D: Consistent Novel View Synthesis without 3D Representation

1 code implementation CVPR 2024 Chuanxia Zheng, Andrea Vedaldi

Similar to Zero-1-to-3, we start from a pre-trained 2D image generator for generalization, and fine-tune it for NVS.

3D Reconstruction Novel View Synthesis

Instant Uncertainty Calibration of NeRFs Using a Meta-Calibrator

1 code implementation 4 Dec 2023 Niki Amini-Naieni, Tomas Jakab, Andrea Vedaldi, Ronald Clark

To address this, we introduce the concept of a meta-calibrator that performs uncertainty calibration for NeRFs with a single forward pass without the need for holding out any images from the target scene.

Image Reconstruction Medical Diagnosis +2

HoloFusion: Towards Photo-realistic 3D Generative Modeling

no code implementations ICCV 2023 Animesh Karnewar, Niloy J. Mitra, Andrea Vedaldi, David Novotny

Diffusion-based image generators can now produce high-quality and diverse samples, but their success has yet to fully translate to 3D generation: existing diffusion methods can either generate low-resolution but 3D consistent outputs, or detailed 2D views of 3D objects but with potential structural defects and lacking view consistency or realism.

3D Generation Super-Resolution

Online Clustered Codebook

1 code implementation ICCV 2023 Chuanxia Zheng, Andrea Vedaldi

Vector Quantisation (VQ) is experiencing a comeback in machine learning, where it is increasingly used in representation learning.

Representation Learning

Diffusion Models for Open-Vocabulary Segmentation

no code implementations 15 Jun 2023 Laurynas Karazija, Iro Laina, Andrea Vedaldi, Christian Rupprecht

Open-vocabulary segmentation is the task of segmenting anything that can be named in an image.

Language Modelling Segmentation

EPIC Fields: Marrying 3D Geometry and Video Understanding

1 code implementation NeurIPS 2023 Vadim Tschernezki, Ahmad Darkhalil, Zhifan Zhu, David Fouhey, Iro Laina, Diane Larlus, Dima Damen, Andrea Vedaldi

Compared to other neural rendering datasets, EPIC Fields is better tailored to video understanding because it is paired with labelled action segments and the recent VISOR segment annotations.

3D geometry Neural Rendering +1

Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion

1 code implementation NeurIPS 2023 Yash Bhalgat, Iro Laina, João F. Henriques, Andrew Zisserman, Andrea Vedaldi

Our approach outperforms the state-of-the-art on challenging scenes from the ScanNet, Hypersim, and Replica datasets, as well as on our newly created Messy Rooms dataset, demonstrating the effectiveness and scalability of our slow-fast clustering method.

Clustering Instance Segmentation +2

DynamicStereo: Consistent Dynamic Depth from Stereo Videos

1 code implementation CVPR 2023 Nikita Karaev, Ignacio Rocco, Benjamin Graham, Natalia Neverova, Andrea Vedaldi, Christian Rupprecht

The network learns to pool information from neighboring frames to improve the temporal consistency of its predictions.

Farm3D: Learning Articulated 3D Animals by Distilling 2D Diffusion

no code implementations 20 Apr 2023 Tomas Jakab, Ruining Li, Shangzhe Wu, Christian Rupprecht, Andrea Vedaldi

We propose a framework that uses an image generator, such as Stable Diffusion, to generate synthetic training data that are sufficiently clean and do not require further manual curation, enabling the learning of such a reconstruction network from scratch.

Monocular Reconstruction Object

What does CLIP know about a red circle? Visual prompt engineering for VLMs

no code implementations ICCV 2023 Aleksandar Shtedritski, Christian Rupprecht, Andrea Vedaldi

Large-scale Vision-Language Models, such as CLIP, learn powerful image-text representations that have found numerous applications, from zero-shot classification to text-to-image generation.

Prompt Engineering Text-to-Image Generation +1

Training-Free Layout Control with Cross-Attention Guidance

1 code implementation 6 Apr 2023 Minghao Chen, Iro Laina, Andrea Vedaldi

We thoroughly evaluate our approach on three benchmarks and provide several qualitative examples and a comparative analysis of the two strategies that demonstrate the superiority of backward guidance compared to forward guidance, as well as prior work.

HoloDiffusion: Training a 3D Diffusion Model using 2D Images

no code implementations CVPR 2023 Animesh Karnewar, Andrea Vedaldi, David Novotny, Niloy Mitra

We show that our diffusion models are scalable, train robustly, and are competitive in terms of sample quality and fidelity to existing approaches for 3D generative modeling.

Real-time volumetric rendering of dynamic humans

1 code implementation 21 Mar 2023 Ignacio Rocco, Iurii Makarov, Filippos Kokkinos, David Novotny, Benjamin Graham, Natalia Neverova, Andrea Vedaldi

We present a method for fast 3D reconstruction and real-time rendering of dynamic humans from monocular videos with accompanying parametric body fits.

3D Reconstruction

$PC^2$: Projection-Conditioned Point Cloud Diffusion for Single-Image 3D Reconstruction

2 code implementations 21 Feb 2023 Luke Melas-Kyriazi, Christian Rupprecht, Andrea Vedaldi

Reconstructing the 3D shape of an object from a single RGB image is a long-standing and highly challenging problem in computer vision.

3D Reconstruction Denoising

RealFusion: 360° Reconstruction of Any Object from a Single Image

3 code implementations 21 Feb 2023 Luke Melas-Kyriazi, Christian Rupprecht, Iro Laina, Andrea Vedaldi

We consider the problem of reconstructing a full 360° photographic model of an object from a single image of it.

3D Reconstruction Object

Text-To-4D Dynamic Scene Generation

no code implementations 26 Jan 2023 Uriel Singer, Shelly Sheynin, Adam Polyak, Oron Ashual, Iurii Makarov, Filippos Kokkinos, Naman Goyal, Andrea Vedaldi, Devi Parikh, Justin Johnson, Yaniv Taigman

We present MAV3D (Make-A-Video3D), a method for generating three-dimensional dynamic scenes from text descriptions.

Scene Generation

Novel-View Acoustic Synthesis

no code implementations CVPR 2023 Changan Chen, Alexander Richard, Roman Shapovalov, Vamsi Krishna Ithapu, Natalia Neverova, Kristen Grauman, Andrea Vedaldi

We introduce the novel-view acoustic synthesis (NVAS) task: given the sight and sound observed at a source viewpoint, can we synthesize the sound of that scene from an unseen target viewpoint?

Neural Rendering Novel View Synthesis

Self-Supervised Correspondence Estimation via Multiview Registration

1 code implementation 6 Dec 2022 Mohamed El Banani, Ignacio Rocco, David Novotny, Andrea Vedaldi, Natalia Neverova, Justin Johnson, Benjamin Graham

To address this, we propose a self-supervised approach for correspondence estimation that learns from multiview consistency in short RGB-D video sequences.

Diversity

MagicPony: Learning Articulated 3D Animals in the Wild

no code implementations CVPR 2023 Shangzhe Wu, Ruining Li, Tomas Jakab, Christian Rupprecht, Andrea Vedaldi

We consider the problem of predicting the 3D shape, articulation, viewpoint, texture, and lighting of an articulated animal like a horse given a single test image as input.

Viewpoint Estimation

Neural Feature Fusion Fields: 3D Distillation of Self-Supervised 2D Image Representations

no code implementations 7 Sep 2022 Vadim Tschernezki, Iro Laina, Diane Larlus, Andrea Vedaldi

We present Neural Feature Fusion Fields (N3F), a method that improves dense 2D image feature extractors when the latter are applied to the analysis of multiple images reconstructible as a 3D scene.

Neural Rendering Retrieval

End-to-End Visual Editing with a Generatively Pre-Trained Artist

no code implementations 3 May 2022 Andrew Brown, Cheng-Yang Fu, Omkar Parkhi, Tamara L. Berg, Andrea Vedaldi

We consider the targeted image editing problem: blending a region in a source image with a driver image that specifies the desired change.

KeyTr: Keypoint Transporter for 3D Reconstruction of Deformable Objects in Videos

no code implementations CVPR 2022 David Novotny, Ignacio Rocco, Samarth Sinha, Alexandre Carlier, Gael Kerchenbaum, Roman Shapovalov, Nikita Smetanin, Natalia Neverova, Benjamin Graham, Andrea Vedaldi

Compared to weaker deformation models, this significantly reduces the reconstruction ambiguity and, for dynamic objects, allows Keypoint Transporter to obtain reconstructions of quality superior or at least comparable to prior approaches while being much faster and reliant on a pre-trained monocular depth estimator network.

3D Reconstruction Depth Estimation +2

BANMo: Building Animatable 3D Neural Models from Many Casual Videos

1 code implementation CVPR 2022 Gengshan Yang, Minh Vo, Natalia Neverova, Deva Ramanan, Andrea Vedaldi, Hanbyul Joo

Our key insight is to merge three schools of thought: (1) classic deformable shape models that make use of articulated bones and blend skinning, (2) volumetric neural radiance fields (NeRFs) that are amenable to gradient-based optimization, and (3) canonical embeddings that generate correspondences between pixels and an articulated model.

3D Shape Reconstruction from Videos Dynamic Reconstruction

Audio-Visual Synchronisation in the wild

no code implementations 8 Dec 2021 Honglie Chen, Weidi Xie, Triantafyllos Afouras, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman

Finally, we set the first benchmark for general audio-visual synchronisation with over 160 diverse classes in the new VGG-Sound Sync video dataset.

Lip Reading

The Curious Layperson: Fine-Grained Image Recognition without Expert Labels

1 code implementation 5 Nov 2021 Subhabrata Choudhury, Iro Laina, Christian Rupprecht, Andrea Vedaldi

We then train a fine-grained textual similarity model that matches image descriptions with documents on a sentence-level basis.

Cross-Modal Retrieval Fine-Grained Image Recognition +2

NeuralDiff: Segmenting 3D objects that move in egocentric videos

no code implementations 19 Oct 2021 Vadim Tschernezki, Diane Larlus, Andrea Vedaldi

Given a raw video sequence taken from a freely-moving camera, we study the problem of decomposing the observed 3D scene into a static background and a dynamic foreground containing the objects that move in the video sequence.

Neural Rendering Semantic Segmentation

Open-Set Recognition: a Good Closed-Set Classifier is All You Need?

2 code implementations ICLR 2022 Sagar Vaze, Kai Han, Andrea Vedaldi, Andrew Zisserman

In this paper, we first demonstrate that the ability of a classifier to make the 'none-of-above' decision is highly correlated with its accuracy on the closed-set classes.

Open Set Learning Out-of-Distribution Detection

PASS: An ImageNet replacement for self-supervised pretraining without humans

1 code implementation NeurIPS Workshop ImageNet_PPF 2021 Yuki M. Asano, Christian Rupprecht, Andrew Zisserman, Andrea Vedaldi

On the other hand, state-of-the-art pretraining is nowadays obtained with unsupervised methods, meaning that labelled datasets such as ImageNet may not be necessary, or perhaps not even optimal, for model pretraining.

Benchmarking Ethics +2

Lifting 2D Object Locations to 3D by Discounting LiDAR Outliers across Objects and Views

1 code implementation 16 Sep 2021 Robert McCraith, Eldar Insafutdinov, Lukas Neumann, Andrea Vedaldi

We present a system for automatically converting 2D mask object predictions and raw LiDAR point clouds into full 3D bounding boxes of objects.

DensePose 3D: Lifting Canonical Surface Maps of Articulated Objects to the Third Dimension

no code implementations ICCV 2021 Roman Shapovalov, David Novotny, Benjamin Graham, Patrick Labatut, Andrea Vedaldi

The method learns, in an end-to-end fashion, a soft partition of a given category-specific 3D template mesh into rigid parts together with a monocular reconstruction network that predicts the part motions such that they reproject correctly onto 2D DensePose-like surface annotations of the object.

3D Reconstruction Monocular Reconstruction +1

Augmenting Implicit Neural Shape Representations with Explicit Deformation Fields

no code implementations 19 Aug 2021 Matan Atzmon, David Novotny, Andrea Vedaldi, Yaron Lipman

Implicit neural representation is a recent approach to learn shape collections as zero level-sets of neural networks, where each shape is represented by a latent code.

Decoder

DOVE: Learning Deformable 3D Objects by Watching Videos

no code implementations 22 Jul 2021 Shangzhe Wu, Tomas Jakab, Christian Rupprecht, Andrea Vedaldi

In this paper, we present DOVE, a method that learns textured 3D models of deformable object categories from monocular videos available online, without keypoint, viewpoint or template shape supervision.

AutoNovel: Automatically Discovering and Learning Novel Visual Categories

1 code implementation 29 Jun 2021 Kai Han, Sylvestre-Alvise Rebuffi, Sébastien Ehrhardt, Andrea Vedaldi, Andrew Zisserman

We present a new approach called AutoNovel to address this problem by combining three ideas: (1) we suggest that the common approach of bootstrapping an image representation using the labelled data only introduces an unwanted bias, and that this can be avoided by using self-supervised learning to train the representation from scratch on the union of labelled and unlabelled data; (2) we use ranking statistics to transfer the model's knowledge of the labelled classes to the problem of clustering the unlabelled images; and, (3) we train the data representation by optimizing a joint objective function on the labelled and unlabelled subsets of the data, improving both the supervised classification of the labelled data, and the clustering of the unlabelled data.

Clustering Image Clustering +2

Pedestrian and Ego-Vehicle Trajectory Prediction From Monocular Camera

no code implementations CVPR 2021 Lukas Neumann, Andrea Vedaldi

Predicting future pedestrian trajectory is a crucial component of autonomous driving systems, as recognizing critical situations based only on current pedestrian position may come too late for any meaningful corrective action (e.g. braking) to take place.

Autonomous Driving Pedestrian Trajectory Prediction +3

NeuroMorph: Unsupervised Shape Interpolation and Correspondence in One Go

no code implementations CVPR 2021 Marvin Eisenberger, David Novotny, Gael Kerchenbaum, Patrick Labatut, Natalia Neverova, Daniel Cremers, Andrea Vedaldi

We present NeuroMorph, a new neural network architecture that takes as input two 3D shapes and produces in one go, i.e. in a single feed-forward pass, a smooth interpolation and point-to-point correspondences between them.

Test Sample Accuracy Scales with Training Sample Density in Neural Networks

1 code implementation 15 Jun 2021 Xu Ji, Razvan Pascanu, Devon Hjelm, Balaji Lakshminarayanan, Andrea Vedaldi

Intuitively, one would expect accuracy of a trained neural network's prediction on test samples to correlate with how densely the samples are surrounded by seen training samples in representation space.

Image Classification

Moving SLAM: Fully Unsupervised Deep Learning in Non-Rigid Scenes

no code implementations 5 May 2021 Dan Xu, Andrea Vedaldi, Joao F. Henriques

We build on the idea of view synthesis, which uses classical camera geometry to re-render a source image from a different point-of-view, specified by a predicted relative pose and depth map.

3D geometry Depth Estimation +2

Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning

1 code implementation ICCV 2021 Mandela Patrick, Yuki M. Asano, Bernie Huang, Ishan Misra, Florian Metze, Joao Henriques, Andrea Vedaldi

First, for space, we show that spatial augmentations such as cropping do work well for videos too, but that previous implementations, due to the high processing and memory cost, could not do this at a scale sufficient for it to work well.

Representation Learning Self-Supervised Learning

Continuous Surface Embeddings

1 code implementation NeurIPS 2020 Natalia Neverova, David Novotny, Vasil Khalidov, Marc Szafraniec, Patrick Labatut, Andrea Vedaldi

In this work, we focus on the task of learning and representing dense correspondences in deformable object categories.

Object Pose Estimation

Quantifying Learnability and Describability of Visual Concepts Emerging in Representation Learning

no code implementations NeurIPS 2020 Iro Laina, Ruth C. Fong, Andrea Vedaldi

The increasing impact of black box models, and particularly of unsupervised ones, comes with an increasing interest in tools to understand and interpret them.

Clustering Representation Learning

Support-set bottlenecks for video-text representation learning

no code implementations ICLR 2021 Mandela Patrick, Po-Yao Huang, Yuki Asano, Florian Metze, Alexander Hauptmann, João Henriques, Andrea Vedaldi

The dominant paradigm for learning video-text representations -- noise contrastive learning -- increases the similarity of the representations of pairs of samples that are known to be related, such as text and video from the same sample, and pushes away the representations of all other pairs.

Contrastive Learning Representation Learning +3

Multi-modal Self-Supervision from Generalized Data Transformations

no code implementations 28 Sep 2020 Mandela Patrick, Yuki Asano, Polina Kuznetsova, Ruth Fong, Joao F. Henriques, Geoffrey Zweig, Andrea Vedaldi

In this paper, we show that, for videos, the answer is more complex, and that better results can be obtained by accounting for the interplay between invariance, distinctiveness, multiple modalities and time.

Audio Classification Retrieval +1

Calibrating Self-supervised Monocular Depth Estimation

no code implementations 16 Sep 2020 Robert McCraith, Lukas Neumann, Andrea Vedaldi

In recent years, many methods have demonstrated the ability of neural networks to learn depth and pose changes in a sequence of images, using only self-supervision as the training signal.

Monocular Depth Estimation

Canonical 3D Deformer Maps: Unifying parametric and non-parametric methods for dense weakly-supervised category reconstruction

1 code implementation NeurIPS 2020 David Novotny, Roman Shapovalov, Andrea Vedaldi

We propose the Canonical 3D Deformer Map, a new representation of the 3D shape of common object categories that can be learned from a collection of 2D images of independent objects.

3D Reconstruction Object

Automatic Recall Machines: Internal Replay, Continual Learning and the Brain

1 code implementation 22 Jun 2020 Xu Ji, Joao Henriques, Tinne Tuytelaars, Andrea Vedaldi

Replay in neural networks involves training on sequential data with memorized samples, which counteracts forgetting of previous behavior caused by non-stationarity.

Continual Learning

VGGSound: A Large-scale Audio-Visual Dataset

3 code implementations 29 Apr 2020 Honglie Chen, Weidi Xie, Andrea Vedaldi, Andrew Zisserman

Our goal is to collect a large-scale audio-visual dataset with low label noise from videos in the wild using computer vision techniques.

Image Classification

Monocular Depth Estimation with Self-supervised Instance Adaptation

no code implementations 13 Apr 2020 Robert McCraith, Lukas Neumann, Andrew Zisserman, Andrea Vedaldi

Recent advances in self-supervised learning have demonstrated that it is possible to learn accurate monocular depth reconstruction from raw video data, without using any 3D ground truth for supervision.

Monocular Depth Estimation Monocular Reconstruction +1

Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D Human Pose Estimation

1 code implementation 7 Apr 2020 Hanbyul Joo, Natalia Neverova, Andrea Vedaldi

Remarkably, the resulting annotations are sufficient to train from scratch 3D pose regressor networks that outperform the current state-of-the-art on in-the-wild benchmarks such as 3DPW.

Ranked #23 on 3D Human Pose Estimation on MPI-INF-3DHP (PA-MPJPE metric)

3D Human Pose Estimation 3D Pose Estimation

There and Back Again: Revisiting Backpropagation Saliency Methods

1 code implementation CVPR 2020 Sylvestre-Alvise Rebuffi, Ruth Fong, Xu Ji, Andrea Vedaldi

Saliency methods seek to explain the predictions of a model by producing an importance map across each input sample.

Meta-Learning

Goal-Conditioned End-to-End Visuomotor Control for Versatile Skill Primitives

1 code implementation 19 Mar 2020 Oliver Groth, Chia-Man Hung, Andrea Vedaldi, Ingmar Posner

Visuomotor control (VMC) is an effective means of achieving basic manipulation tasks such as pushing or pick-and-place from raw images.

Imitation Learning Meta-Learning +1

Fixing the train-test resolution discrepancy: FixEfficientNet

1 code implementation 18 Mar 2020 Hugo Touvron, Andrea Vedaldi, Matthijs Douze, Hervé Jégou

An EfficientNet-L2 pre-trained with weak supervision on 300M unlabeled images and further optimized with FixRes achieves 88.5% top-1 accuracy (top-5: 98.7%), which establishes the new state of the art for ImageNet with a single crop.

Ranked #9 on Image Classification on ImageNet ReaL (using extra training data)

Data Augmentation Image Classification

Transferring Dense Pose to Proximal Animal Classes

1 code implementation CVPR 2020 Artsiom Sanakoyeu, Vasil Khalidov, Maureen S. McCarthy, Andrea Vedaldi, Natalia Neverova

Recent contributions have demonstrated that it is possible to recognize the pose of humans densely and accurately given a large dataset of poses annotated in detail.

Transfer Learning

Automatically Discovering and Learning New Visual Categories with Ranking Statistics

1 code implementation ICLR 2020 Kai Han, Sylvestre-Alvise Rebuffi, Sebastien Ehrhardt, Andrea Vedaldi, Andrew Zisserman

In this work we address this problem by combining three ideas: (1) we suggest that the common approach of bootstrapping an image representation using the labeled data only introduces an unwanted bias, and that this can be avoided by using self-supervised learning to train the representation from scratch on the union of labelled and unlabelled data; (2) we use rank statistics to transfer the model's knowledge of the labelled classes to the problem of clustering the unlabelled images; and, (3) we train the data representation by optimizing a joint objective function on the labelled and unlabelled subsets of the data, improving both the supervised classification of the labelled data, and the clustering of the unlabelled data.

Clustering General Classification +1

Correlated Uncertainty for Learning Dense Correspondences from Noisy Labels

no code implementations NeurIPS 2019 Natalia Neverova, David Novotny, Andrea Vedaldi

We show that these models, by understanding uncertainty better, can solve the original DensePose task more accurately, thus setting the new state-of-the-art accuracy in this benchmark.

Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild

1 code implementation CVPR 2020 Shangzhe Wu, Christian Rupprecht, Andrea Vedaldi

We propose a method to learn 3D deformable object categories from raw single-view images, without external supervision.

Object

Self-labelling via simultaneous clustering and representation learning

5 code implementations ICLR 2020 Yuki Markus Asano, Christian Rupprecht, Andrea Vedaldi

Combining clustering and representation learning is one of the most promising approaches for unsupervised learning of deep neural networks.

Clustering Contrastive Learning +4

Occlusions for Effective Data Augmentation in Image Classification

no code implementations 23 Oct 2019 Ruth Fong, Andrea Vedaldi

Deep networks for visual recognition are known to leverage "easy to recognise" portions of objects such as faces and distinctive texture patterns.

Classification Data Augmentation +2

NormGrad: Finding the Pixels that Matter for Training

no code implementations 19 Oct 2019 Sylvestre-Alvise Rebuffi, Ruth Fong, Xu Ji, Hakan Bilen, Andrea Vedaldi

In this paper, we are instead interested in the locations of an image that contribute to the model's training.

Meta-Learning

Understanding Deep Networks via Extremal Perturbations and Smooth Masks

2 code implementations ICCV 2019 Ruth Fong, Mandela Patrick, Andrea Vedaldi

In this paper, we discuss some of the shortcomings of existing approaches to perturbation analysis and address them by introducing the concept of extremal perturbations, which are theoretically grounded and interpretable.

Interpretable Machine Learning

C3DPO: Canonical 3D Pose Networks for Non-Rigid Structure From Motion

2 code implementations ICCV 2019 David Novotny, Nikhila Ravi, Benjamin Graham, Natalia Neverova, Andrea Vedaldi

We propose C3DPO, a method for extracting 3D models of deformable objects from 2D keypoint annotations in unconstrained images.

Unsupervised Learning of Landmarks by Descriptor Vector Exchange

1 code implementation ICCV 2019 James Thewlis, Samuel Albanie, Hakan Bilen, Andrea Vedaldi

Equivariance to random image transformations is an effective method to learn landmarks of object categories, such as the eyes and the nose in faces, without manual supervision.

Object Unsupervised Facial Landmark Detection

AutoCorrect: Deep Inductive Alignment of Noisy Geometric Annotations

no code implementations 14 Aug 2019 Honglie Chen, Weidi Xie, Andrea Vedaldi, Andrew Zisserman

We propose AutoCorrect, a method to automatically learn object-annotation alignments from a dataset with annotations affected by geometric noise.

Object

Self-supervised Learning of Interpretable Keypoints from Unlabelled Videos

no code implementations CVPR 2020 Tomas Jakab, Ankush Gupta, Hakan Bilen, Andrea Vedaldi

We propose KeypointGAN, a new method for recognizing the pose of objects from a single image that learns using only unlabelled videos and a weak empirical prior on the object poses.

Facial Landmark Detection Image-to-Image Translation +4

Fixing the train-test resolution discrepancy

3 code implementations NeurIPS 2019 Hugo Touvron, Andrea Vedaldi, Matthijs Douze, Hervé Jégou

Conversely, when training a ResNeXt-101 32x48d pre-trained in weakly-supervised fashion on 940 million public images at resolution 224x224 and further optimizing for test resolution 320x320, we obtain a test top-1 accuracy of 86.4% (top-5: 98.0%) (single-crop).

Ranked #2 on Fine-Grained Image Classification on Birdsnap (using extra training data)

Data Augmentation Fine-Grained Image Classification +1

Photo-Geometric Autoencoding to Learn 3D Objects from Unlabelled Images

no code implementations 4 Jun 2019 Shangzhe Wu, Christian Rupprecht, Andrea Vedaldi

Specifically, given a single image of the object seen from an arbitrary viewpoint, our model predicts a symmetric canonical view, the corresponding 3D shape and a viewpoint transformation, and trains with the goal of reconstructing the input view, resembling an auto-encoder.

Unsupervised Intuitive Physics from Past Experiences

no code implementations 26 May 2019 Sébastien Ehrhardt, Aron Monszpart, Niloy J. Mitra, Andrea Vedaldi

We are interested in learning models of intuitive physics similar to the ones that animals use for navigation, manipulation and planning.

Meta-Learning

Semi-Supervised Learning with Scarce Annotations

1 code implementation 21 May 2019 Sylvestre-Alvise Rebuffi, Sebastien Ehrhardt, Kai Han, Andrea Vedaldi, Andrew Zisserman

The first is a simple but effective one: we leverage the power of transfer learning among different tasks and self-supervision to initialize a good representation of the data without making use of any label.

Multi-class Classification Self-Supervised Learning +1

Guiding Physical Intuition with Neural Stethoscopes

no code implementations ICLR 2019 Fabian Fuchs, Oliver Groth, Adam Kosiorek, Alex Bewley, Markus Wulfmeier, Andrea Vedaldi, Ingmar Posner

Using an adversarial stethoscope, the network is successfully de-biased, leading to a performance increase from 66% to 88%.

Physical Intuition

Modelling and unsupervised learning of symmetric deformable object categories

no code implementations NeurIPS 2018 James Thewlis, Hakan Bilen, Andrea Vedaldi

We propose a new approach to model and learn, without manual supervision, the symmetries of natural objects, such as faces or flowers, given only images as input.

Object

Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks

9 code implementations NeurIPS 2018 Jie Hu, Li Shen, Samuel Albanie, Gang Sun, Andrea Vedaldi

We also propose a parametric gather-excite operator pair which yields further performance gains, relate it to the recently-introduced Squeeze-and-Excitation Networks, and analyse the effects of these changes to the CNN feature activation statistics.

Supervising the new with the old: learning SFM from SFM

no code implementations ECCV 2018 Maria Klodt, Andrea Vedaldi

First, since such self-supervised approaches are based on the brightness constancy assumption, which is valid only for a subset of pixels, we propose a probabilistic learning formulation where the network predicts distributions over variables rather than specific values.

Motion Estimation valid

Emotion Recognition in Speech using Cross-Modal Transfer in the Wild

no code implementations 16 Aug 2018 Samuel Albanie, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman

We make the following contributions: (i) we develop a strong teacher network for facial emotion recognition that achieves the state of the art on a standard benchmark; (ii) we use the teacher to train a student, tabula rasa, to learn representations (embeddings) for speech emotion recognition without access to labelled audio data; and (iii) we show that the speech emotion embedding can be used for speech emotion recognition on external benchmark datasets.

Facial Emotion Recognition Facial Expression Recognition (FER) +1

Inductive Visual Localisation: Factorised Training for Superior Generalisation

no code implementations 21 Jul 2018 Ankush Gupta, Andrea Vedaldi, Andrew Zisserman

End-to-end trained Recurrent Neural Networks (RNNs) have been successfully applied to numerous problems that require processing sequences, such as image captioning, machine translation, and text recognition.

Image Captioning Machine Translation +2

Large scale evaluation of local image feature detectors on homography datasets

1 code implementation 20 Jul 2018 Karel Lenc, Andrea Vedaldi

The new protocol is better for assessment on a large number of images and reduces the dependency of the results on unwanted distractors such as the number of detected features and the feature magnification factor.

Invariant Information Clustering for Unsupervised Image Classification and Segmentation

6 code implementations ICCV 2019 Xu Ji, João F. Henriques, Andrea Vedaldi

The method is not specialised to computer vision and operates on any paired dataset samples; in our experiments we use random transforms to obtain a pair from each image.

Clustering General Classification +4

Cross Pixel Optical Flow Similarity for Self-Supervised Learning

no code implementations 15 Jul 2018 Aravindh Mahendran, James Thewlis, Andrea Vedaldi

We propose a novel method for learning convolutional neural image representations without manual supervision.

Image Classification Image Segmentation +4

Scrutinizing and De-Biasing Intuitive Physics with Neural Stethoscopes

no code implementations 14 Jun 2018 Fabian B. Fuchs, Oliver Groth, Adam R. Kosiorek, Alex Bewley, Markus Wulfmeier, Andrea Vedaldi, Ingmar Posner

Conversely, training on an easy dataset where visual cues are positively correlated with stability, the baseline model learns a bias leading to poor performance on a harder dataset.

MapNet: An Allocentric Spatial Memory for Mapping Environments

no code implementations CVPR 2018 João F. Henriques, Andrea Vedaldi

The module contains an allocentric spatial memory that can be accessed associatively by feeding to it the current sensory input, resulting in localization, and then updated using an LSTM or similar mechanism.

Meta-learning with differentiable closed-form solvers

5 code implementations ICLR 2019 Luca Bertinetto, João F. Henriques, Philip H. S. Torr, Andrea Vedaldi

The main idea is to teach a deep network to use standard machine learning tools, such as ridge regression, as part of its own internal model, enabling it to quickly adapt to novel data.

BIG-bench Machine Learning Few-Shot Learning +1

Small steps and giant leaps: Minimal Newton solvers for Deep Learning

6 code implementations ICLR 2019 João F. Henriques, Sebastien Ehrhardt, Samuel Albanie, Andrea Vedaldi

Instead, we propose to keep a single estimate of the gradient projected by the inverse Hessian matrix, and update it once per iteration.

PyTorch CurveBall - A second-order optimizer for deep networks

1 code implementation 21 May 2018 João F. Henriques, Sebastien Ehrhardt, Samuel Albanie, Andrea Vedaldi

We propose a fast second-order method that can be used as a drop-in replacement for current deep learning solvers.

Unsupervised Intuitive Physics from Visual Observations

no code implementations 14 May 2018 Sebastien Ehrhardt, Aron Monszpart, Niloy Mitra, Andrea Vedaldi

While learning models of intuitive physics is an increasingly active area of research, current approaches still fall short of natural intelligences in one important regard: they require external supervision, such as explicit access to physical states, at training and sometimes even at test times.

Net2Vec: Quantifying and Explaining how Concepts are Encoded by Filters in Deep Neural Networks

1 code implementation CVPR 2018 Ruth Fong, Andrea Vedaldi

By studying such embeddings, we are able to show that 1., in most cases, multiple filters are required to code for a concept, that 2., often filters are not concept specific and help encode multiple concepts, and that 3., compared to single filter activations, filter embeddings are able to better characterize the meaning of a representation and its relationship to other concepts.

Taking Visual Motion Prediction To New Heightfields

no code implementations 22 Dec 2017 Sebastien Ehrhardt, Aron Monszpart, Niloy Mitra, Andrea Vedaldi

In order to be able to leverage the approximation capabilities of artificial intelligence techniques in such physics related contexts, researchers have handcrafted the relevant states, and then used neural networks to learn the state transitions using simulation runs as training data.

motion prediction

Deep Image Prior

14 code implementations CVPR 2018 Dmitry Ulyanov, Andrea Vedaldi, Victor Lempitsky

In this paper, we show that, on the contrary, the structure of a generator network is sufficient to capture a great deal of low-level image statistics prior to any learning.

Feature Upsampling Image Denoising +5

DeepRadiologyNet: Radiologist Level Pathology Detection in CT Head Images

no code implementations 26 Nov 2017 Jameson Merkow, Robert Lufkin, Kim Nguyen, Stefano Soatto, Zhuowen Tu, Andrea Vedaldi

Thus, DeepRadiologyNet enables significant reduction in the workload of human radiologists by automatically filtering studies and reporting on the high-confidence ones at an operating point well below the literal error rate for US Board Certified radiologists, estimated at 0.82%.

Unsupervised learning of object frames by dense equivariant image labelling

no code implementations NeurIPS 2017 James Thewlis, Hakan Bilen, Andrea Vedaldi

One of the key challenges of visual perception is to extract abstract models of 3D objects and object categories from visual measurements, which are affected by complex nuisance factors such as viewpoint, occlusion, motion, and deformations.

Object Optical Flow Estimation +1

Learning to Represent Mechanics via Long-term Extrapolation and Interpolation

no code implementations 6 Jun 2017 Sébastien Ehrhardt, Aron Monszpart, Andrea Vedaldi, Niloy Mitra

While the basic laws of Newtonian mechanics are well understood, explaining a physical scenario still requires manually modeling the problem with suitable equations and associated parameters.

Learning 3D Object Categories by Looking Around Them

no code implementations ICCV 2017 David Novotny, Diane Larlus, Andrea Vedaldi

Traditional approaches for learning 3D object categories use either synthetic data or manual supervision.

Data Augmentation Object

AnchorNet: A Weakly Supervised Network to Learn Geometry-sensitive Features For Semantic Matching

no code implementations CVPR 2017 David Novotny, Diane Larlus, Andrea Vedaldi

Despite significant progress of deep learning in recent years, state-of-the-art semantic matching methods still rely on legacy features such as SIFT or HoG.

Object

Interpretable Explanations of Black Boxes by Meaningful Perturbation

6 code implementations ICCV 2017 Ruth Fong, Andrea Vedaldi

As machine learning algorithms are increasingly applied to high impact yet high risk tasks, such as medical diagnosis or autonomous driving, it is critical that researchers can explain how such algorithms arrived at their predictions.

Interpretable Machine Learning

It Takes (Only) Two: Adversarial Generator-Encoder Networks

1 code implementation 7 Apr 2017 Dmitry Ulyanov, Andrea Vedaldi, Victor Lempitsky

Unlike previous hybrids of autoencoders and adversarial networks, the adversarial game in our approach is set up directly between the encoder and the generator, and no external mappings are trained in the process of learning.

Vocal Bursts Valence Prediction

Learning A Physical Long-term Predictor

no code implementations 1 Mar 2017 Sebastien Ehrhardt, Aron Monszpart, Niloy J. Mitra, Andrea Vedaldi

Evolution has resulted in highly developed abilities in many natural intelligences to quickly and accurately predict mechanical phenomena.

Universal representations: The missing link between faces, text, planktons, and cat breeds

no code implementations 25 Jan 2017 Hakan Bilen, Andrea Vedaldi

With the advent of large labelled datasets and high-capacity models, the performance of machine vision systems has been improving rapidly.

Continual Learning

Improved Texture Networks: Maximizing Quality and Diversity in Feed-forward Stylization and Texture Synthesis

1 code implementation CVPR 2017 Dmitry Ulyanov, Andrea Vedaldi, Victor Lempitsky

The recent work of Gatys et al., who characterized the style of an image by the statistics of convolutional neural network filters, ignited a renewed interest in the texture generation and image stylization problems.

Diversity Image Generation +2

Action Recognition with Dynamic Image Networks

3 code implementations 2 Dec 2016 Hakan Bilen, Basura Fernando, Efstratios Gavves, Andrea Vedaldi

This is a powerful idea because it allows any video to be converted to an image, so that existing CNN models pre-trained for the analysis of still images can be immediately extended to videos.

Action Recognition Optical Flow Estimation +1

Learning Grimaces by Watching TV

no code implementations 7 Oct 2016 Samuel Albanie, Andrea Vedaldi

As a starting point, we consider the problem of relating facial expressions to objectively measurable events occurring in videos.

Emotion Recognition Face Verification +2

Warped Convolutions: Efficient Invariance to Spatial Transformations

no code implementations ICML 2017 João F. Henriques, Andrea Vedaldi

Convolutional Neural Networks (CNNs) are extremely efficient, since they exploit the inherent translation-invariance of natural images.

Translation

Fully-Trainable Deep Matching

1 code implementation 12 Sep 2016 James Thewlis, Shuai Zheng, Philip H. S. Torr, Andrea Vedaldi

Deep Matching (DM) is a popular high-quality method for quasi-dense image matching.

Image Segmentation Semantic Segmentation

Learning the semantic structure of objects from Web supervision

no code implementations 5 Jul 2016 David Novotny, Diane Larlus, Andrea Vedaldi

While recent research in image understanding has often focused on recognizing more types of objects, understanding more about the objects is just as important.

Navigate

Fully-Convolutional Siamese Networks for Object Tracking

10 code implementations 30 Jun 2016 Luca Bertinetto, Jack Valmadre, João F. Henriques, Andrea Vedaldi, Philip H. S. Torr

The problem of arbitrary object tracking has traditionally been tackled by learning a model of the object's appearance exclusively online, using as sole training data the video itself.

Object object-detection +2