Search Results for author: Carl Doersch

Found 29 papers, 15 papers with code

Mid-level Visual Element Discovery as Discriminative Mode Seeking

no code implementations · NeurIPS 2013 · Carl Doersch, Abhinav Gupta, Alexei A. Efros

We also propose the Purity-Coverage plot as a principled way of experimentally analyzing and evaluating different visual discovery approaches, and compare our method against prior work on the Paris Street View dataset.

Scene Classification

Mid-level Elements for Object Detection

no code implementations · 27 Apr 2015 · Aayush Bansal, Abhinav Shrivastava, Carl Doersch, Abhinav Gupta

Building on the success of recent discriminative mid-level elements, we propose a surprisingly simple approach for object detection which performs comparably to the current state-of-the-art approaches on the PASCAL VOC comp-3 detection challenge (no external data).

Object Detection +1

Unsupervised Visual Representation Learning by Context Prediction

3 code implementations · ICCV 2015 · Carl Doersch, Abhinav Gupta, Alexei A. Efros

This work explores the use of spatial context as a source of free and plentiful supervisory signal for training a rich visual representation.

Representation Learning
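The pretext task behind this paper can be sketched in a few lines: crop a center patch and one of its eight neighbors, and let the prediction target be which neighbor was chosen. This is an illustrative NumPy sketch, not the paper's code; the patch size, gap, and function name are assumptions.

```python
import numpy as np

# The eight possible neighbor positions around a center patch; the pretext
# task is to predict which of these offsets relates two sampled patches.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def sample_context_pair(image, patch=32, gap=8, rng=None):
    """Crop a center patch and a random neighbor; return (center, neighbor, label)."""
    rng = rng or np.random.default_rng()
    step = patch + gap  # gap between patches so low-level boundary cues cannot leak the answer
    h, w = image.shape[:2]
    # Choose a center location that leaves room for all eight neighbors.
    cy = int(rng.integers(step, h - step - patch + 1))
    cx = int(rng.integers(step, w - step - patch + 1))
    label = int(rng.integers(len(OFFSETS)))
    dy, dx = OFFSETS[label]
    center = image[cy:cy + patch, cx:cx + patch]
    neighbor = image[cy + dy * step:cy + dy * step + patch,
                     cx + dx * step:cx + dx * step + patch]
    return center, neighbor, label
```

A network then embeds both patches and solves the resulting 8-way classification; the supervisory signal comes for free from the images themselves.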

Data-dependent Initializations of Convolutional Neural Networks

2 code implementations · 21 Nov 2015 · Philipp Krähenbühl, Carl Doersch, Jeff Donahue, Trevor Darrell

Convolutional Neural Networks spread through computer vision like a wildfire, impacting almost all visual tasks imaginable.

Image Classification · Object Detection +2

Tutorial on Variational Autoencoders

27 code implementations · 19 Jun 2016 · Carl Doersch

In just three years, Variational Autoencoders (VAEs) have emerged as one of the most popular approaches to unsupervised learning of complicated distributions.
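The two ingredients the tutorial centers on, the reparameterization trick and the KL term of the evidence lower bound, can be sketched in a few lines (an illustrative NumPy sketch, not the tutorial's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    # z = mu + sigma * eps with eps ~ N(0, I): sampling stays differentiable w.r.t. mu, log_var
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    # KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dims, averaged over the batch
    return -0.5 * np.mean(np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1))

# Toy encoder output: batch of 4, 2-D latent, posterior equal to the prior
mu, log_var = np.zeros((4, 2)), np.zeros((4, 2))
z = reparameterize(mu, log_var)
kl = kl_to_standard_normal(mu, log_var)  # 0.0 when the posterior already matches the prior
```

Training minimizes reconstruction error on the decoded `z` plus this KL term, which keeps the learned posterior close to the standard-normal prior.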

An Uncertain Future: Forecasting from Static Images using Variational Autoencoders

no code implementations · 25 Jun 2016 · Jacob Walker, Carl Doersch, Abhinav Gupta, Martial Hebert

We show that our method is able to successfully predict events in a wide variety of scenes and can produce multiple different predictions when the future is ambiguous.

Multi-task Self-Supervised Visual Learning

no code implementations · ICCV 2017 · Carl Doersch, Andrew Zisserman

We investigate methods for combining multiple self-supervised tasks (i.e., supervised tasks where data can be collected without manual labeling) in order to train a single visual representation.

Depth Estimation · Depth Prediction +1

Kickstarting Deep Reinforcement Learning

no code implementations · 10 Mar 2018 · Simon Schmitt, Jonathan J. Hudson, Augustin Zidek, Simon Osindero, Carl Doersch, Wojciech M. Czarnecki, Joel Z. Leibo, Heinrich Kuttler, Andrew Zisserman, Karen Simonyan, S. M. Ali Eslami

Our method places no constraints on the architecture of the teacher or student agents, and it regulates itself to allow the students to surpass their teachers in performance.

Reinforcement Learning (RL)
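The core idea, adding a teacher-matching auxiliary term to the student's usual RL loss, can be sketched as follows. This is an illustrative sketch only: `lam` stands in for the paper's annealed weighting schedule, and the function names are assumptions.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kickstart_loss(student_logits, teacher_logits, rl_loss, lam):
    # Distillation auxiliary: cross-entropy from the (fixed) teacher policy to
    # the student policy, added to the student's own RL loss with weight lam.
    teacher_p = softmax(teacher_logits)
    student_logp = np.log(softmax(student_logits) + 1e-12)
    distill = -np.mean(np.sum(teacher_p * student_logp, axis=-1))
    return rl_loss + lam * distill
```

Annealing `lam` toward zero over training is what lets the student first match and then surpass the teacher: early on it is pulled toward the teacher's policy, later it optimizes its own RL objective unconstrained.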

Learning Visual Question Answering by Bootstrapping Hard Attention

no code implementations · ECCV 2018 · Mateusz Malinowski, Carl Doersch, Adam Santoro, Peter Battaglia

Attention mechanisms in biological perception are thought to select subsets of perceptual information for more sophisticated processing which would be prohibitive to perform on all sensory inputs.

Hard Attention · Question Answering +1

The Visual QA Devil in the Details: The Impact of Early Fusion and Batch Norm on CLEVR

no code implementations · 11 Sep 2018 · Mateusz Malinowski, Carl Doersch

Visual QA is a pivotal challenge for higher-level reasoning, requiring understanding language, vision, and relationships between many objects in a scene.

Question Answering · Relational Reasoning

Structured agents for physical construction

no code implementations · 5 Apr 2019 · Victor Bapst, Alvaro Sanchez-Gonzalez, Carl Doersch, Kimberly L. Stachenfeld, Pushmeet Kohli, Peter W. Battaglia, Jessica B. Hamrick

Our results show that agents which use structured representations (e.g., objects and scene graphs) and structured policies (e.g., object-centric actions) outperform those which use less structured representations, and generalize better beyond their training when asked to reason about larger scenes.

Scene Understanding

Exploiting temporal context for 3D human pose estimation in the wild

1 code implementation · CVPR 2019 · Anurag Arnab, Carl Doersch, Andrew Zisserman

We present a bundle-adjustment-based algorithm for recovering accurate 3D human pose and meshes from monocular videos.

Ranked #1 on Monocular 3D Human Pose Estimation on Human3.6M (Use Video Sequence metric)

3D Pose Estimation · Monocular 3D Human Pose Estimation

Sim2real transfer learning for 3D human pose estimation: motion to the rescue

no code implementations · NeurIPS 2019 · Carl Doersch, Andrew Zisserman

In this paper, we show that standard neural-network approaches, which perform poorly when trained on synthetic RGB images, can perform well when the data is pre-processed to extract cues about the person's motion, notably as optical flow and the motion of 2D keypoints.

3D Human Pose Estimation · 3D Pose Estimation +2

CrossTransformers: spatially-aware few-shot transfer

4 code implementations · NeurIPS 2020 · Carl Doersch, Ankush Gupta, Andrew Zisserman

In this work, we illustrate how the neural network representations which underpin modern vision systems are subject to supervision collapse, whereby they lose any information that is not necessary for performing the training task, including information that may be necessary for transfer to new tasks or domains.

Self-Supervised Learning

Input-level Inductive Biases for 3D Reconstruction

no code implementations · CVPR 2022 · Wang Yifan, Carl Doersch, Relja Arandjelović, João Carreira, Andrew Zisserman

Much of the recent progress in 3D vision has been driven by the development of specialized architectures that incorporate geometrical inductive biases.

3D Reconstruction · Depth Estimation

TAP-Vid: A Benchmark for Tracking Any Point in a Video

3 code implementations · 7 Nov 2022 · Carl Doersch, Ankush Gupta, Larisa Markeeva, Adrià Recasens, Lucas Smaira, Yusuf Aytar, João Carreira, Andrew Zisserman, Yi Yang

Generic motion understanding from video involves not only tracking objects, but also perceiving how their surfaces deform and move.

Optical Flow Estimation · Point Tracking

RoboTAP: Tracking Arbitrary Points for Few-Shot Visual Imitation

no code implementations · 30 Aug 2023 · Mel Vecerik, Carl Doersch, Yi Yang, Todor Davchev, Yusuf Aytar, Guangyao Zhou, Raia Hadsell, Lourdes Agapito, Jon Scholz

For robots to be useful outside labs and specialized factories we need a way to teach them new useful behaviors quickly.

Learning from One Continuous Video Stream

no code implementations · 1 Dec 2023 · João Carreira, Michael King, Viorica Pătrăucean, Dilara Gokay, Cătălin Ionescu, Yi Yang, Daniel Zoran, Joseph Heyward, Carl Doersch, Yusuf Aytar, Dima Damen, Andrew Zisserman

We introduce a framework for online learning from a single continuous video stream -- the way people and animals learn, without mini-batches, data augmentation or shuffling.

Data Augmentation · Future Prediction

BootsTAP: Bootstrapped Training for Tracking-Any-Point

2 code implementations · 1 Feb 2024 · Carl Doersch, Yi Yang, Dilara Gokay, Pauline Luc, Skanda Koppula, Ankush Gupta, Joseph Heyward, Ross Goroshin, João Carreira, Andrew Zisserman

To endow models with greater understanding of physics and motion, it is useful to enable them to perceive how solid surfaces move and deform in real scenes.
