Traditional approaches to 3D reconstruction rely on an intermediate representation of depth maps prior to estimating a full 3D model of a scene.
Ranked #1 on 3D Reconstruction on ScanNet
Cost-volume-based approaches employing 3D convolutional neural networks (CNNs) have considerably improved the accuracy of multi-view stereo (MVS) systems.
Ranked #1 on Depth Estimation on ScanNetV2
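The cost-volume idea can be sketched in miniature. This sketch rests on simplifying assumptions that are not from the source: two rectified grayscale views stored as nested lists, an absolute-difference matching cost indexed by disparity rather than depth, and a winner-take-all argmin standing in for the 3D CNN that such systems use to regularize the volume. `build_cost_volume` and `winner_take_all` are hypothetical helper names.

```python
# Minimal two-view cost volume with winner-take-all disparity selection.
# Assumptions (not from the source): rectified grayscale images as nested
# lists, absolute-difference cost, and argmin in place of a learned 3D CNN.
import math


def build_cost_volume(left, right, max_disp):
    """Return cost[d][y][x] = |left[y][x] - right[y][x - d]| (inf if out of view)."""
    height, width = len(left), len(left[0])
    cost = []
    for d in range(max_disp + 1):
        plane = []
        for y in range(height):
            row = []
            for x in range(width):
                if x - d >= 0:
                    row.append(abs(left[y][x] - right[y][x - d]))
                else:
                    row.append(math.inf)  # hypothesis falls outside the right image
            plane.append(row)
        cost.append(plane)
    return cost


def winner_take_all(cost):
    """Pick, per pixel, the disparity hypothesis with the lowest matching cost."""
    num_disp, height, width = len(cost), len(cost[0]), len(cost[0][0])
    return [
        [min(range(num_disp), key=lambda d: cost[d][y][x]) for x in range(width)]
        for y in range(height)
    ]


left = [[x * x for x in range(8)]]          # one textured scanline
right = [[(x + 2) ** 2 for x in range(8)]]  # same scene shifted by 2 pixels
disparity = winner_take_all(build_cost_volume(left, right, max_disp=3))
print(disparity[0][2:])  # → [2, 2, 2, 2, 2, 2]
```

A learned MVS system would sweep depth hypotheses across several calibrated views and learn how to aggregate the volume, but the layout of one cost slice per hypothesis is the part this toy shows.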
With the emergence of Virtual and Mixed Reality (XR) devices, eye tracking has received significant attention in the computer vision community.
We introduce Scan2Plan, a novel approach for accurate estimation of a floorplan from a 3D scan of the structural elements of indoor environments.
This paper introduces SuperGlue, a neural network that matches two sets of local features by jointly finding correspondences and rejecting non-matchable points.
Ranked #2 on Visual Place Recognition on Berlin Kudamm
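One way to jointly find correspondences and reject non-matchable points is to augment the score matrix with a "dustbin" row and column and normalize it with Sinkhorn iterations. The sketch below assumes precomputed pairwise scores and a fixed dustbin score `alpha` (which a trained model would presumably learn). It illustrates the idea and is not SuperGlue's implementation; the helper names are hypothetical.

```python
# Sketch: soft assignment with a "dustbin" for unmatchable points.
# Assumptions: pairwise scores are given, the dustbin score `alpha` is a
# fixed constant, and plain (non-log) Sinkhorn scaling suffices for small scores.
import math


def sinkhorn_with_dustbin(scores, alpha, iters=100):
    m, n = len(scores), len(scores[0])
    # Augment the M x N score matrix with a dustbin row and column.
    aug = [row + [alpha] for row in scores] + [[alpha] * (n + 1)]
    kernel = [[math.exp(s) for s in row] for row in aug]
    # Each real point carries mass 1; each dustbin can absorb every point of the other set.
    row_mass = [1.0] * m + [float(n)]
    col_mass = [1.0] * n + [float(m)]
    u = [1.0] * (m + 1)
    v = [1.0] * (n + 1)
    for _ in range(iters):
        # Alternately rescale rows and columns toward the target marginals.
        u = [row_mass[i] / sum(kernel[i][j] * v[j] for j in range(n + 1)) for i in range(m + 1)]
        v = [col_mass[j] / sum(kernel[i][j] * u[i] for i in range(m + 1)) for j in range(n + 1)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(n + 1)] for i in range(m + 1)]


def mutual_matches(plan):
    """Return, for each point in set A, its match index in set B or -1."""
    m, n = len(plan) - 1, len(plan[0]) - 1
    matches = []
    for i in range(m):
        j = max(range(n + 1), key=lambda c: plan[i][c])
        best_row = max(range(m + 1), key=lambda r: plan[r][j])
        # Accept only non-dustbin columns that pick this row back.
        matches.append(j if j < n and best_row == i else -1)
    return matches


scores = [[10.0, 0.0], [0.0, 10.0], [0.0, 0.0]]  # third point has no good partner
plan = sinkhorn_with_dustbin(scores, alpha=1.0)
print(mutual_matches(plan))  # → [0, 1, -1]
```

The dustbin gives unmatched points somewhere to send their mass, so rejection falls out of the same normalization that produces the matches.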
2D keypoint estimation is an important precursor to 3D pose estimation problems for the human body and hands.
Eye gaze estimation and simultaneous semantic understanding of a user through eye images are crucial components of Virtual and Mixed Reality, enabling energy-efficient rendering, multi-focal displays, and effective interaction with 3D content.
We present DeepPerimeter, a deep learning based pipeline for inferring a full indoor perimeter (i.e., an exterior boundary map) from a sequence of posed RGB images.
We propose a self-supervised learning framework that uses unlabeled monocular video sequences to generate large-scale supervision for training a Visual Odometry (VO) frontend, a network which computes pointwise data associations across images.
We demonstrate gradient adversarial training in three scenarios: (1) as a defense against adversarial examples, we classify gradient tensors and tune them to be agnostic to the class of their corresponding example; (2) for knowledge distillation, we perform binary classification of gradient tensors derived from the student or teacher network and tune the student's gradient tensor to mimic the teacher's; and (3) for multi-task learning, we classify the gradient tensors derived from different task loss functions and tune them to be statistically indistinguishable.
We present a deep model that can accurately produce dense depth maps given an RGB image with known depth at a very sparse set of pixels.
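A minimal sketch of how the input to such a model might be assembled, assuming the sparse measurements are sampled uniformly at random from a dense ground-truth depth map and passed to the network as an extra channel plus a binary validity mask. The source does not specify this encoding, and `make_sparse_input` is a hypothetical helper.

```python
# Sketch: RGB image plus depth known at a sparse set of pixels.
# Assumptions: uniform random sampling from dense ground truth, and an
# encoding of [r, g, b, sparse_depth, mask] per pixel (hypothetical).
import random


def make_sparse_input(rgb, depth, num_samples, seed=0):
    """Return an H x W x 5 nested list: [r, g, b, sparse_depth, mask]."""
    height, width = len(depth), len(depth[0])
    rng = random.Random(seed)
    pixels = [(y, x) for y in range(height) for x in range(width)]
    known = set(rng.sample(pixels, num_samples))  # pixels with a depth reading
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            is_known = (y, x) in known
            d = depth[y][x] if is_known else 0.0  # 0 marks "no measurement"
            row.append(list(rgb[y][x]) + [d, 1.0 if is_known else 0.0])
        out.append(row)
    return out


rgb = [[(0.5, 0.5, 0.5)] * 4 for _ in range(3)]
depth = [[1.0 + y + 0.1 * x for x in range(4)] for y in range(3)]
net_input = make_sparse_input(rgb, depth, num_samples=2)
print(sum(px[4] for row in net_input for px in row))  # → 2.0
```

The mask lets the model distinguish "depth is zero" from "depth is unknown", which a zero-filled depth channel alone cannot express.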
This paper presents a self-supervised framework for training interest point detectors and descriptors suitable for a large number of multiple-view geometry problems in computer vision.
Deep multitask networks, in which one neural network produces multiple predictive outputs, can offer better speed and performance than their single-task counterparts but are challenging to train properly.
We present a Deep Cuboid Detector which takes a consumer-quality RGB image of a cluttered scene and localizes all 3D cuboids (box-like objects).
We present a deep convolutional neural network for estimating the relative homography between a pair of images.
Ranked #3 on Homography Estimation on S-COCO
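If the network outputs the displacements of four image corners (a "4-point" parameterization, assumed here rather than stated in the line above), a 3x3 homography can be recovered with the direct linear transform by fixing h33 = 1 and solving an 8x8 linear system. The helper names are hypothetical, and the solver is plain Gauss-Jordan elimination to keep the example dependency-free.

```python
# Sketch: 3x3 homography from four corner correspondences via the DLT (h33 = 1).
# Assumption: the model predicts per-corner offsets; helper names are hypothetical.


def solve_linear(a, b):
    """Gauss-Jordan elimination with partial pivoting for a square system."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[r][n] / m[r][r] for r in range(n)]


def homography_from_offsets(corners, offsets):
    rows, rhs = [], []
    for (x, y), (dx, dy) in zip(corners, offsets):
        u, v = x + dx, y + dy
        # Each correspondence gives two linear equations in h11..h32.
        rows.append([x, y, 1, 0, 0, 0, -x * u, -y * u])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -x * v, -y * v])
        rhs.append(v)
    h = solve_linear(rows, rhs) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(h, x, y):
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


corners = [(0, 0), (64, 0), (64, 64), (0, 64)]
H = homography_from_offsets(corners, [(2, 3)] * 4)  # pure translation
print([[round(val, 6) for val in row] for row in H])
```

Predicting eight corner offsets instead of the nine matrix entries keeps all outputs in pixel units, and the conversion back to a matrix is a cheap closed-form solve.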
When we add our proposed global feature and a technique for learning normalization parameters, accuracy increases consistently, even over our improved versions of the baselines.
Ranked #41 on Semantic Segmentation on PASCAL Context
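The line above does not say how the global feature is built. A minimal sketch, assuming global average pooling over the feature map, L2 normalization of both the local and the global features with a learnable per-channel scale (fixed lists here), and concatenation of the global vector at every location. All names are hypothetical.

```python
# Sketch: appending a normalized global feature to every location of a feature map.
# Assumptions: global average pooling, channel-wise L2 normalization with a
# per-channel scale standing in for learned parameters; names are hypothetical.
import math


def l2_normalize(vec, scale, eps=1e-12):
    """L2-normalize a channel vector, then multiply by a per-channel scale."""
    norm = math.sqrt(sum(v * v for v in vec)) + eps
    return [s * v / norm for s, v in zip(scale, vec)]


def add_global_context(fmap, local_scale, global_scale):
    """fmap is H x W x C; returns H x W x 2C (local features, then global ones)."""
    height, width, chans = len(fmap), len(fmap[0]), len(fmap[0][0])
    pooled = [
        sum(fmap[y][x][c] for y in range(height) for x in range(width)) / (height * width)
        for c in range(chans)
    ]
    glob = l2_normalize(pooled, global_scale)
    # The same global vector is copied ("unpooled") to every location.
    return [
        [l2_normalize(fmap[y][x], local_scale) + glob for x in range(width)]
        for y in range(height)
    ]


fmap = [[[3.0, 4.0], [0.0, 1.0]], [[1.0, 0.0], [2.0, 2.0]]]
out = add_global_context(fmap, local_scale=[1.0, 1.0], global_scale=[1.0, 1.0])
print(out[0][0])
```

Normalizing before concatenation puts features of different magnitudes on a comparable scale, and the per-channel scale gives training a way to re-weight them afterwards.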
We present a novel method for aligning a sequence of instructions to a video of someone carrying out a task.
On MNIST handwritten digits, we show that our model is robust to label corruption.
We propose a deep convolutional neural network architecture codenamed "Inception", which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014).
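An Inception-style module runs several convolution branches in parallel, concatenates their outputs along the channel axis, and uses 1x1 convolutions to reduce channels before the larger filters. The arithmetic below uses illustrative channel counts (assumptions, not figures quoted from the line above) to show how concatenation adds up channels and how much a 1x1 reduction saves on a 5x5 branch.

```python
# Sketch: channel and weight arithmetic for an Inception-style module.
# The input width and branch widths are illustrative assumptions.


def conv_weights(k, c_in, c_out):
    """Weights of a k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


c_in = 192
branches = {"1x1": 64, "3x3": 128, "5x5": 32, "pool_proj": 32}
out_channels = sum(branches.values())  # concatenation sums the branch widths

direct_5x5 = conv_weights(5, c_in, 32)                               # 5x5 applied to all 192 channels
reduced_5x5 = conv_weights(1, c_in, 16) + conv_weights(5, 16, 32)   # 1x1 reduce to 16 first
print(out_channels, direct_5x5, reduced_5x5)  # → 256 153600 15872
```

In this example the 1x1 reduction cuts the 5x5 branch's weights by roughly a factor of ten, which is what makes wide parallel branches affordable.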
Despite the simplicity of the resulting optimization problem, it is effective at improving both recognition and localization accuracy.