As a result, outlier detection is a fundamental problem in computer vision and a wide range of approaches, from simple checks based on descriptor similarity to geometric verification, have been proposed over the last decades.
In this paper we study the problem of estimating the semi-generalized pose of a partially calibrated camera, i.e., the pose of a perspective camera with unknown focal length w.r.t. a generalized camera.
AR/VR applications and robots need to know when the scene has changed.
In this work, we thus explore a more flexible alternative based on dense 3D meshes that does not require feature matching between database images to build the scene representation.
Motivated by recent advances in the area of monocular geometry prediction, we systematically explore the utility these cues provide for improving neural implicit surface reconstruction.
In order to investigate the consequences for visual localization, this paper focuses on understanding the role of image retrieval for multiple visual localization paradigms.
In this paper, we present HOPS, the first method to capture interactions such as dragging objects and opening doors from ego-centric data alone.
In this paper, we propose a new open-source benchmarking framework for Visual Geo-localization (VG) that allows users to build, train, and test a wide range of commonly used architectures, with the flexibility to change individual components of a geo-localization pipeline.
We propose a 2D-only method that maps multiple context views and a query pose to a new image in a single pass of a neural network.
Visual localization is the problem of estimating the position and orientation from which a given image (or a sequence of images) is taken in a known scene.
We introduce HPS (Human POSEitioning System), a method to recover the full 3D pose of a human registered with a 3D scan of the surrounding environment using wearable sensors.
In this paper, we go Back to the Feature: we argue that deep networks should focus on learning robust and invariant visual features, while the geometric estimation should be left to principled algorithms.
In this paper, we propose the first minimal solutions for estimating the semi-generalized homography given a perspective and a generalized camera.
To address the resulting potential privacy risks for user-generated content, it was recently proposed to lift point clouds to line clouds by replacing 3D points by randomly oriented 3D lines passing through these points.
In this work, we propose a new perspective to estimate correspondences in a detect-to-refine manner, where we first predict patch-level match proposals and then refine them.
This paper focuses on understanding the role of image retrieval for multiple visual localization tasks.
We introduce the first general solution to the problem of estimating the pose of a calibrated camera given a single observation of an oriented point and an affine correspondence.
Good local features improve the robustness of many 3D re-localization and multi-view reconstruction pipelines.
In this paper, we adapt 3RScan - a recently introduced indoor RGB-D dataset designed for object instance re-localization - to create RIO10, a new long-term camera re-localization benchmark focused on indoor scenes.
In particular, our approach is more robust than the naive approach of first estimating intrinsic parameters and pose per camera before refining the extrinsic parameters of the system.
The main advantage of such solvers is that their sample size is smaller, e.g., only two instead of four matches are required to estimate a homography.
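To make the impact of the smaller sample size concrete, the following sketch evaluates the standard RANSAC iteration formula N = log(1 - p) / log(1 - w^s), where w is the inlier ratio, s the sample size, and p the desired confidence. This is a generic illustration, not taken from the paper; the function name and the chosen inlier ratio are assumptions.

```python
import math

def ransac_iterations(inlier_ratio: float, sample_size: int, confidence: float = 0.99) -> int:
    """Iterations needed to draw at least one all-inlier minimal sample
    with the given confidence (standard RANSAC stopping criterion)."""
    prob_good_sample = inlier_ratio ** sample_size
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - prob_good_sample))

# With 50% inliers, a 2-point solver needs roughly 17 iterations,
# while a 4-point homography solver needs roughly 72.
print(ransac_iterations(0.5, 2))
print(ransac_iterations(0.5, 4))
```

Because the required number of iterations grows exponentially with the sample size, halving the sample size pays off most in low-inlier regimes.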
Local feature matching is a critical component of many computer vision pipelines, including among others Structure-from-Motion, SLAM, and Visual Localization.
Visual Localization is one of the key enabling technologies for autonomous driving and augmented reality.
Motion-blurred images challenge many computer vision algorithms, e.g., feature detection, motion estimation, or object recognition.
In contrast, generic camera models allow for very accurate calibration due to their flexibility.
In this paper, we propose a new neural network, the Fine-Grained Segmentation Network (FGSN), that can be used to provide image segmentations with a larger number of labels and can be trained in a self-supervised fashion.
The pose with the largest geometric consistency with the query image, e.g., in the form of an inlier count, is then selected in a second stage.
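As a minimal illustration of this selection step (a generic sketch under assumed names and shapes, not the paper's implementation; the 12-pixel threshold is likewise an assumption), one can reproject the matched 3D points with each pose hypothesis, threshold the reprojection error, and keep the hypothesis with the most inliers:

```python
import numpy as np

def count_inliers(R, t, K, points_3d, points_2d, threshold_px=12.0):
    """Count 2D-3D correspondences whose reprojection error is below a pixel threshold."""
    cam_pts = points_3d @ R.T + t                  # world -> camera coordinates
    in_front = cam_pts[:, 2] > 1e-6                # ignore points behind the camera
    proj = cam_pts @ K.T
    proj = proj[:, :2] / np.maximum(proj[:, 2:3], 1e-6)
    errors = np.linalg.norm(proj - points_2d, axis=1)
    return int(np.sum(in_front & (errors < threshold_px)))

def select_best_pose(pose_hypotheses, K, points_3d, points_2d):
    """Pick the (R, t) hypothesis with the largest inlier count w.r.t. the query image."""
    return max(pose_hypotheses,
               key=lambda Rt: count_inliers(Rt[0], Rt[1], K, points_3d, points_2d))
```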
Using a classical feature-based approach within this framework, we show state-of-the-art performance.
Our approach spans from offline model building to real-time client-side pose fusion.
In this work we address the problem of finding reliable pixel-level correspondences under difficult imaging conditions.
We furthermore use our model to show that pose regression is more closely related to pose approximation via image retrieval than to accurate pose estimation via 3D structure.
We show that adding the correspondences as extra supervision during training improves the segmentation performance of the convolutional neural network, making it more robust to seasonal changes and weather conditions.
We propose instead to tightly couple mesh regularization and state estimation by detecting and enforcing structural regularities in a novel factor-graph formulation.
Through quantitative and perceptual experiments, we show that our model outperforms previous work and that our dataset is a valuable benchmark for generative models.
This paper addresses the challenge of dense pixel correspondence estimation between two images.
We then compare the daytime and translated night images to obtain a pose estimate for the night image using the known 6-DOF position of the closest day image.
This results in a system that provides reliable and drift-free pose estimates for high-speed autonomous driving.
Robust and accurate visual localization across large appearance variations due to changes in time of day, seasons, or changes of the environment is a challenging problem that is important for applications such as the navigation of autonomous robots.
Besides outperforming previous compression techniques in terms of pose accuracy under the same memory constraints, our compression scheme itself is also more efficient.
We seek to predict the 6 degree-of-freedom (6DoF) pose of a query photograph with respect to a large indoor 3D map.
To minimize the number of cameras needed for surround perception, we utilize fisheye cameras.
Visual localization enables autonomous vehicles to navigate in their surroundings and augmented reality applications to link virtual to real worlds.
Motivated by the limitations of existing multi-view stereo benchmarks, we present a novel dataset for this task.
In terms of matching performance, we evaluate the different descriptors using standard criteria. However, considering matching performance in isolation only provides an incomplete measure of a descriptor’s quality.
Adding knowledge of the triangulation direction, we are able to approximate the position of the camera from two matches alone.
3D structure-based methods employ 3D models of the scene to estimate the full 6DOF pose of a camera very accurately.
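To make the typical structure-based pipeline concrete, the sketch below estimates a full 6DOF pose from 2D-3D matches with PnP inside a RANSAC loop using OpenCV. This is an illustrative example, not the specific method described above; the array shapes, the reprojection threshold, and the helper name are assumptions.

```python
import cv2
import numpy as np

def estimate_pose(points_3d, points_2d, K):
    """Estimate the 6DOF camera pose from 2D-3D matches via PnP + RANSAC.

    points_3d: (N, 3) scene points from the 3D model.
    points_2d: (N, 2) matched keypoint locations in the query image.
    K:         (3, 3) camera intrinsics.
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        points_3d.astype(np.float64),
        points_2d.astype(np.float64),
        K.astype(np.float64),
        distCoeffs=None,
        reprojectionError=8.0,   # inlier threshold in pixels (assumed value)
        confidence=0.999,
    )
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)   # rotation vector -> rotation matrix
    return R, tvec, inliers
```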
We present a method to jointly refine the geometry and semantic segmentation of 3D surface meshes.
In this work we propose a new CNN+LSTM architecture for camera pose regression for indoor and outdoor scenes.
In this paper, we ask a fundamental question: can we learn such detectors from scratch?
Visual location recognition is the task of determining the place depicted in a query image from a given database of geo-tagged images.
As a second step, we obtain the calibration by finding the translation of the camera center using an ordering constraint.
An important variant of this problem is the case in which individual sides of a building can be reconstructed but not joined due to the missing visual overlap.