For a historian, the first step in studying their evolution across a corpus of similar manuscripts is to identify which ones correspond to each other.
We propose a methodology for evaluation and explore the influence of three aspects of deep MVS methods: network architecture, training data, and supervision.
We introduce RoboPose, a method to estimate the joint angles and the 6D camera-to-robot pose of a known articulated robot from a single RGB image.
We present docExtractor, a generic approach for extracting visual elements such as text lines or illustrations from historical documents without requiring any real data annotation.
Second, we develop a robust method for matching individual 6D object pose hypotheses across different input images in order to jointly estimate camera viewpoints and 6D poses of all objects in a single consistent scene.
In this paper, we systematically study the effect of variations in the training data by evaluating deep features trained on different image sets in a few-shot classification setting.
In contrast, we present an orthogonal approach that does not rely on abstract features but instead learns to predict image transformations and performs clustering directly in image space.
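The idea of clustering directly in image space while accounting for image transformations can be illustrated with a toy sketch. This is not the paper's model: the learned transformation predictor is replaced by an exhaustive search over small integer translations, and prototypes are simple pixel means. All function names here are ours.

```python
import numpy as np

def best_shift_error(img, proto, max_shift=2):
    """Lowest squared error between img and proto over small integer
    translations of the prototype (a crude stand-in for a learned
    transformation predictor)."""
    best = float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(proto, dy, axis=0), dx, axis=1)
            best = min(best, float(np.sum((img - shifted) ** 2)))
    return best

def transform_invariant_kmeans(images, k, iters=10):
    """K-means-style clustering directly in pixel space: each image is
    compared to each prototype up to a translation, so shifted copies
    of the same pattern land in the same cluster. Prototypes are
    initialized from the first k images."""
    protos = [images[i].astype(float).copy() for i in range(k)]
    labels = [0] * len(images)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: best_shift_error(img, protos[c]))
            for img in images
        ]
        for c in range(k):
            members = [im for im, l in zip(images, labels) if l == c]
            if members:
                protos[c] = np.mean(members, axis=0)
    return labels, protos
```

Because the distance is computed after aligning the prototype, translated copies of the same pattern receive identical (zero) error and fall into one cluster, which is the core of clustering in image space rather than in an abstract feature space.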
Historical watermark recognition is a highly practical, yet unsolved challenge for archivists and historians.
We propose to represent shapes as the deformation and combination of learnable elementary 3D structures, which are primitives resulting from training over a collection of shapes.
Most deep pose estimation methods need to be trained for specific object instances or categories.
We address the problem of visually guided rearrangement planning with many movable objects, i.e., finding a sequence of actions to move a set of objects from an initial arrangement to a desired one, relying only on visual input from an RGB camera.
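The planning part of this problem (perception aside) can be sketched as a search over discrete arrangements: a state assigns each object to a slot, and one action moves a single object to a currently free slot. This is only an illustrative abstraction, not the paper's planner; the slot representation and function name are ours.

```python
from collections import deque

def plan_rearrangement(start, goal, slots):
    """Breadth-first search over arrangements. `start` and `goal` map
    object names to slot names; one action moves one object to any
    free slot. Returns a shortest list of (object, from_slot, to_slot)
    actions, or None if the goal is unreachable."""
    start_t = tuple(sorted(start.items()))
    goal_t = tuple(sorted(goal.items()))
    queue = deque([(start_t, [])])
    seen = {start_t}
    while queue:
        state, plan = queue.popleft()
        if state == goal_t:
            return plan
        occupied = {pos for _, pos in state}
        for obj, pos in state:
            for target in slots:
                if target in occupied:
                    continue
                nxt_dict = dict(state)
                nxt_dict[obj] = target
                nxt = tuple(sorted(nxt_dict.items()))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, plan + [(obj, pos, target)]))
    return None
```

For example, swapping two objects between slots A and B requires a third free slot as a buffer, so the shortest plan has three moves; breadth-first search finds it because it explores plans in order of length.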
Our goal in this paper is to discover near duplicate patterns in large collections of artworks.
Localizing an object accurately with respect to a robot is a key step for autonomous robotic manipulation.
By predicting this feature for a new shape, we implicitly predict correspondences between this shape and the template.
We introduce a method for learning to generate the surface of 3D shapes.
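A common way to generate a surface is to push 2D parameter points through a decoder that maps them to 3D. The sketch below illustrates only this sampling mechanism: in the learned setting the decoder would be a trained MLP conditioned on a shape representation, whereas here it is replaced by a fixed analytic map (our stand-in, not the paper's network).

```python
import numpy as np

def sample_surface(decoder, n=32):
    """Sample a 3D surface by pushing a regular grid of 2D parameter
    points (u, v) in [0, 1]^2 through a decoder that returns a 3D
    point. Returns an (n*n, 3) point cloud."""
    u, v = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n))
    uv = np.stack([u.ravel(), v.ravel()], axis=1)
    return np.array([decoder(p) for p in uv])

def cylinder_decoder(p):
    """Stand-in 'decoder': wraps the unit parameter square onto the
    side of a unit cylinder."""
    u, v = p
    return np.array([np.cos(2 * np.pi * u), np.sin(2 * np.pi * u), v])
```

The same sampling routine works for any decoder, which is why such parameterizations allow generating the surface at arbitrary resolution: a denser parameter grid simply yields a denser point cloud.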
The main strengths of our approach are its robustness to freehand bitmap drawings, its ability to adapt to different object categories, and the continuum it offers between single-view and multi-view sketch-based modeling.
Convolutional Neural Networks (CNNs) were recently shown to provide state-of-the-art results for object category viewpoint estimation.
We use ground-truth synthetic-to-synthetic correspondences, provided by the rendering engine, to train a ConvNet to predict synthetic-to-real, real-to-real and real-to-synthetic correspondences that are cycle-consistent with the ground-truth.
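The cycle-consistency constraint can be illustrated on discrete correspondences: composing the predicted maps synthetic → real → real → synthetic should return each starting point. A toy checker (our naming, with point ids standing in for dense correspondences):

```python
def cycle_consistent(syn2real, real2real, real2syn):
    """True iff every synthetic point maps through a real point in the
    first image, then a real point in the second image, and back to
    itself. Maps are dicts from point id to point id."""
    for p, q in syn2real.items():
        r = real2real.get(q)
        if r is None or real2syn.get(r) != p:
            return False
    return True
```

In training, violations of this closure would be penalized rather than merely detected, but the check captures the supervision signal: the only labeled map is synthetic-to-synthetic, and the cycle ties the unlabeled synthetic-to-real, real-to-real, and real-to-synthetic predictions back to it.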
In this paper, we study the application of convolutional neural networks for jointly detecting objects depicted in still images and estimating their 3D pose.
This paper poses object category detection in images as a type of 2D-to-3D alignment problem, utilizing the large quantities of 3D CAD models that have been made publicly available online.