It has been challenging to analyze signals with mixed topologies (for example, point cloud with surface mesh).
Several popular approaches to 3D vision tasks process multiple views of the input independently with deep neural networks pre-trained on natural images, achieving view permutation invariance through a single round of pooling over all views.
Current best local descriptors are learned on a large dataset of matching and non-matching keypoint pairs.
Most existing 3D object recognition algorithms focus on leveraging the strong discriminative power of deep learning models with softmax loss for the classification of 3D data, while learning discriminative features with deep metric learning for 3D object retrieval is more or less neglected.
We propose a novel approach to jointly perform 3D shape retrieval and pose estimation from monocular images. In order to make the method robust to real-world image variations, e. g. complex textures and backgrounds, we learn an embedding space from 3D data that only includes the relevant information, namely the shape and pose.