The current best local descriptors are learned from a large dataset of matching and non-matching keypoint pairs.
Several approaches to 3D vision tasks process multiple views of the input independently with deep neural networks pre-trained on natural images, achieving view permutation invariance through a single round of pooling over all views.
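The invariance claim can be checked with a minimal sketch. It assumes that each view has already been encoded into a fixed-length feature vector and that the pooling operator is an element-wise maximum; the text above does not specify the operator. The function name `pool_views` and the array sizes are hypothetical, and random vectors stand in for the network outputs.

```python
import numpy as np


def pool_views(view_features: np.ndarray) -> np.ndarray:
    """Element-wise max over the view axis: shape (V, D) -> (D,).

    Max pooling is an assumption here; any symmetric operator
    (mean, sum) would also be invariant to the order of the views.
    """
    return view_features.max(axis=0)


rng = np.random.default_rng(0)

# Hypothetical sizes: 12 views, one 8-dimensional feature vector per view.
features = rng.normal(size=(12, 8))

# Present the same views in a different order.
shuffled = features[rng.permutation(features.shape[0])]

pooled = pool_views(features)
pooled_shuffled = pool_views(shuffled)

# The pooled descriptor does not depend on the view order.
print(np.array_equal(pooled, pooled_shuffled))
```

Because the maximum is taken independently per feature dimension, reordering the rows of `features` cannot change the result, which is what a single pooling round over all views relies on.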
Most existing 3D object recognition algorithms focus on leveraging the strong discriminative power of deep learning models trained with a softmax loss for the classification of 3D data, while learning discriminative features with deep metric learning for 3D object retrieval has been largely neglected.
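To illustrate what a metric learning objective looks like in contrast to a softmax classification loss, the following sketch implements a triplet loss on squared Euclidean distances. The choice of triplet loss, the margin value, and the function name `triplet_loss` are assumptions made for illustration; the text above does not state which metric learning loss is meant.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Mean triplet loss over a batch of embeddings of shape (N, D).

    Each anchor is pulled towards its positive (same class) and pushed
    away from its negative (different class) until the squared distances
    differ by at least `margin`.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return float(np.maximum(d_pos - d_neg + margin, 0.0).mean())


# Toy 2-D embeddings (hypothetical values).
anchor = np.array([[0.0, 0.0]])
positive = np.array([[0.0, 1.0]])       # squared distance 1 from the anchor
far_negative = np.array([[3.0, 0.0]])   # squared distance 9 from the anchor
near_negative = np.array([[1.0, 0.0]])  # squared distance 1 from the anchor

loss_far = triplet_loss(anchor, positive, far_negative)    # margin satisfied
loss_near = triplet_loss(anchor, positive, near_negative)  # margin violated
print(loss_far, loss_near)
```

Unlike a softmax loss, this objective acts directly on distances in the embedding space, which is the quantity a retrieval system ranks by.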
We propose a novel approach to jointly perform 3D shape retrieval and pose estimation from monocular images. To make the method robust to real-world image variations, e.g., complex textures and backgrounds, we learn an embedding space from 3D data that includes only the relevant information, namely the shape and pose.
In this paper, we introduce a new 3D hand gesture recognition approach based on a deep learning model.