Applying our method also improves the pose estimation average precision of Op-Net by 6.06% on average.
A novel object detection method is presented that handles freely rotated objects of arbitrary sizes, including tiny objects as small as $2\times 2$ pixels.
A new method is proposed for human motion prediction by learning temporal and spatial dependencies in an end-to-end deep neural network.
Moreover, face recognition experiments demonstrate that our hallucinated depth, combined with the input RGB images, boosts performance across various architectures compared to the RGB modality alone, by average values of +1.2%, +2.6%, and +2.6% for the IIIT-D, EURECOM, and LFW datasets, respectively.
We propose a novel keypoint voting scheme based on intersecting spheres that is more accurate than existing schemes and allows for a smaller set of more dispersed keypoints.
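The geometric idea behind intersecting spheres can be illustrated with a minimal sketch. It assumes each scene point supplies an estimated distance to the keypoint, so the keypoint lies where the spheres centred on those points meet. The function name `intersect_spheres` is hypothetical, and the linear least-squares solve below is a stand-in for illustration, not the voting scheme itself, which this sentence does not detail.

```python
import numpy as np

def intersect_spheres(centers, radii):
    """Estimate the 3D point closest to lying on every sphere (least squares).

    centers: (N, 3) sphere centres (hypothetical scene points), N >= 4, not coplanar
    radii:   (N,)   estimated distance from each centre to the keypoint
    """
    centers = np.asarray(centers, dtype=float)
    radii = np.asarray(radii, dtype=float)
    c0, r0 = centers[0], radii[0]
    # Subtracting the first sphere equation |x - c0|^2 = r0^2 from the others
    # cancels the quadratic term and leaves a linear system:
    #   2 (c_i - c0) . x = |c_i|^2 - |c0|^2 - (r_i^2 - r0^2)
    A = 2.0 * (centers[1:] - c0)
    b = (np.sum(centers[1:] ** 2, axis=1) - np.sum(c0 ** 2)) - (radii[1:] ** 2 - r0 ** 2)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

# Usage: recover a known keypoint from exact distances.
keypoint = np.array([0.5, -1.0, 2.0])
centers = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float)
radii = np.linalg.norm(centers - keypoint, axis=1)
estimate = intersect_spheres(centers, radii)
```

With noisy radii the same solve returns a least-squares compromise, which is one reason widely spaced centres are helpful: they make the linear system better conditioned.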
Our novel attention mechanism directs the deep network "where to look" for visual features in the RGB image by focusing the network's attention using depth features extracted by a Convolutional Neural Network (CNN).
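One common way to realise depth-guided attention of this kind is a spatial attention map computed from the depth features and multiplied into the RGB features. The sketch below is an assumption for illustration only: the function name `depth_guided_attention`, the 1x1 projection weights `w`, and the sigmoid gating are hypothetical choices, not the architecture described here.

```python
import numpy as np

def depth_guided_attention(rgb_feat, depth_feat, w, b=0.0):
    """Reweight RGB feature maps with a spatial attention map derived from depth.

    rgb_feat:   (C_rgb, H, W) RGB feature maps
    depth_feat: (C_d, H, W)   depth feature maps (e.g. from a CNN)
    w:          (C_d,)        hypothetical 1x1 projection weights
    b:          scalar bias
    """
    # Collapse depth channels to one logit per spatial location: shape (H, W).
    logits = np.tensordot(w, depth_feat, axes=([0], [0])) + b
    # Sigmoid squashes logits into (0, 1) attention weights.
    attn = 1.0 / (1.0 + np.exp(-logits))
    # Broadcast the same spatial map over every RGB channel.
    return rgb_feat * attn[None, :, :], attn

# Usage with random features of matching spatial size.
rng = np.random.default_rng(0)
rgb = rng.standard_normal((8, 4, 4))
depth = rng.standard_normal((6, 4, 4))
attended, attn_map = depth_guided_attention(rgb, depth, w=rng.standard_normal(6))
```

Locations where the depth features produce a large logit keep their RGB activations almost unchanged, while locations with a strongly negative logit are suppressed.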
We explore end-to-end trained differentiable models that integrate natural logic with neural networks, aiming to keep the backbone of natural language reasoning based on the natural logic formalism while introducing subsymbolic vector representations and neural components.
Commonsense reasoning simulates the human ability to make presumptions about our physical world, and it is an indispensable cornerstone in building general AI systems.
We propose the new problem of learning to recover reasoning chains from weakly supervised signals, i.e., the question-answer pairs.
A novel attention-aware method is proposed to fuse two image modalities, RGB and depth, for enhanced RGB-D facial recognition.