Neural Conversational QA tasks such as ShARC require systems to answer questions based on the contents of a given passage.
The resulting algorithm, referred to as AutoFocus, yields a 2.5-5 times speed-up during inference when used with SNIP.
We propose a functional view of matrix decomposition problems on graphs such as geometric matrix completion and graph regularized dimensionality reduction.
Recent literature has shown that features obtained from supervised training of CNNs may over-emphasize texture rather than encoding high-level information.
Ranked #28 on Object Detection on PASCAL VOC 2007
With the advancement in drone technology, in just a few years, drones will be assisting humans in every domain.
The widely adopted sequential variant of Non-Maximum Suppression (or Greedy-NMS) is a crucial module for object-detection pipelines.
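For readers unfamiliar with the sequential variant mentioned above, the following is a minimal sketch of textbook Greedy-NMS, not the method proposed in the paper. The function names `iou` and `greedy_nms`, the `(x1, y1, x2, y2)` box layout, and the default threshold of 0.5 are assumptions made for illustration.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    # Visit boxes in order of decreasing confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box
        # by more than the threshold; otherwise it is suppressed.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
```

Each candidate is compared against every box kept so far, so the loop is inherently sequential: whether a box survives depends on the decisions made for all higher-scoring boxes.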
We present the second edition of OpenEDS dataset, OpenEDS2020, a novel dataset of eye-image sequences captured at a frame rate of 100 Hz under controlled illumination, using a virtual-reality head-mounted display mounted with two synchronized eye-facing cameras.
Therefore, we propose a novel evaluation benchmark to assess the performance of existing AQG systems for long-text answers.
Neuroscientific theory suggests that dopaminergic neurons broadcast global reward prediction errors to large areas of the brain influencing the synaptic plasticity of the neurons in those regions.
Multi-person 3D human pose estimation from a single image is a challenging problem, especially for in-the-wild settings due to the lack of 3D annotated data.
Ranked #3 on 3D Multi-Person Pose Estimation on MuPoTS-3D
On studying recent state-of-the-art models on the ShARCQA task, we found indications that the models learn spurious clues/patterns in the dataset.
This paper provides a comprehensive and exhaustive study of adversarial attacks on human pose estimation models and the evaluation of their robustness.
The best-performing methods for 3D human pose estimation from monocular images require large amounts of in-the-wild 2D and controlled 3D pose-annotated datasets, which are costly and require sophisticated systems to acquire.
We describe a chemical robotic assistant equipped with a curiosity algorithm (CA) that can efficiently explore the states a complex chemical system can exhibit.
Our main observation is that high-quality maps can be obtained even if the input correspondences are noisy or are encoded by a small number of coefficients in a spectral basis.
Monocular 3D human-pose estimation from static images is a challenging problem, due to the curse of dimensionality and the ill-posed nature of lifting 2D-to-3D.
This paper presents a novel framework in which video/image segmentation and localization are cast into a single optimization problem that integrates information from low-level appearance cues with that of high-level localization cues in a very weakly supervised manner.
The ability to anticipate the future is essential when making real-time critical decisions, provides valuable information to understand dynamic natural scenes, and can help unsupervised video representation learning.
We present a comprehensive analysis of 50 interestingness measures and classify them in accordance with the two properties.
Our approach is a modification of the R-FCN architecture in which position-sensitive filters are shared across different object classes for performing localization.
3D human pose estimation from a single image is a challenging problem, especially for in-the-wild settings due to the lack of 3D annotated data.
Ranked #9 on Monocular 3D Human Pose Estimation on Human3.6M
This paper presents a novel framework in which image cosegmentation and colocalization are cast into a single optimization problem that integrates information from low-level appearance cues with that of high-level localization cues in a very weakly supervised manner.
We propose to tackle this problem by including the classification loss of the internal nodes of the random parse trees in the original RCPN loss function.
We show on a modified MNIST dataset that when faced with scale variation, building in scale-invariance allows ConvNets to learn more discriminative features with reduced chances of over-fitting.