Our model is efficient because it uses a separable spatio-temporal attention mechanism for video, while still identifying the important parts of the video both spatially and temporally.
This commonly encountered operational context calls for principled techniques for training ML models with the option to abstain from predicting when uncertain.
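One simple way to give a classifier an abstain option is to threshold its top softmax probability. The sketch below illustrates only that baseline and is not the principled technique the sentence above calls for. The function name `predict_or_abstain` and the `threshold` value are assumptions made for this example.

```python
import math
from typing import Optional, Sequence


def softmax(logits: Sequence[float]) -> list:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_abstain(logits: Sequence[float], threshold: float = 0.8) -> Optional[int]:
    """Return the predicted class index, or None to abstain.

    The model abstains whenever its highest class probability is below
    `threshold` (a hypothetical, user-chosen confidence level).
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None


confident = predict_or_abstain([4.0, 0.0, 0.0])   # one class clearly dominates
uncertain = predict_or_abstain([1.0, 0.9, 0.8])   # near-uniform probabilities
```

Raising `threshold` trades coverage (how often the model answers) for accuracy on the answered inputs.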
Data imbalance, in which a plurality of the data samples come from a small proportion of labels, poses a challenge in training deep neural networks.
Parameter sharing approaches for deep multi-task learning share a common intuition: for a single network to perform multiple prediction tasks, the network needs to support multiple specialized execution paths.
In contrast to single-task learning, in which a separate model is trained for each target, multi-task learning (MTL) optimizes a single model to predict multiple related targets simultaneously.
In contrast, we propose a parameter efficient framework, Piggyback GAN, which learns the current task by building a set of convolutional and deconvolutional filters that are factorized into filters of the models trained on previous tasks.
We propose Discriminative Prototype DTW (DP-DTW), a novel method to learn class-specific discriminative prototypes for temporal recognition tasks.
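For background, the sketch below shows classic dynamic time warping (DTW) and nearest-prototype classification with fixed, hand-written prototypes. It does not implement DP-DTW's procedure for learning discriminative prototypes. The function names and the toy prototypes are assumptions made for this example.

```python
from typing import Dict, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW distance between two 1-D sequences (absolute-difference cost)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] also aligned with b[j-1]; advance in a only
                cost[i][j - 1],      # b[j-1] also aligned with a[i-1]; advance in b only
                cost[i - 1][j - 1],  # advance in both sequences
            )
    return cost[n][m]


def nearest_prototype(query: Sequence[float], prototypes: Dict[str, Sequence[float]]) -> str:
    """Return the label whose prototype has the smallest DTW distance to `query`."""
    return min(prototypes, key=lambda label: dtw_distance(query, prototypes[label]))


# Hypothetical class prototypes, written by hand rather than learned.
prototypes = {"rise": [1.0, 2.0, 3.0], "fall": [3.0, 2.0, 1.0]}
label = nearest_prototype([1.0, 1.0, 2.0, 3.0], prototypes)
```

Because DTW lets one element align with several elements of the other sequence, the query above matches the "rise" prototype even though the two differ in length.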
Knowledge distillation is a widely applicable technique for training a student neural network under the guidance of a trained teacher network.
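A common formulation of the distillation objective mixes a temperature-softened KL term between teacher and student with the usual cross-entropy on the true label. The sketch below assumes that standard formulation; it is not necessarily the variant studied in the source. The parameter names `temperature` and `alpha` and their defaults are illustrative choices.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list:
    """Numerically stable softmax of logits divided by `temperature`."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    temperature: float = 2.0,  # assumed value; softens both distributions
    alpha: float = 0.5,        # assumed weight on the teacher-matching term
) -> float:
    """Per-example loss: alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL divergence between the softened teacher and student distributions
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Standard cross-entropy of the student (temperature 1) on the true label
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


loss = distillation_loss([1.0, 0.5, -0.5], [3.0, 0.0, -2.0], label=0)
```

The `temperature ** 2` factor keeps the gradient scale of the softened term comparable to that of the cross-entropy term as the temperature changes.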
Deep neural network compression has the potential to bring modern resource-hungry deep networks to resource-limited devices.
This allows us to take advantage of the complementary nature of pruning and quantization and to recover from premature pruning errors, which is not possible with current two-stage approaches.
Camera relocalization plays a vital role in many robotics and computer vision tasks, such as global localization, recovery from tracking failure, and loop closure detection.
Deep residual networks (ResNets) and their variants are widely used in many computer vision applications and natural language processing tasks.
When approaching a novel visual recognition problem in a specialized image domain, a common strategy is to start with a pre-trained deep neural network and fine-tune it to the specialized domain.
Activity analysis in which multiple people interact across a large space is challenging due to the interplay of individual actions and collective group dynamics.