Our results confirm the shortcomings of previous evaluation protocols and suggest that an IA-based protocol is better suited to the online scenario.
The problem of Online Human Behaviour Recognition in untrimmed videos, also known as Online Action Detection (OAD), needs to be revisited.
Face recognition has achieved unprecedented results, surpassing human capabilities in certain scenarios.
Over the past few years, Presentation Attack Detection (PAD) has become a fundamental part of facial recognition systems.
Early action proposal generation is the task of producing, as soon as an action starts, high-quality candidate temporal segments that are likely to contain that action in a video stream.
While there has been significant progress in solving the problems of image pixel labeling, object detection and scene classification, existing approaches normally address them separately.
Standard shortcut connections are connections between layers in deep neural networks that skip at least one intermediate layer.
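A minimal sketch of such a shortcut (identity skip) connection, using a toy NumPy block rather than any specific framework; the layer sizes and weights are illustrative assumptions:

```python
import numpy as np

def layer(x, w):
    # A toy fully connected layer with ReLU activation.
    return np.maximum(0.0, x @ w)

def block_with_shortcut(x, w1, w2):
    # Two stacked layers whose output is added back to the input,
    # skipping the intermediate layer: a standard shortcut connection.
    out = layer(layer(x, w1), w2)
    return out + x  # identity shortcut: the signal can bypass the block

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
w1 = rng.standard_normal((8, 8)) * 0.01
w2 = rng.standard_normal((8, 8)) * 0.01
y = block_with_shortcut(x, w1, w2)
# With near-zero weights the block behaves close to the identity map,
# which is why such connections ease optimization of deep networks.
```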
Training a Convolutional Neural Network (CNN) for semantic segmentation typically requires collecting a large amount of accurate pixel-level annotations, a hard and expensive task.
Detecting objects and estimating their pose remains one of the major challenges for the computer vision research community.
We propose two Siamese Convolutional Neural Network architectures, together with corresponding novel loss functions, to learn from unlabeled videos. They jointly exploit the local temporal coherence between contiguous frames and a global discriminative margin used to separate the representations of different videos.
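A toy sketch of a loss in this spirit, not the paper's exact formulation: a term pulling embeddings of contiguous frames of the same video together (temporal coherence), plus a hinge term pushing frames of different videos at least a margin apart. All names and the precise form are illustrative assumptions:

```python
import numpy as np

def coherence_margin_loss(f_t, f_t1, f_other, margin=1.0):
    # f_t, f_t1: embeddings of two contiguous frames of the same video.
    # f_other: embedding of a frame from a different video.
    coherence = np.sum((f_t - f_t1) ** 2)          # local temporal coherence
    d_neg = np.sqrt(np.sum((f_t - f_other) ** 2))  # distance to the other video
    separation = max(0.0, margin - d_neg) ** 2     # hinge on the global margin
    return coherence + separation

# Identical contiguous frames and a far-away negative incur zero loss.
loss = coherence_margin_loss(np.array([1.0, 0.0]),
                             np.array([1.0, 0.0]),
                             np.array([5.0, 0.0]))
```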
A visual-relational knowledge graph (KG) is a multi-relational graph whose entities are associated with images.
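A minimal illustration of this structure as plain Python data: a set of multi-relational triples plus an entity-to-image mapping. Entity names, relations, and image paths are hypothetical placeholders:

```python
# Toy visual-relational KG: (head, relation, tail) triples...
triples = {
    ("Eiffel_Tower", "located_in", "Paris"),
    ("Paris", "capital_of", "France"),
}
# ...where each entity is associated with one or more images.
entity_images = {
    "Eiffel_Tower": ["eiffel_1.jpg"],
    "Paris": ["paris_1.jpg", "paris_2.jpg"],
    "France": ["france_flag.jpg"],
}

def neighbors(entity):
    # Relations and entities reachable from `entity` in one hop.
    return {(r, t) for (h, r, t) in triples if h == entity}
```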