On the other hand, the performance of action recognition models is heavily affected by domain shift.
This problem has been widely investigated by the research community, and several Incremental Learning (IL) approaches have been proposed in recent years.
To fill this gap, in this paper we introduce a novel attentive feature distillation approach that mitigates catastrophic forgetting while accounting for semantic spatial- and channel-level dependencies.
Specifically, we integrate the estimation and the interaction of the attention maps within a probabilistic representation learning framework, leading to Variational STructured Attention networks (VISTA-Net).
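The attention-weighted distillation idea can be sketched in plain Python as follows. This is a minimal illustration, assuming channel-level attention is estimated from the teacher's activation magnitudes; the function names and the weighting scheme are assumptions for exposition, not the papers' exact formulation:

```python
def channel_attention(feats):
    # feats: a list of channels, each a flat list of activations.
    # Estimate per-channel importance from mean absolute activation,
    # then normalize so the weights sum to one.
    mags = [sum(abs(v) for v in ch) / len(ch) for ch in feats]
    total = sum(mags) or 1.0
    return [m / total for m in mags]

def attentive_distill_loss(teacher, student):
    # Weight the per-channel MSE between old (teacher) and new (student)
    # features by the teacher's channel attention, so the channels the
    # old model deems important are preserved more strongly.
    weights = channel_attention(teacher)
    loss = 0.0
    for w, t_ch, s_ch in zip(weights, teacher, student):
        mse = sum((t - s) ** 2 for t, s in zip(t_ch, s_ch)) / len(t_ch)
        loss += w * mse
    return loss
```

In an IL setting, this loss would be added to the task loss for new classes, with the teacher's features computed by the frozen previous-step model.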
Training machine learning models in a meaningful order, from easy samples to hard ones, as done in curriculum learning, can improve performance over the standard training approach based on random data shuffling, without any additional computational cost.
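The easy-to-hard ordering at the core of curriculum learning can be sketched as follows, assuming a per-sample difficulty score is available (e.g., the loss of a pretrained scoring model); the helper name and signature are illustrative:

```python
def curriculum_batches(samples, difficulty, batch_size):
    # Order samples from easy to hard using the difficulty score,
    # then cut the ordered stream into minibatches. Replacing the
    # usual random shuffle with this sort is the only change to
    # the training loop, hence no extra computational cost.
    ordered = sorted(samples, key=difficulty)
    return [ordered[i:i + batch_size]
            for i in range(0, len(ordered), batch_size)]
```

In practice, curricula often grow the pool of admitted samples in stages rather than sorting the whole set once, but the ordering principle is the same.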
State-of-the-art performance in dense pixel-wise prediction tasks is obtained with specifically designed convolutional networks.
The deep learning revolution happened thanks to the availability of massive amounts of labelled data, which have contributed to the development of models with extraordinary inference capabilities.
To alleviate this problem, researchers have proposed various domain adaptation methods to improve object detection in the cross-domain setting, e.g., by translating images with ground-truth labels from the source domain to the target domain using Cycle-GAN.
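The translation-based pipeline can be sketched as follows. This is a hypothetical illustration: `stylize` is a trivial stand-in (a brightness shift) for a trained Cycle-GAN generator, and the key point it encodes is that translation changes appearance but not geometry, so source ground-truth labels are reused unchanged:

```python
def stylize(image, shift=0.2):
    # Stand-in for a learned source-to-target image translator;
    # a real pipeline would apply a trained Cycle-GAN generator here.
    return [min(1.0, px + shift) for px in image]

def translate_dataset(source):
    # Appearance changes, but object positions do not, so the source
    # annotations (e.g., bounding boxes) stay attached to each image.
    return [(stylize(img), labels) for img, labels in source]
```

The detector is then trained on `translate_dataset(source)`, whose images look like the target domain while keeping source supervision.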
The dataset comes with extensive annotations focusing on the spectators at different levels of detail: at the higher level, people are labeled according to the team they support and whether they know the people close to them; at the lower levels, standard pose information (head, body) is provided, along with fine-grained actions such as hands on hips and clapping hands.
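A single spectator's record under such a multi-level scheme might look like the following; the field names and values are hypothetical, chosen only to mirror the annotation levels described above:

```python
# Illustrative annotation record for one spectator (schema assumed,
# not taken from the dataset's actual file format).
annotation = {
    "spectator_id": 17,
    "high_level": {
        "supported_team": "home",       # which team the person supports
        "knows_neighbours": True,       # whether they know nearby people
    },
    "pose": {"head": "facing_pitch", "body": "standing"},
    "fine_grained_actions": ["hands_on_hips", "clapping_hands"],
}
```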