For an autonomous vehicle it is essential to observe the ongoing dynamics of a scene and consequently predict imminent future scenarios, to ensure its own safety and that of others.
Several unsupervised and self-supervised approaches have been developed in recent years to learn visual features from large-scale unlabeled datasets.
Effective modeling of human interactions is of utmost importance when forecasting behaviors such as future trajectories.
To understand human behavior we must not only recognize individual actions but also model possibly complex group activities and interactions.
Trajectory prediction is an important task, especially in autonomous driving.
Autonomous vehicles are expected to drive in complex scenarios with several independent, non-cooperating agents.
Current deep learning based autonomous driving approaches yield impressive results and have even led to in-production deployment in certain controlled scenarios.
In this paper we propose an approach capable of generating images from a given text, using conditional GANs trained on a dataset of uncaptioned images.
Autonomous driving is becoming a reality, yet vehicles still need to rely on complex sensor fusion to understand the scene they act in.
In this paper we deal with the problem of predicting action progress in videos.
Moreover, we show that our approach can be used as a pre-processing step for object detection when images are degraded by compression to the point that state-of-the-art detectors fail.
In this paper we present a simple yet effective approach to extend, without supervision, any object proposal from static images to videos.
Automatic image annotation is among the fundamental problems in computer vision and pattern recognition, and it is becoming increasingly important for developing algorithms that can search and browse large-scale image collections.