This technical report presents an overview of our system proposed for the spatio-temporal action localization (SAL) task in the ActivityNet Challenge 2019.
This notebook paper presents an overview and comparative analysis of our systems designed for the following three tasks in ActivityNet Challenge 2019: trimmed action recognition, dense-captioning events in videos, and spatio-temporal action localization.
To improve action localization results at the video level, we additionally propose a new strategy that trains class-specific actionness detectors for better temporal segmentation; these detectors can be readily learned from training samples drawn around temporal boundaries.
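The boundary-centered sampling described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the fixed clip length, and the `margin` parameter are assumptions, and clips fully inside a labeled segment are treated as positives while clips near but outside the segment serve as hard negatives for the class-specific actionness detector.

```python
def boundary_training_samples(start, end, clip_len=1.0, margin=2.0):
    # Hypothetical sampler: slide a fixed-length clip across the region
    # around a labeled segment [start, end]. Clips fully inside the segment
    # are positives (label 1); clips around the temporal boundaries, within
    # `margin` seconds outside the segment, are hard negatives (label 0).
    samples = []
    t = start - margin
    while t + clip_len <= end + margin:
        inside = t >= start and t + clip_len <= end
        samples.append((round(t, 3), round(t + clip_len, 3), 1 if inside else 0))
        t += clip_len
    return samples
```

Clips that straddle a boundary are the most informative negatives here, since they are visually close to the action yet should score low on actionness.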
Specifically, we first generate a larger set of region proposals by combining the latest region proposals from both streams, from which we can readily obtain a larger set of labelled training samples to help learn better action detection models.
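The proposal-pooling step above can be sketched as below. This is a hedged sketch under assumed names and conventions (boxes as `(x1, y1, x2, y2)` arrays, a 0.5 IoU positive threshold): proposals from the RGB and flow streams are simply pooled, then labeled against ground-truth boxes to form the enlarged training set.

```python
import numpy as np

def iou(box, boxes):
    # Intersection-over-union between one box and an array of boxes,
    # all in (x1, y1, x2, y2) format.
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-8)

def merge_stream_proposals(rgb_props, flow_props, gt_boxes, pos_thresh=0.5):
    # Pool region proposals from both streams, then label each pooled
    # proposal positive (1) if it overlaps a ground-truth box by at least
    # `pos_thresh` IoU, negative (0) otherwise.
    pooled = np.vstack([rgb_props, flow_props])
    labels = np.zeros(len(pooled), dtype=int)
    for i, p in enumerate(pooled):
        if len(gt_boxes) and iou(p, gt_boxes).max() >= pos_thresh:
            labels[i] = 1
    return pooled, labels
```

In practice one would also deduplicate near-identical proposals (e.g. with non-maximum suppression) before training, which this sketch omits for brevity.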
Enabling computational systems with the ability to localize actions in video-based content has manifold applications.
A visualization of the learned relation features confirms that our approach is able to attend to the relevant relations for each action.
Rather than disconnecting the spatio-temporal learning from the training, we propose Spatio-Temporal Instance Learning, which enables action localization directly from box proposals in video frames.
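A minimal instance-learning sketch of the idea above, under assumptions (this is not the paper's exact model): each frame contributes only its best-scoring box proposal, and the video-level score is the mean over frames, so proposal selection can be trained directly from a video-level label without per-box supervision.

```python
import numpy as np

def video_level_score(box_scores):
    # box_scores: one array per frame, holding a score for each box
    # proposal in that frame. The frame score is the max over its
    # proposals (the selected instance); the video score averages the
    # per-frame maxima, tying box selection to the video-level label.
    per_frame = [frame.max() for frame in box_scores]
    return float(np.mean(per_frame))
```

During training, the gradient of a video-level loss flows only through the selected (maximal) boxes, which is what lets localization emerge from box proposals without frame-level annotation.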
This notebook paper presents an overview and comparative analysis of our systems designed for the following five tasks in ActivityNet Challenge 2018: temporal action proposals, temporal action localization, dense-captioning events in videos, trimmed action recognition, and spatio-temporal action localization.
In order to localize actions in time, we propose a recurrent localization network (RecLNet) designed to model the temporal structure of actions on the level of person tracks.
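The recurrent scoring over person tracks can be illustrated with the following sketch. It is an assumption for illustration only (a plain tanh RNN with hypothetical weight arguments, not the authors' RecLNet architecture): per-frame track features are fed through a recurrent state update, and each frame receives an actionness score in [0, 1] that can then be thresholded for temporal segmentation of the track.

```python
import numpy as np

def track_actionness(track_feats, Wx, Wh, b, w_out, b_out):
    # track_feats: (T, D) array of per-frame features for one person track.
    # Wx (H, D), Wh (H, H), b (H,): recurrent cell parameters (hypothetical).
    # w_out (H,), b_out: linear scoring head producing a per-frame logit.
    h = np.zeros(Wh.shape[0])
    scores = []
    for x in track_feats:                         # iterate over frames in time
        h = np.tanh(Wx @ x + Wh @ h + b)          # recurrent state update
        scores.append(1.0 / (1.0 + np.exp(-(w_out @ h + b_out))))  # sigmoid
    return np.array(scores)                       # (T,) actionness in (0, 1)
```

Because the hidden state carries context across frames, the score sequence is smoother than per-frame classification, which is what makes it suitable for localizing action boundaries within a track.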