Temporal Action Localization aims to detect activity instances in an untrimmed video stream and output their start and end timestamps, typically together with a class label. It is closely related to Temporal Action Proposal Generation.
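Localized predictions are commonly scored against ground truth by temporal Intersection-over-Union (tIoU) between the predicted and annotated (start, end) intervals. A minimal sketch (the function name and threshold are illustrative, not from any specific benchmark toolkit):

```python
def temporal_iou(pred, gt):
    """tIoU between two (start, end) intervals, e.g. in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# A prediction is usually counted as a match when tIoU exceeds a
# chosen threshold; 0.5 is a common setting.
print(temporal_iou((2.0, 8.0), (4.0, 10.0)))  # overlap 4s / union 8s = 0.5
```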
The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels, with multiple labels per person occurring frequently.
Dynamics of human body skeletons convey significant information for human action recognition.
Furthermore, based on temporal segment networks, we won the video classification track of the ActivityNet Challenge 2016 among 24 teams, which demonstrates the effectiveness of TSN and the proposed good practices.
Our other contribution is a study of a series of good practices for learning ConvNets on video data with the help of temporal segment networks.
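The core idea behind temporal segment networks is sparse temporal sampling: the video is split into K equal segments and one snippet is drawn from each, so the network sees the whole duration at low cost. A minimal sketch of the frame-index selection (function and parameter names are illustrative):

```python
import random

def sample_snippets(num_frames, k=3, train=True, seed=None):
    """Pick one frame index per segment: random within the segment
    during training, the segment center at test time (TSN-style
    sparse sampling)."""
    rng = random.Random(seed)
    seg_len = num_frames / k
    indices = []
    for i in range(k):
        start, end = int(i * seg_len), int((i + 1) * seg_len)
        if train:
            indices.append(rng.randrange(start, max(start + 1, end)))
        else:
            indices.append((start + end) // 2)
    return indices

print(sample_snippets(90, k=3, train=False))  # [15, 45, 75]
```

Per-snippet predictions are then aggregated (e.g. averaged) into a video-level score, which is what lets TSN model long-range temporal structure without processing every frame.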
In this paper we discuss several forms of spatiotemporal convolutions for video analysis and study their effects on action recognition.
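One variant studied in this line of work factorizes a full t×d×d 3D convolution into a 1×d×d spatial convolution followed by a t×1×1 temporal one (a "(2+1)D" block), with the intermediate channel width chosen so the factorized block has roughly the same parameter count as the original. A back-of-the-envelope comparison (pure arithmetic, ignoring biases; this is a sketch of the counting argument, not the paper's code):

```python
def params_3d(c_in, c_out, t, d):
    """Parameters of a full t x d x d 3D convolution."""
    return c_in * c_out * t * d * d

def params_2plus1d(c_in, c_out, t, d, mid):
    """1 x d x d spatial conv into `mid` channels, then t x 1 x 1 temporal."""
    return c_in * mid * d * d + mid * c_out * t

# Example: 64 -> 64 channels, 3x3x3 kernel. Pick the intermediate
# width so the factorization matches the 3D parameter budget.
full = params_3d(64, 64, 3, 3)
mid = (3 * 3 * 3 * 64 * 64) // (3 * 3 * 64 + 3 * 64)
print(full, mid, params_2plus1d(64, 64, 3, 3, mid))
```

At matched parameter count, the factorized form doubles the number of nonlinearities per block, which is one argument offered for its stronger accuracy on action recognition.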
#2 best model for Action Recognition In Videos on Sports-1M
The deep two-stream architecture has exhibited excellent performance on video-based action recognition.
However, for action recognition in videos, the advantage of deep convolutional networks over traditional methods is less evident.
We demonstrate that both RNNs (using LSTMs) and Temporal-ConvNets operating on spatiotemporal feature matrices can exploit spatiotemporal dynamics to improve overall performance.
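To illustrate the Temporal-ConvNet branch: 1-D filters slide along the time axis of a T×C matrix of per-frame CNN features. The toy layer below (plain NumPy; all names are illustrative, not the authors' implementation) shows the operation:

```python
import numpy as np

def temporal_conv(features, kernels):
    """features: (T, C) per-frame CNN features; kernels: (K, W, C)
    1-D filters of temporal width W. Returns (T - W + 1, K) responses."""
    T, C = features.shape
    K, W, _ = kernels.shape
    out = np.empty((T - W + 1, K))
    for t in range(T - W + 1):
        window = features[t:t + W]  # (W, C) temporal window
        # Dot each filter against the window over both axes.
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return out

feats = np.arange(12, dtype=float).reshape(6, 2)  # T=6 frames, C=2 dims
kers = np.ones((1, 3, 2))                         # one sum-style filter, W=3
print(temporal_conv(feats, kers).ravel())         # [15. 27. 39. 51.]
```

Stacking such layers (with nonlinearities and pooling) gives a Temporal-ConvNet; the RNN alternative instead feeds the same T×C rows into an LSTM step by step.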