When observing complex events with multiple actors, humans do not assess each actor separately but infer from the context.
Diffusions effectively combine two aspects of information, i.e., localized and holistic, for more powerful representation learning.
We introduce a new convolutional layer named the Temporal Gaussian Mixture (TGM) layer and show how it can be used to efficiently capture longer-term temporal information in continuous activity videos.
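The core idea of a TGM layer can be sketched in a few lines: each temporal filter is built as a soft mixture of a small set of Gaussians over the temporal axis, so only the Gaussian centers, widths, and mixing weights need to be learned. The parameter names below are illustrative, not the paper's API, and the mixing uses a plain softmax as an assumption.

```python
import numpy as np

def tgm_kernels(centers, widths, mix_logits, length):
    """Build temporal filters as mixtures of Gaussians (a sketch of the
    TGM idea; in the real layer these parameters would be learned)."""
    t = np.arange(length)                                # temporal positions
    # One Gaussian per component: shape (num_components, length)
    gauss = np.exp(-0.5 * ((t[None, :] - centers[:, None]) / widths[:, None]) ** 2)
    gauss /= gauss.sum(axis=1, keepdims=True)            # normalize each Gaussian
    # Softmax over mixing weights: shape (num_filters, num_components)
    w = np.exp(mix_logits - mix_logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ gauss                                     # (num_filters, length)

# Two filters mixing three Gaussians over a 15-step temporal window
centers = np.array([3.0, 7.0, 11.0])
widths = np.array([1.0, 2.0, 1.5])
mix_logits = np.array([[0.5, -1.0, 0.2], [1.5, 0.1, -0.3]])
filters = tgm_kernels(centers, widths, mix_logits, length=15)
```

Because each Gaussian and each mixing row is normalized, every resulting filter is a non-negative temporal weighting that sums to one, which keeps the number of parameters per filter small regardless of the temporal window length.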
SOTA for Action Detection on THUMOS'14 (using extra training data)
Additionally, we provide baseline systems and evaluation metrics working under three settings: (1) the zero resource/unsupervised setting (ABX), (2) the semi-supervised setting (PER, CER) and (3) the distant supervision setting (WER).
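Of the metrics listed, WER (and likewise PER/CER at the phone or character level) is the word-level edit distance between a hypothesis and a reference, divided by the reference length. A minimal sketch of the standard computation (not the benchmark's own scoring code):

```python
def wer(reference, hypothesis):
    """Word error rate: Levenshtein distance over words / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edits to turn the first i ref words into the first j hyp words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution or match
    return d[-1][-1] / len(ref)

# One inserted word against a three-word reference -> WER of 1/3
score = wer("the cat sat", "the cat sat on")
```

CER and PER follow the same recipe with characters or phones in place of words.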
Each branch produces a set of action anchor layers by applying deconvolution to the feature maps of the main stream.
A test video is processed by forming correspondences between its clips and the clips of reference videos with known semantics, after which the reference semantics can be transferred to the test video.
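In its simplest form, such correspondence-based transfer amounts to matching each test clip to its most similar reference clip in feature space and copying that clip's label. The sketch below uses cosine similarity and a hard nearest-neighbour match as an assumption; the actual method's matching is more involved.

```python
import numpy as np

def transfer_labels(test_clips, ref_clips, ref_labels):
    """Label each test clip with the label of its nearest reference clip
    (nearest-neighbour sketch of correspondence-based semantic transfer)."""
    # L2-normalize so the dot product is cosine similarity
    t = test_clips / np.linalg.norm(test_clips, axis=1, keepdims=True)
    r = ref_clips / np.linalg.norm(ref_clips, axis=1, keepdims=True)
    sim = t @ r.T                    # (num_test, num_ref) similarity matrix
    nearest = sim.argmax(axis=1)     # best-matching reference per test clip
    return [ref_labels[i] for i in nearest]

# Toy 2-D clip features with two labelled reference clips
ref = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = ["run", "jump"]
test = np.array([[0.9, 0.1], [0.2, 0.8]])
out = transfer_labels(test, ref, labels)
```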
A total of 6,637 temporal annotations are automatically parsed from online match reports at one-minute resolution for three main classes of events (Goal, Yellow/Red Card, and Substitution).
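Parsing event annotations at one-minute resolution can be sketched with a simple regular expression. The report format below is hypothetical (real online match reports vary); it only illustrates binning events by minute.

```python
import re

# Hypothetical report line; actual match-report formats differ.
report = "23' Goal - Home | 57' Yellow Card - Away | 78' Substitution - Home"

def parse_events(text):
    """Extract (minute, event) pairs from a free-text report sketch."""
    events = []
    # "MM' Event -" : a minute marker followed by the event label
    for minute, label in re.findall(r"(\d+)' ([A-Za-z /]+?) -", text):
        events.append((int(minute), label.strip()))
    return events

parsed = parse_events(report)
```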
We propose an Efficient Activity Detection System, Argus, for Extended Video Analysis in the surveillance scenario.