Action Anticipation
34 papers with code • 6 benchmarks • 8 datasets
Next action anticipation is defined as observing frames 1, ..., T and predicting the action that occurs after a gap of T_a seconds. Note that the new action starts T_a seconds after the last observed frame, so it does not appear in any of the observed frames. Here, T_a = 1 second.
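To make the timing in this definition concrete, the following is a minimal sketch of how an observation window could be derived from an annotated action start time. The function names, the `observed_len_s` parameter (how many seconds of video are observed), and the `fps` value are illustrative assumptions and do not come from any particular benchmark or paper listed here; only the gap T_a = 1 second is taken from the definition above.

```python
from dataclasses import dataclass


@dataclass
class ObservationWindow:
    start_s: float  # first observed timestamp, in seconds
    end_s: float    # end of the observed span (exclusive), in seconds


def observation_window(action_start_s: float,
                       anticipation_gap_s: float = 1.0,
                       observed_len_s: float = 10.0) -> ObservationWindow:
    """Return the observed span that ends `anticipation_gap_s` before the action.

    `observed_len_s` is a hypothetical choice for this sketch; the definition
    only fixes the gap T_a, not how much video is observed.
    """
    end_s = action_start_s - anticipation_gap_s
    if end_s <= 0:
        raise ValueError("action starts too early to leave any observed video")
    start_s = max(0.0, end_s - observed_len_s)
    return ObservationWindow(start_s, end_s)


def frame_indices(window: ObservationWindow, fps: float) -> list[int]:
    """Zero-based indices of the frames that fall inside the observed span."""
    first = int(window.start_s * fps)
    last = int(window.end_s * fps)  # exclusive upper bound
    return list(range(first, last))


# An action annotated to start at 12.0 s, with T_a = 1 s
window = observation_window(action_start_s=12.0)
frames = frame_indices(window, fps=2.0)
```

With these assumed values, the observed span ends at 11.0 s, so every observed frame precedes the action start by at least T_a and the model never sees the action it has to predict.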
Most implemented papers
Video Representation Learning with Visual Tempo Consistency
Visual tempo, which describes how fast an action goes, has shown its potential in supervised action recognition.
Higher Order Recurrent Space-Time Transformer for Video Action Prediction
Endowing visual agents with predictive capability is a key step towards video intelligence at scale.
Anticipative Video Transformer
We propose Anticipative Video Transformer (AVT), an end-to-end attention-based video modeling architecture that attends to the previously observed video in order to anticipate future actions.
Technical Report: Temporal Aggregate Representations
At what temporal scale should they be derived?
A Dynamic Spatial-temporal Attention Network for Early Anticipation of Traffic Accidents
Visual cues for predicting a future accident are embedded deeply in dashcam video data.
TransAction: ICL-SJTU Submission to EPIC-Kitchens Action Anticipation Challenge 2021
In this report, the technical details of our submission to the EPIC-Kitchens Action Anticipation Challenge 2021 are given.
Weakly-Supervised Dense Action Anticipation
We present a (semi-) weakly supervised method using only a small number of fully-labelled sequences and predominantly sequences in which only the (one) upcoming action is labelled.
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition
Instead of trying to process more frames at once like most existing methods, we propose to process videos in an online fashion and cache "memory" at each iteration.
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities
Assembly101 is a new procedural activity dataset featuring 4321 videos of people assembling and disassembling 101 "take-apart" toy vehicles.
Unified Recurrence Modeling for Video Action Anticipation
To this end, we propose unified recurrence modeling for video action anticipation via a message passing framework.