Action Anticipation
34 papers with code • 6 benchmarks • 8 datasets
Next-action anticipation is defined as observing frames 1, ..., T and predicting the action that begins after a gap of T_a seconds. Note that the action starting after the gap is a new action that is not seen in the observed frames. Here T_a = 1 second.
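As a rough illustration of this setup, the sketch below encodes the timing convention only; the frame rate, the function name, and the generic `model.predict` call are hypothetical and do not correspond to any particular method or benchmark protocol listed on this page.

```python
# Minimal sketch of the next-action anticipation setup.
# Assumptions: 30 fps video and a generic model.predict(frames) interface;
# both are hypothetical, not part of any specific benchmark protocol.

FPS = 30      # assumed frame rate of the observed video
T_A = 1.0     # anticipation gap in seconds (T_a = 1 s in this definition)

def anticipate_next_action(frames, model, fps=FPS):
    """Predict the action that starts T_A seconds after the last observed frame.

    `frames` are the observed frames 1, ..., T. The action to be predicted
    begins only after the T_A-second gap, so it is never visible in the
    observed frames.
    """
    t_end = len(frames) / fps                 # timestamp of the last observed frame
    anticipated_start = t_end + T_A           # the new action begins after the gap
    predicted_label = model.predict(frames)   # hypothetical model call
    return predicted_label, anticipated_start
```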
Latest papers with no code
Gaze-Guided Graph Neural Network for Action Anticipation Conditioned on Intention
We introduce the Gaze-guided Action Anticipation algorithm, which establishes a visual-semantic graph from the video input.
Intention Action Anticipation Model with Guide-Feedback Loop Mechanism
Based on GFL, the MultiComplete-Recent Feature Aggregation (MCRFA) module is proposed to model the relation between a recent feature and multiscale complete features.
On the Efficacy of Text-Based Input Modalities for Action Anticipation
Compared to existing methods, MAT has the advantage of learning additional environmental context from two kinds of text inputs: action descriptions during the pre-training stage, and text for detected objects and actions during modality feature fusion.
LALM: Long-Term Action Anticipation with Language Models
Understanding human activity is a crucial yet intricate task in egocentric vision, a field that focuses on capturing visual perspectives from the camera wearer's viewpoint.
DiffAnt: Diffusion Models for Action Anticipation
However, most existing action anticipation models follow a deterministic approach and fail to account for the uncertainty of the future.
A Survey on Deep Learning Techniques for Action Anticipation
The ability to anticipate possible future human actions is essential for a wide range of applications, including autonomous driving and human-robot interaction.
JOADAA: joint online action detection and action anticipation
By combining action anticipation and online action detection, our approach can cover the missing dependencies of future information in online action detection.
Knowledge-Guided Short-Context Action Anticipation in Human-Centric Videos
This work focuses on anticipating long-term human actions from short video segments, which can speed up editing workflows through improved suggestions while fostering creativity by suggesting narratives.
Leveraging Next-Active Objects for Context-Aware Anticipation in Egocentric Videos
Compared to existing video modeling architectures for action anticipation, NAOGAT captures the relationship between objects and the global scene context in order to predict the next active object and, from these detections, anticipate relevant future actions, leveraging the objects' dynamics to improve accuracy.
AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?
We propose to formulate the LTA task from two perspectives: a bottom-up approach that predicts the next actions autoregressively by modeling temporal dynamics; and a top-down approach that infers the goal of the actor and plans the needed procedure to accomplish the goal.
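As a loose sketch of these two formulations (the `llm` object and its methods below are hypothetical stand-ins, not the AntGPT implementation):

```python
# Hedged sketch of the two LTA formulations described above; `llm` and its
# methods (predict_next, infer_goal, plan_steps) are hypothetical stand-ins.

def bottom_up_lta(observed_actions, llm, horizon):
    """Autoregressively predict future actions by modeling temporal dynamics."""
    future = []
    for _ in range(horizon):
        next_action = llm.predict_next(observed_actions + future)
        future.append(next_action)
    return future

def top_down_lta(observed_actions, llm, horizon):
    """Infer the actor's goal first, then plan the steps needed to accomplish it."""
    goal = llm.infer_goal(observed_actions)
    return llm.plan_steps(goal, observed_actions, horizon)
```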