Action Anticipation
34 papers with code • 6 benchmarks • 8 datasets
Next action anticipation is defined as observing frames 1, ..., T and predicting the action that happens after a gap of T_a seconds. Importantly, the action to be predicted starts T_a seconds after the last observed frame, so no part of it is visible in the observed frames. Here T_a = 1 second.
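The timing in this definition can be made concrete with a small sketch. It computes which frames count as "observed" for an action that starts at a known timestamp. The function name `observed_frame_range`, the frame rate, and the observation length `t_obs_s` are illustrative assumptions, not part of the task definition; only the gap T_a = 1 second comes from the text above.

```python
import math


def observed_frame_range(action_start_s, fps, t_a_s=1.0, t_obs_s=2.0):
    """Return (first, last) inclusive frame indices of the observed window.

    The window ends t_a_s seconds before the action starts, so no frame
    of the action to be predicted is included. t_obs_s (the length of the
    observed segment) is an assumed parameter for illustration.
    """
    obs_end_s = action_start_s - t_a_s
    if obs_end_s <= 0:
        raise ValueError("action starts too early to leave an anticipation gap")
    obs_start_s = max(0.0, obs_end_s - t_obs_s)
    first = math.floor(obs_start_s * fps)
    last = math.floor(obs_end_s * fps) - 1  # last frame strictly before the gap
    return first, last


# Example: an action starting at 10 s in a 30 fps video, with T_a = 1 s.
first, last = observed_frame_range(action_start_s=10.0, fps=30)
num_observed = last - first + 1  # this is T in the definition above
```

With these assumed values, the model sees the frames between 7 s and 9 s and must predict the action that begins at 10 s.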
Latest papers
Action Scene Graphs for Long-Form Understanding of Egocentric Videos
We present Egocentric Action Scene Graphs (EASGs), a new representation for long-form understanding of egocentric videos.
Object-centric Video Representation for Long-term Action Anticipation
To recognize and predict human-object interactions, we use a Transformer-based neural architecture which allows the "retrieval" of relevant objects for action anticipation at various time scales.
Palm: Predicting Actions through Language Models @ Ego4D Long-Term Action Anticipation Challenge 2023
We present Palm, a solution to the Long-Term Action Anticipation (LTA) task utilizing vision-language and large language models.
Action Anticipation with Goal Consistency
In this paper, we address the problem of short-term action anticipation, i.e., we want to predict an upcoming action one second before it happens.
Enhancing Next Active Object-based Egocentric Action Anticipation with Guided Attention
To this end, we propose a novel approach that applies a guided attention mechanism between the objects and the spatiotemporal features extracted from video clips, enhancing the motion and contextual information, and further decoding the object-centric and motion-centric information to address the problem of STA in egocentric videos.
Fine-grained Affordance Annotation for Egocentric Hand-Object Interaction Videos
Object affordance is an important concept in hand-object interaction, providing information on action possibilities based on human motor capacity and objects' physical properties, thus benefiting tasks such as action anticipation and robot imitation learning.
Anticipative Feature Fusion Transformer for Multi-Modal Action Anticipation
Although human action anticipation is an inherently multi-modal task, state-of-the-art methods on well-known action anticipation datasets leverage this data by applying ensemble methods and averaging scores of unimodal anticipation networks.
Rethinking Learning Approaches for Long-Term Action Anticipation
Action anticipation involves predicting future actions having observed the initial portion of a video.
Text-Derived Knowledge Helps Vision: A Simple Cross-modal Distillation for Video-based Action Anticipation
Anticipating future actions in a video is useful for many autonomous and assistive technologies.
Learning State-Aware Visual Representations from Audible Interactions
However, learning representations from videos can be challenging.