Action Anticipation

34 papers with code • 6 benchmarks • 8 datasets

Next action anticipation is the task of observing frames 1, ..., T of a video and predicting the action that starts T_a seconds after the last observed frame. The anticipated action is new: it begins only after the T_a gap and does not appear in any observed frame. Here T_a = 1 second.
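The timing in this definition can be made concrete with a small sketch. It assumes the start time of the upcoming action is known from annotations and that the model sees a fixed-length window ending T_a seconds before that start. The function names, the 2-second observation length, and the frame rate are hypothetical choices for illustration; they are not taken from any of the papers listed here.

```python
import math


def observed_window(action_start_s, anticipation_gap_s=1.0, observe_s=2.0):
    """Return (start, end) in seconds of the segment the model may observe.

    The window ends anticipation_gap_s (T_a) before the action starts, so no
    frame of the anticipated action is visible. observe_s is an assumed
    observation length, not part of the task definition.
    """
    if anticipation_gap_s < 0 or observe_s <= 0:
        raise ValueError("gap must be >= 0 and observation length > 0")
    end = action_start_s - anticipation_gap_s
    start = max(0.0, end - observe_s)  # clip at the beginning of the video
    if end <= start:
        raise ValueError("action starts too early to leave an observed window")
    return start, end


def observed_frame_indices(action_start_s, fps, anticipation_gap_s=1.0, observe_s=2.0):
    """Return the indices of frames 1..T whose timestamps fall in the window."""
    start, end = observed_window(action_start_s, anticipation_gap_s, observe_s)
    first = math.ceil(start * fps)
    last = math.floor(end * fps)  # last frame at or before the window end
    return list(range(first, last + 1))


if __name__ == "__main__":
    # Action annotated to start at 10.0 s, with T_a = 1 s.
    print(observed_window(10.0))
    frames = observed_frame_indices(10.0, fps=30)
    print(frames[0], frames[-1], len(frames))
```

With T_a = 1 second, every returned frame has a timestamp at least one second before the annotated start of the action, which is the constraint the definition describes.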

Most implemented papers

Video Representation Learning with Visual Tempo Consistency

decisionforce/VTHCL 28 Jun 2020

Visual tempo, which describes how fast an action goes, has shown its potential in supervised action recognition.

Higher Order Recurrent Space-Time Transformer for Video Action Prediction

CorcovadoMing/HORST 17 Apr 2021

Endowing visual agents with predictive capability is a key step towards video intelligence at scale.

Anticipative Video Transformer

facebookresearch/AVT ICCV 2021

We propose Anticipative Video Transformer (AVT), an end-to-end attention-based video modeling architecture that attends to the previously observed video in order to anticipate future actions.

Technical Report: Temporal Aggregate Representations

dibschat/tempAgg 6 Jun 2021

At what temporal scale should they be derived?

A Dynamic Spatial-temporal Attention Network for Early Anticipation of Traffic Accidents

monjurulkarim/DSTA 18 Jun 2021

Visual cues for predicting a future accident are embedded deeply in dashcam video data.

TransAction: ICL-SJTU Submission to EPIC-Kitchens Action Anticipation Challenge 2021

guxiao0822/trans_action 28 Jul 2021

In this report, the technical details of our submission to the EPIC-Kitchens Action Anticipation Challenge 2021 are given.

Weakly-Supervised Dense Action Anticipation

zhanghaotong1/wslvideodenseanticipation 15 Nov 2021

We present a (semi-) weakly supervised method using only a small number of fully-labelled sequences and predominantly sequences in which only the (one) upcoming action is labelled.

MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition

facebookresearch/memvit CVPR 2022

Instead of trying to process more frames at once like most existing methods, we propose to process videos in an online fashion and cache "memory" at each iteration.

Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities

assembly101/assembly101.github.io CVPR 2022

Assembly101 is a new procedural activity dataset featuring 4321 videos of people assembling and disassembling 101 "take-apart" toy vehicles.

Unified Recurrence Modeling for Video Action Anticipation

corcovadoming/mpnnel 2 Jun 2022

We propose unified recurrence modeling for video action anticipation via a message passing framework.