Action Detection

233 papers with code • 11 benchmarks • 33 datasets

Action Detection aims to find both where and when an action occurs within a video clip and to classify what action is taking place. Results are typically given in the form of action tubelets, which are action bounding boxes linked across time in the video. The task is related to temporal action localization, which seeks to identify the start and end frames of an action, and to action recognition, which only classifies the action being performed and typically assumes a trimmed video.
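
To make the tubelet representation concrete, here is a minimal sketch, assuming per-frame detections are available as (x1, y1, x2, y2) boxes: a tubelet keeps one box per frame, and same-class detections in consecutive frames are linked greedily by spatial IoU. All names are illustrative, not taken from any particular implementation.

```python
# Minimal sketch of action tubelets: per-frame boxes linked across time.
from dataclasses import dataclass, field

@dataclass
class Tubelet:
    label: str                                  # predicted action class
    boxes: dict = field(default_factory=dict)   # frame index -> (x1, y1, x2, y2)

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def link_detections(per_frame_dets, label, iou_thresh=0.5):
    """Greedily link same-class detections in consecutive frames into tubelets."""
    tubelets = []
    for t, boxes in enumerate(per_frame_dets):
        for box in boxes:
            # try to extend an existing tubelet that ended at frame t - 1
            best, best_iou = None, iou_thresh
            for tube in tubelets:
                if (t - 1) in tube.boxes and t not in tube.boxes:
                    ov = iou(tube.boxes[t - 1], box)
                    if ov > best_iou:
                        best, best_iou = tube, ov
            if best is None:                    # otherwise start a new tubelet
                best = Tubelet(label=label)
                tubelets.append(best)
            best.boxes[t] = box
    return tubelets
```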


Most implemented papers

Multi-Speaker and Wide-Band Simulated Conversations as Training Data for End-to-End Neural Diarization

butspeechfit/eend 12 Nov 2022

End-to-end diarization presents an attractive alternative to standard cascaded diarization systems because a single system can handle all aspects of the task at once.

Temporal Action Localization with Enhanced Instant Discriminability

dingfengshi/tridetplus 11 Sep 2023

Temporal action detection (TAD) aims to detect all action boundaries and their corresponding categories in an untrimmed video.
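
As a concrete illustration of how TAD outputs are usually scored, the sketch below computes temporal IoU between a predicted segment and a ground-truth segment; this is the matching criterion behind the mAP@tIoU metrics commonly reported for the task. It is a generic sketch, not code from TriDet.

```python
def temporal_iou(seg_a, seg_b):
    """tIoU between two (start, end) segments given in seconds."""
    inter = max(0.0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))
    union = max(seg_a[1], seg_b[1]) - min(seg_a[0], seg_b[0])
    return inter / union if union > 0 else 0.0

# e.g. a prediction covering 10.0-18.0 s against ground truth 12.0-20.0 s
print(temporal_iou((10.0, 18.0), (12.0, 20.0)))  # 0.6
```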

Single Shot Temporal Action Detection

hypjudy/Decouple-SSAD 17 Oct 2017

The main drawback of this framework is that the boundaries of action instance proposals have been fixed during the classification step.

Learning Latent Super-Events to Detect Multiple Activities in Videos

piergiaj/super-events-cvpr18 CVPR 2018

In this paper, we introduce the concept of learning latent super-events from activity videos, and present how it benefits activity detection in continuous videos.

SoccerNet: A Scalable Dataset for Action Spotting in Soccer Videos

SilvioGiancola/SoccerNet-code 12 Apr 2018

A total of 6,637 temporal annotations are automatically parsed from online match reports at a one minute resolution for three main classes of events (Goal, Yellow/Red Card, and Substitution).
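
Action spotting reduces each event to a single anchor timestamp rather than a segment. The sketch below shows one hedged interpretation of how such spots can be scored: a prediction counts as a hit if it falls within a tolerance window of an unmatched ground-truth event of the same class. The 30-second tolerance and the greedy matching are illustrative assumptions, not SoccerNet's exact protocol.

```python
def spotting_hits(predictions, ground_truth, tolerance=30.0):
    """Count matched spots; predictions/ground_truth are lists of (time_sec, class_label)."""
    matched_gt = set()
    hits = 0
    for p_time, p_cls in sorted(predictions):
        for i, (g_time, g_cls) in enumerate(ground_truth):
            if i in matched_gt or g_cls != p_cls:
                continue
            if abs(p_time - g_time) <= tolerance:   # within the tolerance window
                matched_gt.add(i)
                hits += 1
                break
    return hits
```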

Temporal Recurrent Networks for Online Action Detection

xumingze0308/TRN.pytorch ICCV 2019

Most work on temporal action detection is formulated as an offline problem, in which the start and end times of actions are determined after the entire video is fully observed.
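
The online setting instead requires labelling the current frame from past observations only. The sketch below illustrates that causal constraint with a recurrent per-frame classifier; `OnlineDetector` and its dimensions are hypothetical placeholders, not the TRN architecture.

```python
import torch
import torch.nn as nn

class OnlineDetector(nn.Module):
    def __init__(self, feat_dim=512, hidden=256, num_classes=21):
        super().__init__()
        self.rnn = nn.GRUCell(feat_dim, hidden)   # hidden state carries only past context
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, frame_feat, hidden_state):
        hidden_state = self.rnn(frame_feat, hidden_state)
        return self.head(hidden_state), hidden_state

# streaming loop: scores for frame t depend only on frames <= t
model = OnlineDetector()
h = torch.zeros(1, 256)
for frame_feat in torch.randn(100, 1, 512):       # stand-in for per-frame features
    scores, h = model(frame_feat, h)
```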

Learning Motion in Feature Space: Locally-Consistent Deformable Convolution Networks for Fine-Grained Action Detection

knmac/LCDC_release ICCV 2019

Fine-grained action detection is an important task with numerous applications in robotics and human-computer interaction.

Actor Conditioned Attention Maps for Video Action Detection

oulutan/ACAM_Demo 30 Dec 2018

While observing complex events with multiple actors, humans do not assess each actor separately, but infer from the context.

Personal VAD: Speaker-Conditioned Voice Activity Detection

pirxus/personalVAD 12 Aug 2019

In this paper, we propose "personal VAD", a system to detect the voice activity of a target speaker at the frame level.
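
A rough sketch of the speaker-conditioned idea, assuming a fixed target-speaker embedding is concatenated with each acoustic frame before a recurrent per-frame classifier; the module names and dimensions are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class PersonalVADSketch(nn.Module):
    def __init__(self, frame_dim=40, spk_dim=256, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(frame_dim + spk_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, frames, speaker_embedding):
        # frames: (batch, time, frame_dim); speaker_embedding: (batch, spk_dim)
        spk = speaker_embedding.unsqueeze(1).expand(-1, frames.size(1), -1)
        h, _ = self.rnn(torch.cat([frames, spk], dim=-1))
        return torch.sigmoid(self.out(h)).squeeze(-1)  # per-frame target-speaker probability

probs = PersonalVADSketch()(torch.randn(2, 100, 40), torch.randn(2, 256))
```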

Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding

zhoubolei/moments_models 1 Nov 2019

Videos capture events that typically contain multiple sequential, and simultaneous, actions even in the span of only a few seconds.