Action Detection
233 papers with code • 11 benchmarks • 33 datasets
Action Detection aims to find both where and when an action occurs within a video clip and to classify what action is taking place. Results are typically given in the form of action tubelets, which are per-frame action bounding boxes linked across time in the video. This is related to temporal action localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
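To make the tubelet idea concrete, here is a minimal sketch of how per-frame detections can be greedily linked across time by bounding-box overlap (IoU). This is a hypothetical illustration, not the linking algorithm of any particular paper; real systems often use learned association or Viterbi-style linking, and the function names (`iou`, `link_tubelets`) and threshold are assumptions for the example.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def link_tubelets(per_frame_boxes, iou_thresh=0.5):
    """Greedily link per-frame boxes into tubelets.

    per_frame_boxes: list over frames, each a list of boxes.
    Returns a list of tubelets, each a list of (frame_idx, box) pairs.
    A tubelet is extended when a box in the next frame overlaps its
    last box by at least iou_thresh; otherwise a new tubelet starts.
    """
    tubelets = []
    for t, boxes in enumerate(per_frame_boxes):
        unmatched = list(boxes)
        for tube in tubelets:
            last_t, last_box = tube[-1]
            # Only extend tubelets that were alive in the previous frame.
            if last_t != t - 1 or not unmatched:
                continue
            best = max(unmatched, key=lambda b: iou(last_box, b))
            if iou(last_box, best) >= iou_thresh:
                tube.append((t, best))
                unmatched.remove(best)
        # Any detection left unmatched seeds a new tubelet.
        for b in unmatched:
            tubelets.append([(t, b)])
    return tubelets
```

For example, a box drifting slowly rightwards over three frames yields a single three-frame tubelet, while a detection appearing elsewhere in the frame starts its own.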
Latest papers
TIM: A Time Interval Machine for Audio-Visual Action Recognition
We address the interplay between the two modalities in long videos by explicitly modelling the temporal extents of audio and visual events.
UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection
Temporal Action Detection (TAD) focuses on detecting pre-defined actions, while Moment Retrieval (MR) aims to identify the events described by open-ended natural language within untrimmed videos.
Online speaker diarization of meetings guided by speech separation
The results show that our system improves the state-of-the-art on the AMI headset mix, using no oracle information and under full evaluation (no collar and including overlapped speech).
Glance and Focus: Memory Prompting for Multi-Event Video Question Answering
Instead of that, we train an Encoder-Decoder to generate a set of dynamic event memories at the glancing stage.
Generative Model-based Feature Knowledge Distillation for Action Recognition
Addressing this gap, our paper introduces an innovative knowledge distillation framework, with the generative model for training a lightweight student model.
Advanced Image Segmentation Techniques for Neural Activity Detection via C-fos Immediate Early Gene Expression
This research contributes to the development of more efficient and automated image segmentation methods, advancing the understanding of neural function in neuroscience research.
Semi-supervised Active Learning for Video Action Detection
First, we demonstrate its effectiveness on video action detection where the proposed approach outperforms prior works in semi-supervised and weakly-supervised learning along with several baseline approaches in both UCF101-24 and JHMDB-21.
End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames
In this paper, we reduce the memory consumption of end-to-end training, and manage to scale the TAD backbone up to 1 billion parameters and the input video to 1,536 frames, leading to significant gains in detection performance.
Centre Stage: Centricity-based Audio-Visual Temporal Action Detection
Previous one-stage action detection approaches have modelled temporal dependencies using only the visual modality.
ChimpACT: A Longitudinal Dataset for Understanding Chimpanzee Behaviors
ChimpACT is both comprehensive and challenging, consisting of 163 videos with a cumulative 160,500 frames, each richly annotated with detection, identification, pose estimation, and fine-grained spatiotemporal behavior labels.