Temporal Action Localization
422 papers with code • 14 benchmarks • 42 datasets
Temporal Action Localization aims to detect activities in a video stream and output their beginning and end timestamps. It is closely related to Temporal Action Proposal Generation.
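Predicted segments are typically scored against ground-truth intervals using temporal IoU (overlap of the two time spans divided by their union). A minimal sketch, with an illustrative function name:

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments (seconds or frame indices)."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

print(temporal_iou((2.0, 6.0), (4.0, 8.0)))  # ≈ 0.333: 2s overlap over a 6s union
```

A prediction usually counts as correct when its temporal IoU with a ground-truth segment of the same class exceeds a threshold (commonly swept from 0.3 to 0.7).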
Latest papers
Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks
We investigate 9 foundational image-text models on a diverse set of video tasks that include video action recognition (video AR), video retrieval (video RT), video question answering (video QA), video multiple choice (video MC) and video captioning (video CP).
Elevating Skeleton-Based Action Recognition with Efficient Multi-Modality Self-Supervision
These works overlook differences in performance among modalities, which leads to erroneous knowledge propagating between them; moreover, only three fundamental modalities (i.e., joints, bones, and motions) are used, and no additional modalities are explored.
Temporal Action Localization with Enhanced Instant Discriminability
Temporal action detection (TAD) aims to detect all action boundaries and their corresponding categories in an untrimmed video.
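Detectors of this kind typically emit many overlapping candidate segments per class, which are then deduplicated with temporal non-maximum suppression. A hedged sketch of the standard greedy procedure (not the paper's specific pipeline; the threshold is illustrative):

```python
def temporal_nms(segments, scores, iou_thresh=0.5):
    """Greedy NMS over 1-D (start, end) proposals; returns kept indices."""
    order = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:                       # visit proposals by descending score
        s, e = segments[i]
        suppressed = False
        for j in keep:                    # suppress if it overlaps a kept segment
            ks, ke = segments[j]
            inter = max(0.0, min(e, ke) - max(s, ks))
            union = (e - s) + (ke - ks) - inter
            if union > 0 and inter / union > iou_thresh:
                suppressed = True
                break
        if not suppressed:
            keep.append(i)
    return keep
```

For example, with proposals `[(0, 10), (1, 11), (20, 30)]` and scores `[0.9, 0.8, 0.7]`, the second proposal is suppressed by the first and the indices `[0, 2]` survive.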
CDFSL-V: Cross-Domain Few-Shot Learning for Videos
To address this issue, in this work, we propose a novel cross-domain few-shot video action recognition method that leverages self-supervised learning and curriculum learning to balance the information from the source and target domains.
B2C-AFM: Bi-Directional Co-Temporal and Cross-Spatial Attention Fusion Model for Human Action Recognition
Human Action Recognition is a driving engine of many human-computer interaction applications.
POCO: 3D Pose and Shape Estimation with Confidence
To address this, we develop POCO, a novel framework for training HPS regressors to estimate not only a 3D human body, but also their confidence, in a single feed-forward pass.
HR-Pro: Point-supervised Temporal Action Localization via Hierarchical Reliability Propagation
For snippet-level learning, we introduce an online-updated memory to store reliable snippet prototypes for each class.
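A common way to maintain an online-updated prototype per class is an exponential moving average over incoming snippet features. The sketch below illustrates that general mechanism under assumptions of my own (the class structure, momentum value, and names are not taken from HR-Pro):

```python
import numpy as np

class PrototypeMemory:
    """Per-class prototype memory refreshed by an exponential moving average.
    Illustrative sketch; hyperparameters are assumptions, not HR-Pro's values."""

    def __init__(self, num_classes, dim, momentum=0.99):
        self.protos = np.zeros((num_classes, dim))
        self.seen = np.zeros(num_classes, dtype=bool)
        self.momentum = momentum

    def update(self, cls, feat):
        if not self.seen[cls]:
            self.protos[cls] = feat          # first reliable snippet initializes
            self.seen[cls] = True
        else:
            m = self.momentum                # blend old prototype with new feature
            self.protos[cls] = m * self.protos[cls] + (1 - m) * feat
```

In point-supervised settings, only snippets judged reliable (e.g., near labeled points or with high confidence) would feed such an update, so the prototypes drift slowly toward trustworthy class statistics.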
DD-GCN: Directed Diffusion Graph Convolutional Network for Skeleton-based Human Action Recognition
Graph Convolutional Networks (GCNs) have been widely used in skeleton-based human action recognition.
Video BagNet: short temporal receptive fields increase robustness in long-term action recognition
Previous work on long-term video action recognition relies on deep 3D-convolutional models that have a large temporal receptive field (RF).
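The temporal receptive field of stacked convolutions grows with each layer's kernel size, scaled by the accumulated stride of the layers before it. A small helper showing the standard computation (the layer configuration is hypothetical, not Video BagNet's architecture):

```python
def temporal_receptive_field(layers):
    """Temporal RF of stacked (kernel_t, stride_t) conv/pool layers."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump   # each layer widens the RF by (k-1) * current jump
        jump *= s              # stride compounds the step between output frames
    return rf

# e.g. two temporal-kernel-3 convs followed by a kernel-3, stride-2 layer
print(temporal_receptive_field([(3, 1), (3, 1), (3, 2)]))  # 7 frames
```

Deep 3D-CNNs stack many such layers, so their temporal RF spans hundreds of frames; the paper's point is that deliberately keeping this number small can make long-term recognition more robust.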
UnLoc: A Unified Framework for Video Localization Tasks
While large-scale image-text pretrained models such as CLIP have been used for multiple video-level tasks on trimmed videos, their use for temporal localization in untrimmed videos remains relatively unexplored.