Temporal Action Localization
422 papers with code • 14 benchmarks • 42 datasets
Temporal Action Localization aims to detect activities in an untrimmed video stream and output their start and end timestamps. It is closely related to Temporal Action Proposal Generation.
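Predicted segments on TAL benchmarks are typically scored against ground-truth segments by temporal Intersection-over-Union (tIoU). A minimal sketch, assuming segments are `(start, end)` pairs in seconds (the function name is illustrative, not from any specific library):

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two segments given as (start, end) tuples."""
    # Overlap is the length of the intersection interval (clamped at zero).
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Union = sum of lengths minus the overlap counted twice.
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# A prediction [2.0, 8.0] against a ground-truth action [4.0, 10.0]:
# overlap is 4 s, union is 8 s, so tIoU = 0.5.
print(temporal_iou((2.0, 8.0), (4.0, 10.0)))
```

Mean Average Precision (mAP) at tIoU thresholds such as 0.5 or 0.75 is then the usual headline metric on these benchmarks.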
Libraries
Use these libraries to find Temporal Action Localization models and implementations
Datasets
Subtasks
Latest papers with no code
Language Model Guided Interpretable Video Action Reasoning
Extensive experiments on two complex video action datasets, Charades and CAD-120, validate the improved performance and interpretability of our LaIAR framework.
LoSA: Long-Short-range Adapter for Scaling End-to-End Temporal Action Localization
Temporal Action Localization (TAL) involves localizing and classifying action snippets in an untrimmed video.
PLOT-TAL -- Prompt Learning with Optimal Transport for Few-Shot Temporal Action Localization
This paper introduces a novel approach to temporal action localization (TAL) in few-shot learning.
Boosting Semi-Supervised Temporal Action Localization by Learning from Non-Target Classes
To this end, we first devise innovative strategies to adaptively select high-quality positive and negative classes from the label space, by modeling both the confidence and rank of a class in relation to those of the target class.
Leveraging Foundation Model Automatic Data Augmentation Strategies and Skeletal Points for Hands Action Recognition in Industrial Assembly Lines
We proposed a method of converting hand action recognition problems into hand skeletal trajectory classification problems, which solved the real-time performance problem of industrial algorithms.
BID: Boundary-Interior Decoding for Unsupervised Temporal Action Localization Pre-Training
Skeleton-based motion representations are robust for action localization and understanding due to their invariance to perspective, lighting, and occlusion, compared with images.
Deep Learning Approaches for Human Action Recognition in Video Data
The results of this study underscore the potential of composite models in achieving robust human action recognition and suggest avenues for future research in optimizing these models for real-world deployment.
Density-Guided Label Smoothing for Temporal Localization of Driving Actions
Temporal localization of driving actions plays a crucial role in advanced driver-assistance systems and naturalistic driving studies.
Transformer-based Fusion of 2D-pose and Spatio-temporal Embeddings for Distracted Driver Action Recognition
The model uses 2D-pose features as the positional embedding of the transformer architecture and spatio-temporal features as the main input to the encoder of the transformer.
TikTokActions: A TikTok-Derived Video Dataset for Human Action Recognition
We find that the performance of the model pre-trained using our TikTok dataset is comparable to models trained on larger action recognition datasets (95.3% on UCF101 and 53.24% on HMDB51).