Temporal Action Localization

421 papers with code • 14 benchmarks • 42 datasets

Temporal Action Localization aims to detect activities in a video stream and output the beginning and end timestamps of each detected activity. It is closely related to Temporal Action Proposal Generation.
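
As a rough illustration of the task's output, the sketch below represents one localized action as a label with start and end times, and computes the temporal overlap (intersection over union) between a prediction and a reference segment. The `ActionSegment` class, the `temporal_iou` helper, and the example values are hypothetical and chosen only for illustration; they are not taken from any of the papers or libraries listed on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionSegment:
    """One localized action: a label plus start/end times in seconds (hypothetical type)."""
    label: str
    start: float
    end: float
    score: float = 1.0  # confidence; defaults to 1.0 for reference annotations


def temporal_iou(a: ActionSegment, b: ActionSegment) -> float:
    """Intersection over union of two time intervals; 0.0 when they do not overlap."""
    intersection = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - intersection
    return intersection / union if union > 0 else 0.0


# Illustrative values only: a predicted segment and a reference annotation.
prediction = ActionSegment("high jump", start=12.0, end=18.0, score=0.9)
ground_truth = ActionSegment("high jump", start=13.0, end=19.0)
overlap = temporal_iou(prediction, ground_truth)
```

Here the two segments share 5 seconds out of a combined span of 7 seconds, so `overlap` is 5/7. An overlap measure of this kind is a common way to decide whether a predicted segment matches an annotated one, with the threshold left as a design choice.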

Test-Time Zero-Shot Temporal Action Localization

benedettaliberatori/t3al 8 Apr 2024

To this aim, we introduce a novel method that performs Test-Time adaptation for Temporal Action Localization (T3AL).

11

UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection

yingsen1/unimd 7 Apr 2024

Temporal Action Detection (TAD) focuses on detecting pre-defined actions, while Moment Retrieval (MR) aims to identify the events described by open-ended natural language within untrimmed videos.

6

UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization

ttgeng233/UniAV 4 Apr 2024

Video localization tasks aim to temporally locate specific instances in videos, including temporal action localization (TAL), sound event detection (SED) and audio-visual event localization (AVEL).

3

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

opengvlab/internvideo 22 Mar 2024

We introduce InternVideo2, a new video foundation model (ViFM) that achieves state-of-the-art performance in action recognition, video-text tasks, and video-centric dialogue.

897

A Lie Group Approach to Riemannian Batch Normalization

gitzh-chen/liebn 17 Mar 2024

Using the deformation concept, we generalize the existing Lie groups on SPD manifolds into three families of parameterized Lie groups.

6

Skeleton-Based Human Action Recognition with Noisy Labels

xuyizdby/noiseerasar 15 Mar 2024

In this study, we bridge this gap by implementing a framework that augments well-established skeleton-based human action recognition methods with label-denoising strategies from various research areas to serve as the initial benchmark.

2

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

opengvlab/video-mamba-suite 14 Mar 2024

We categorize Mamba into four roles for modeling videos, deriving a Video Mamba Suite composed of 14 models/modules, and evaluating them on 12 video understanding tasks.

99

Boosting Adversarial Transferability across Model Genus by Deformation-Constrained Warping

Trustworthy-AI-Group/TransferAttack 6 Feb 2024

Adversarial examples generated by a surrogate model typically exhibit limited transferability to unknown target systems.

132

Spatial-Temporal Decoupling Contrastive Learning for Skeleton-based Human Action Recognition

buptsjzhang/std-cl 23 Dec 2023

Furthermore, to explicitly exploit the latent data distributions, we apply contrastive learning to the attentive features, which models the cross-sequence semantic relations by pulling together the features from positive pairs and pushing apart those from negative pairs.

5

Revisiting Foreground and Background Separation in Weakly-supervised Temporal Action Localization: A Clustering-based Approach

qinying-liu/case ICCV 2023

It comprises two core components: a snippet clustering component that groups the snippets into multiple latent clusters and a cluster classification component that further classifies the cluster as foreground or background.

98
21 Dec 2023