Temporal Action Localization

422 papers with code • 14 benchmarks • 42 datasets

Temporal Action Localization aims to detect activities in a video stream and output their beginning and end timestamps. It is closely related to Temporal Action Proposal Generation.
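The sketch below shows what this output can look like. It assumes a model has already assigned one confidence score to each fixed-length snippet for a single action class. Runs of consecutive snippets above a threshold are merged into one segment with a start time and an end time. The `Segment` type, `scores_to_segments`, the 0.5 threshold and the snippet length are all hypothetical. Thresholding and grouping is only a simple baseline, not the method of any paper listed here.

```python
from typing import List, NamedTuple, Optional

class Segment(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    label: str
    score: float  # mean snippet score inside the segment

def scores_to_segments(scores: List[float], label: str, snippet_sec: float,
                       threshold: float = 0.5) -> List[Segment]:
    """Hypothetical baseline: merge consecutive above-threshold snippets into segments."""
    segments: List[Segment] = []
    run_start: Optional[int] = None
    # A -inf sentinel at the end closes a run that reaches the last snippet.
    for i, s in enumerate(scores + [float("-inf")]):
        if s > threshold and run_start is None:
            run_start = i  # a new action instance begins at this snippet
        elif s <= threshold and run_start is not None:
            run = scores[run_start:i]
            segments.append(Segment(run_start * snippet_sec, i * snippet_sec,
                                    label, sum(run) / len(run)))
            run_start = None
    return segments
```

For example, the scores `[0.1, 0.7, 0.9, 0.2, 0.6, 0.8]` with 0.5-second snippets yield two segments: one from 0.5 s to 1.5 s and one from 2.0 s to 3.0 s.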

Libraries

Twelve libraries provide Temporal Action Localization models and implementations.

Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks

intellabs/multimodal_cognitive_ai 7 Oct 2023

We investigate 9 foundational image-text models on a diverse set of video tasks that include video action recognition (video AR), video retrieval (video RT), video question answering (video QA), video multiple choice (video MC) and video captioning (video CP).

GitHub stars: 33

Elevating Skeleton-Based Action Recognition with Efficient Multi-Modality Self-Supervision

desehuileng0o0/ikem 21 Sep 2023

These works overlooked the performance differences among modalities, which led to erroneous knowledge propagating between modalities. Moreover, they used only three fundamental modalities (i.e., joints, bones, and motions) and explored no additional ones.

GitHub stars: 3
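Skeleton-based recognition commonly derives the bone and motion modalities from joint coordinates. A bone is the vector from a joint's parent to the joint. A motion is the displacement of a joint between consecutive frames. The sketch below assumes 3D joints and a hypothetical `parent` map. The exact definitions used in this paper may differ.

```python
from typing import Dict, List, Tuple

Vec = Tuple[float, float, float]
Frame = List[Vec]  # one (x, y, z) coordinate per joint

def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def bones(frame: Frame, parent: Dict[int, int]) -> Frame:
    """Bone of joint j = joint j minus its parent joint; the root (no parent) gets a zero vector."""
    return [sub(p, frame[parent[j]]) if j in parent else (0.0, 0.0, 0.0)
            for j, p in enumerate(frame)]

def motions(seq: List[Frame]) -> List[Frame]:
    """Motion at frame t = joint position at t+1 minus position at t (one fewer frame than seq)."""
    return [[sub(b, a) for a, b in zip(f0, f1)] for f0, f1 in zip(seq, seq[1:])]
```

As an example, take a toy chain where joint 0 is the root, with `parent = {1: 0, 2: 1}`. The bone of joint 2 is then the offset from joint 1 to joint 2. The same `motions` function can also be applied to a sequence of bone frames.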

Temporal Action Localization with Enhanced Instant Discriminability

dingfengshi/tridet 11 Sep 2023

Temporal action detection (TAD) aims to detect all action boundaries and their corresponding categories in an untrimmed video.

GitHub stars: 149
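Detected action boundaries are commonly scored against the ground truth with temporal intersection-over-union (tIoU). A detection typically counts as correct when its category matches and its tIoU reaches a chosen threshold. The sketch below uses hypothetical function names and a default threshold of 0.5. Full evaluation protocols add further steps, such as one-to-one matching and averaging precision over several thresholds, which are omitted here.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (start, end) in seconds

def temporal_iou(pred: Interval, gt: Interval) -> float:
    """Intersection over union of two time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(pred: Interval, pred_label: str, gt: Interval, gt_label: str,
                     threshold: float = 0.5) -> bool:
    """A detection matches a ground-truth action if the categories agree and tIoU >= threshold."""
    return pred_label == gt_label and temporal_iou(pred, gt) >= threshold
```

For instance, a prediction spanning 2 s to 6 s against a ground truth spanning 4 s to 8 s overlaps for 2 s of a 6 s union. That gives a tIoU of 1/3, so it is rejected at a threshold of 0.5.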

CDFSL-V: Cross-Domain Few-Shot Learning for Videos

sarinda251/cdfsl-v ICCV 2023

To address this issue, in this work, we propose a novel cross-domain few-shot video action recognition method that leverages self-supervised learning and curriculum learning to balance the information from the source and target domains.

GitHub stars: 11 • 07 Sep 2023

POCO: 3D Pose and Shape Estimation with Confidence

saidwivedi/POCO 24 Aug 2023

To address this, we develop POCO, a novel framework for training human pose and shape (HPS) regressors to estimate not only a 3D human body but also their confidence in that estimate, in a single feed-forward pass.

GitHub stars: 45
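Heteroscedastic regression is one generic way to produce a prediction and its confidence in the same forward pass. The network outputs both a mean and a log-variance for each target and is trained with a Gaussian negative log-likelihood. The sketch below covers a single scalar output. It is a textbook formulation, not necessarily the one POCO uses, and the linear `predict` head and its weights are hypothetical.

```python
import math
from typing import List, Tuple

def predict(features: List[float], w_mean: List[float],
            w_log_var: List[float]) -> Tuple[float, float]:
    """Hypothetical linear head: one forward pass yields both a mean and a log-variance."""
    mean = sum(f * w for f, w in zip(features, w_mean))
    log_var = sum(f * w for f, w in zip(features, w_log_var))
    return mean, log_var

def gaussian_nll(target: float, mean: float, log_var: float) -> float:
    """Negative log-likelihood under N(mean, exp(log_var)), dropping the constant 0.5*log(2*pi)."""
    return 0.5 * (math.exp(-log_var) * (target - mean) ** 2 + log_var)
```

Under this loss, a confident wrong prediction (small variance, large error) costs more than an uncertain wrong one. A confident correct prediction costs less than an uncertain correct one, so the predicted variance learns to track the error.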

HR-Pro: Point-supervised Temporal Action Localization via Hierarchical Reliability Propagation

pipixin321/hr-pro 24 Aug 2023

For snippet-level learning, we introduce an online-updated memory to store reliable snippet prototypes for each class.

GitHub stars: 19
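A per-class prototype memory can be updated online while training runs. The excerpt above does not say how HR-Pro measures reliability or updates its prototypes. This sketch assumes that a snippet counts as reliable when its score clears a fixed threshold, and that prototypes follow an exponential moving average. The `PrototypeMemory` class, `momentum` and `min_score` are hypothetical.

```python
from typing import Dict, List, Optional

class PrototypeMemory:
    """Hypothetical memory holding one prototype feature vector per class."""

    def __init__(self, momentum: float = 0.9):
        self.momentum = momentum
        self.protos: Dict[int, List[float]] = {}

    def update(self, cls: int, feature: List[float], score: float,
               min_score: float = 0.8) -> None:
        if score < min_score:  # only snippets assumed reliable may update the memory
            return
        old = self.protos.get(cls)
        if old is None:
            self.protos[cls] = list(feature)  # first reliable snippet initializes the prototype
        else:
            m = self.momentum  # exponential moving average of reliable snippet features
            self.protos[cls] = [m * o + (1 - m) * f for o, f in zip(old, feature)]

    def get(self, cls: int) -> Optional[List[float]]:
        return self.protos.get(cls)
```

A higher `momentum` makes each prototype change more slowly, which smooths out noise from individual snippets.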

DD-GCN: Directed Diffusion Graph Convolutional Network for Skeleton-based Human Action Recognition

shiyin-lc/dd-gcn 24 Aug 2023

Graph Convolutional Networks (GCNs) have been widely used in skeleton-based human action recognition.

GitHub stars: 4
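The basic operation behind these networks is a graph convolution over the skeleton's joints. This sketch uses the symmetric-normalized form popularized by Kipf and Welling, H' = D^-1/2 (A + I) D^-1/2 H W, on an undirected skeleton graph. DD-GCN builds directed diffusion graphs instead, which this generic layer does not implement. The function names and the toy graph are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def normalized_adjacency(num_joints: int, edges: List[Tuple[int, int]]) -> Matrix:
    """D^-1/2 (A + I) D^-1/2 for an undirected skeleton graph with self-loops."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]

def graph_conv(x: Matrix, a_hat: Matrix, w: Matrix) -> Matrix:
    """One layer: mix each joint's features with its neighbours', then project with W (no activation)."""
    return matmul(matmul(a_hat, x), w)
```

Here `x` has one row per joint and one column per input channel, and `w` maps input channels to output channels. In a real model, such layers are stacked with nonlinearities and combined with temporal convolutions.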

Video BagNet: short temporal receptive fields increase robustness in long-term action recognition

ombretta/videobagnet 22 Aug 2023

Previous work on long-term video action recognition relies on deep 3D-convolutional models that have a large temporal receptive field (RF).

GitHub stars: 1
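Stacking temporal convolutions grows the temporal receptive field quickly. With kernel size k_l and stride s_l at layer l, the field is r = 1 + sum over l of (k_l - 1) * (s_1 * ... * s_(l-1)). The sketch below applies this standard recurrence. The layer configurations in the example are hypothetical, not those of Video BagNet or the models it compares against.

```python
from typing import List, Tuple

def temporal_receptive_field(layers: List[Tuple[int, int]]) -> int:
    """Receptive field, in input frames, of stacked temporal convolutions given (kernel, stride) per layer."""
    rf, jump = 1, 1  # jump = spacing, in input frames, between adjacent outputs of the current layer
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf
```

For example, eight layers with temporal kernel 3 and stride 1 already cover 17 frames. Three layers with kernel 3 and stride 2 cover 15 frames. Layers with temporal kernel 1 leave the field at a single frame, which is one way to keep temporal receptive fields short.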

UnLoc: A Unified Framework for Video Localization Tasks

google-research/scenic ICCV 2023

While large-scale image-text pretrained models such as CLIP have been used for multiple video-level tasks on trimmed videos, their use for temporal localization in untrimmed videos remains relatively unexplored.

GitHub stars: 3,008 • 21 Aug 2023