Action Detection
233 papers with code • 11 benchmarks • 33 datasets
Action Detection aims to find both where and when an action occurs within a video clip and to classify what action is taking place. Results are typically given in the form of action tubelets, which are per-frame action bounding boxes linked across time in the video. This is related to temporal action localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
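To make the tubelet idea concrete, here is a minimal sketch of how per-frame detections can be greedily linked across time by bounding-box overlap (IoU). This is a hypothetical illustration, not the linking algorithm of any particular paper; real systems often use learned association or Viterbi-style linking, and the function names (`iou`, `link_tubelets`) and threshold are assumptions for the example.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def link_tubelets(per_frame_boxes, iou_thresh=0.5):
    """Greedily link per-frame boxes into tubelets.

    per_frame_boxes: list over frames, each a list of boxes.
    Returns a list of tubelets, each a list of (frame_idx, box) pairs.
    A tubelet is extended when a box in the next frame overlaps its
    last box by at least iou_thresh; otherwise a new tubelet starts.
    """
    tubelets = []
    for t, boxes in enumerate(per_frame_boxes):
        unmatched = list(boxes)
        for tube in tubelets:
            last_t, last_box = tube[-1]
            # Only extend tubelets that were alive in the previous frame.
            if last_t != t - 1 or not unmatched:
                continue
            best = max(unmatched, key=lambda b: iou(last_box, b))
            if iou(last_box, best) >= iou_thresh:
                tube.append((t, best))
                unmatched.remove(best)
        # Any detection left unmatched seeds a new tubelet.
        for b in unmatched:
            tubelets.append([(t, b)])
    return tubelets
```

For example, a box drifting slowly rightwards over three frames yields a single three-frame tubelet, while a detection appearing elsewhere in the frame starts its own.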
Latest papers
TIM: A Time Interval Machine for Audio-Visual Action Recognition
We address the interplay between the two modalities in long videos by explicitly modelling the temporal extents of audio and visual events.
UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection
Temporal Action Detection (TAD) focuses on detecting pre-defined actions, while Moment Retrieval (MR) aims to identify the events described by open-ended natural language within untrimmed videos.
Online speaker diarization of meetings guided by speech separation
The results show that our system improves the state-of-the-art on the AMI headset mix, using no oracle information and under full evaluation (no collar and including overlapped speech).
Glance and Focus: Memory Prompting for Multi-Event Video Question Answering
Instead of that, we train an Encoder-Decoder to generate a set of dynamic event memories at the glancing stage.
Generative Model-based Feature Knowledge Distillation for Action Recognition
Addressing this gap, our paper introduces an innovative knowledge distillation framework, with the generative model for training a lightweight student model.
Advanced Image Segmentation Techniques for Neural Activity Detection via C-fos Immediate Early Gene Expression
This research contributes to the development of more efficient and automated image segmentation methods, advancing the understanding of neural function in neuroscience research.
Semi-supervised Active Learning for Video Action Detection
First, we demonstrate its effectiveness on video action detection where the proposed approach outperforms prior works in semi-supervised and weakly-supervised learning along with several baseline approaches in both UCF101-24 and JHMDB-21.
End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames
In this paper, we reduce the memory consumption of end-to-end training, and manage to scale the TAD backbone up to 1 billion parameters and the input video to 1,536 frames, leading to significant gains in detection performance.
Centre Stage: Centricity-based Audio-Visual Temporal Action Detection
Previous one-stage action detection approaches have modelled temporal dependencies using only the visual modality.
ChimpACT: A Longitudinal Dataset for Understanding Chimpanzee Behaviors
ChimpACT is both comprehensive and challenging, consisting of 163 videos with a cumulative 160,500 frames, each richly annotated with detection, identification, pose estimation, and fine-grained spatiotemporal behavior labels.