Spatio-Temporal Action Localization

13 papers with code • 1 benchmark • 6 datasets


VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking

OpenGVLab/VideoMAEv2 CVPR 2023

Finally, we successfully train a video ViT model with a billion parameters, which achieves a new state-of-the-art performance on the datasets of Kinetics (90.0% on K400 and 89.9% on K600) and Something-Something (68.7% on V1 and 77.0% on V2).

396
29 Mar 2023
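
VideoMAE V2's title names dual masking: tokens are masked on both sides of the autoencoder, so the encoder sees only a small visible subset and the decoder reconstructs only part of the masked tokens. Below is a minimal NumPy sketch of the token-selection step. It assumes tube masking for the encoder, meaning one spatial mask is shared by all temporal slices. For the decoder it uses a uniform random subset of the masked tokens as a stand-in for the paper's structured decoder mask. The function names, the 8×196 token grid, and the 0.9 / 0.5 ratios are illustrative assumptions, not values taken from this page.

```python
import numpy as np


def tube_mask(num_slices: int, num_patches: int, ratio: float,
              rng: np.random.Generator) -> np.ndarray:
    """Encoder mask of shape (num_slices, num_patches); True means masked.

    One random spatial mask is shared by every temporal slice, so a masked
    patch stays masked through the whole clip (a "tube").
    """
    num_masked = int(round(num_patches * ratio))
    spatial = np.zeros(num_patches, dtype=bool)
    spatial[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return np.tile(spatial, (num_slices, 1))


def decoder_mask(enc_mask: np.ndarray, keep_fraction: float,
                 rng: np.random.Generator) -> np.ndarray:
    """Subset of encoder-masked tokens that the decoder reconstructs.

    Simplification (assumption): uniform random choice among masked tokens;
    VideoMAE V2 uses a structured decoder mask instead.
    """
    masked_idx = np.flatnonzero(enc_mask)
    num_keep = int(round(masked_idx.size * keep_fraction))
    chosen = rng.choice(masked_idx, size=num_keep, replace=False)
    dec = np.zeros(enc_mask.size, dtype=bool)
    dec[chosen] = True
    return dec.reshape(enc_mask.shape)


rng = np.random.default_rng(0)
enc = tube_mask(num_slices=8, num_patches=196, ratio=0.9, rng=rng)
dec = decoder_mask(enc, keep_fraction=0.5, rng=rng)
print(enc.sum(), dec.sum())  # 1408 704
```

In this sketch, a reconstruction loss would be computed only where `dec` is True. The decoder-side compute saving comes from that restriction.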

Unmasked Teacher: Towards Training-Efficient Video Foundation Models

opengvlab/unmasked_teacher ICCV 2023

Previous Video Foundation Models (VFMs) rely on Image Foundation Models (IFMs), which face challenges in transferring to the video domain.

242
28 Mar 2023

InternVideo: General Video Foundation Models via Generative and Discriminative Learning

opengvlab/internvideo

Specifically, InternVideo efficiently explores masked video modeling and video-language contrastive learning as the pretraining objectives, and selectively coordinates video representations of these two complementary frameworks in a learnable manner to boost various video applications.

921
06 Dec 2022
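
InternVideo's abstract names video-language contrastive learning as one of its two pretraining objectives. Below is a minimal sketch of a generic symmetric InfoNCE loss over a batch of paired clip and caption embeddings. It assumes the i-th clip and the i-th caption form the positive pair and treats every other pairing in the batch as a negative. This is not InternVideo's code. The masked-modeling branch and the learnable coordination between the two representations are omitted, and the function names and the 0.07 temperature are illustrative assumptions.

```python
import numpy as np


def log_softmax(x: np.ndarray, axis: int) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def video_text_infonce(video: np.ndarray, text: np.ndarray,
                       temperature: float = 0.07) -> float:
    """Symmetric InfoNCE for (B, D) clip and caption embeddings; row i pairs with row i."""
    v = video / np.linalg.norm(video, axis=1, keepdims=True)
    t = text / np.linalg.norm(text, axis=1, keepdims=True)
    logits = v @ t.T / temperature          # (B, B) scaled cosine similarities
    idx = np.arange(logits.shape[0])
    loss_v2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # each clip picks its caption
    loss_t2v = -log_softmax(logits, axis=0)[idx, idx].mean()  # each caption picks its clip
    return float((loss_v2t + loss_t2v) / 2)


rng = np.random.default_rng(0)
video = rng.normal(size=(4, 16))
text = video + 0.01 * rng.normal(size=(4, 16))  # near-copies stand in for aligned captions
loss_aligned = video_text_infonce(video, text)
loss_shuffled = video_text_infonce(video, text[::-1])  # every caption now mismatched
print(loss_aligned < loss_shuffled)  # True
```

Matching pairs give a lower loss than the same embeddings with the captions reversed, and the last line checks exactly that comparison.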

E^2TAD: An Energy-Efficient Tracking-based Action Detector

VITA-Group/21LPCV-UAV-Solution

Video action detection (spatio-temporal action localization) is usually the starting point for human-centric intelligent analysis of videos nowadays.

14
09 Apr 2022

Contextualized Spatio-Temporal Contrastive Learning with Self-Supervision

tensorflow/models CVPR 2022

Modern self-supervised learning algorithms typically enforce persistency of instance representations across views.

76,598
09 Dec 2021

KORSAL: Key-point Detection based Online Real-Time Spatio-Temporal Action Localization

Kalana304/KORSAL

Despite the simplicity of our approach, our lightweight end-to-end architecture achieves state-of-the-art frame-mAP of 74.7% on the challenging UCF101-24 dataset, demonstrating a performance gain of 6.4% over the previous best online methods.

5
05 Nov 2021
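
Frame-mAP scores each frame's detections independently. For each class, detections from all frames are ranked by confidence. A detection counts as a true positive if its best-overlapping ground-truth box in the same frame reaches the IoU threshold and has not already been claimed by a higher-scoring detection. The resulting precision-recall curve is summarized as average precision (AP), and frame-mAP is the mean AP over classes. Below is a minimal pure-Python sketch that assumes an IoU threshold of 0.5, PASCAL VOC-style matching, and all-point interpolated AP. Benchmark toolkits, including whatever produced KORSAL's 74.7% figure, may differ in interpolation and tie handling. The helper names are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]          # (x1, y1, x2, y2)
Detection = Tuple[int, float, Box]               # (frame_id, score, box)


def iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def frame_ap(dets: List[Detection], gts: Dict[int, List[Box]], thr: float = 0.5) -> float:
    """All-point interpolated AP for one class over all frames."""
    num_gt = sum(len(boxes) for boxes in gts.values())
    if num_gt == 0:
        return 0.0
    claimed = {f: [False] * len(boxes) for f, boxes in gts.items()}
    precisions, recalls, tp = [], [], 0
    for rank, (frame, _, box) in enumerate(sorted(dets, key=lambda d: -d[1]), start=1):
        # Best-overlapping ground-truth box in the same frame (VOC-style matching).
        overlaps = [iou(box, g) for g in gts.get(frame, [])]
        j = max(range(len(overlaps)), key=overlaps.__getitem__, default=-1)
        if j >= 0 and overlaps[j] >= thr and not claimed[frame][j]:
            claimed[frame][j] = True
            tp += 1
        precisions.append(tp / rank)
        recalls.append(tp / num_gt)
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def frame_map(per_class: Dict[str, Tuple[List[Detection], Dict[int, List[Box]]]]) -> float:
    aps = [frame_ap(dets, gts) for dets, gts in per_class.values()]
    return sum(aps) / len(aps)


gt = {0: [(0, 0, 10, 10)], 1: [(0, 0, 10, 10)]}
walk_dets = [(0, 0.9, (0, 0, 10, 10)),    # exact hit
             (1, 0.8, (20, 20, 30, 30)),  # no overlap: false positive
             (1, 0.7, (1, 1, 10, 10))]    # IoU 0.81: true positive
ap_walk = frame_ap(walk_dets, gt)
mAP = frame_map({"walk": (walk_dets, gt), "run": ([], {0: [(0, 0, 5, 5)]})})
print(round(ap_walk, 4), round(mAP, 4))  # 0.8333 0.4167
```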

ST-HOI: A Spatial-Temporal Baseline for Human-Object Interaction Detection in Videos

coldmanck/VidHOI

Detecting human-object interactions (HOI) is an important step toward a comprehensive visual understanding by machines.

46
25 May 2021

1st place solution for AVA-Kinetics Crossover in ActivityNet Challenge 2020

Siyu-C/ACAR-Net

This technical report introduces our winning solution to the spatio-temporal action localization track, AVA-Kinetics Crossover, in ActivityNet Challenge 2020.

198
16 Jun 2020

Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization

towhee-io/towhee CVPR 2021

We propose to explicitly model the Actor-Context-Actor Relation, which is the relation between two actors based on their interactions with the context.

2,991
14 Jun 2020
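
The actor-context-actor relation can be read as two steps. First, each actor is paired with every context location to form actor-context features. Then actors exchange information through those features at each shared location, so actor i is related to actor j through what both interact with at the same place. Below is a minimal NumPy sketch of that idea with random weights. It assumes a flattened H×W context map, one linear layer plus ReLU for the first-order step, and single-head dot-product attention across actors for the second-order step. This is not ACAR-Net's exact operator; the layer choices, the mean pooling over locations, and all names are assumptions.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def actor_context_actor(actors, context, w_ac, w_q, w_k, w_v):
    """actors: (N, C) pooled actor features; context: (L, C) flattened H*W map.

    Returns one D-dim relation feature per actor, shape (N, D).
    """
    n, l = actors.shape[0], context.shape[0]
    # First-order actor-context relations: concatenate every actor with every location.
    pairs = np.concatenate(
        [np.repeat(actors[:, None, :], l, axis=1),
         np.repeat(context[None, :, :], n, axis=0)],
        axis=-1,
    )                                         # (N, L, 2C)
    first = np.maximum(pairs @ w_ac, 0.0)     # (N, L, D), linear + ReLU
    # Second-order step: at each location, actor i attends over all actors j.
    q, k, v = first @ w_q, first @ w_k, first @ w_v
    scores = np.einsum("ild,jld->lij", q, k) / np.sqrt(q.shape[-1])  # (L, N, N)
    attn = softmax(scores, axis=-1)
    second = np.einsum("lij,jld->ild", attn, v)                       # (N, L, D)
    return second.mean(axis=1)                # average over locations


rng = np.random.default_rng(0)
C, D = 8, 4
actors = rng.normal(size=(3, C))
context = rng.normal(size=(5 * 5, C))
w_ac = rng.normal(size=(2 * C, D))
w_q, w_k, w_v = (rng.normal(size=(D, D)) for _ in range(3))
rel = actor_context_actor(actors, context, w_ac, w_q, w_k, w_v)
perm = [2, 0, 1]
rel_perm = actor_context_actor(actors[perm], context, w_ac, w_q, w_k, w_v)
print(rel.shape, np.allclose(rel_perm, rel[perm]))  # (3, 4) True
```

Each actor's first-order features depend only on that actor and the context, and the attention treats actors symmetrically. As a result, reordering the actors only reorders the output rows, which the final `allclose` check confirms.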

Video action detection by learning graph-based spatio-temporal interactions

aimagelab/STAGE_action_detection

Action detection is a complex task that aims to detect and classify human actions in video clips.

50
09 Dec 2019