Action Localization

136 papers with code • 0 benchmarks • 3 datasets

Action Localization is the task of finding the spatial and temporal coordinates of an action in a video. An action localization model identifies the frames in which an action starts and ends and returns the x, y coordinates of the action. These coordinates change as the object performing the action moves.
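As a rough illustration of the definition above, the sketch below shows one way a localized action could be represented: a label, a start and end frame, and one bounding box per frame. The `ActionInstance` class, its field names, and the sample values are hypothetical and are not taken from any of the libraries or papers listed on this page.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Assumed box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class ActionInstance:
    """Hypothetical container for one localized action."""
    label: str
    start_frame: int          # first frame of the action
    end_frame: int            # last frame of the action (inclusive)
    boxes: Dict[int, Box]     # frame index -> bounding box

    def box_at(self, frame: int) -> Optional[Box]:
        """Return the box for a frame, or None outside the action's extent."""
        if frame < self.start_frame or frame > self.end_frame:
            return None
        return self.boxes.get(frame)

    def center_at(self, frame: int) -> Optional[Tuple[float, float]]:
        """Return the (x, y) center of the box at a frame, if there is one."""
        box = self.box_at(frame)
        if box is None:
            return None
        x_min, y_min, x_max, y_max = box
        return ((x_min + x_max) / 2, (y_min + y_max) / 2)


# Made-up example: the actor moves to the right over three frames,
# so the x coordinate of the box center changes from frame to frame.
instance = ActionInstance(
    label="run",
    start_frame=10,
    end_frame=12,
    boxes={
        10: (0.0, 0.0, 10.0, 20.0),
        11: (5.0, 0.0, 15.0, 20.0),
        12: (10.0, 0.0, 20.0, 20.0),
    },
)
```

Calling `instance.center_at(10)` and `instance.center_at(12)` gives different x values, which reflects the displacement of the actor, while `instance.box_at(9)` returns `None` because frame 9 lies before the action starts.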

Benchmarks

These leaderboards are used to track progress in Action Localization

You can find evaluation results in the subtasks. You can also submit evaluation metrics for this task.

Libraries

Use these libraries to find Action Localization models and implementations

Pilhyeon/Learning-Action-Completene…

3 papers

open-mmlab/mmaction2

2 papers

3,929

Datasets

Subtasks

Most implemented papers

AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

tensorflow/models • CVPR 2018

The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently.

HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips

antoine77340/MIL-NCE_HowTo100M • ICCV 2019

In this work, we propose instead to learn such embeddings from video data with readily available natural language annotations in the form of automatically transcribed narrations.

You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization

wei-tim/YOWO • 15 Nov 2019

YOWO is a single-stage architecture with two branches to extract temporal and spatial information concurrently and predict bounding boxes and action probabilities directly from video clips in one evaluation.

End-to-End Learning of Visual Representations from Uncurated Instructional Videos

antoine77340/MIL-NCE_HowTo100M • CVPR 2020

Annotating videos is cumbersome, expensive and not scalable.

Recognition of Instrument-Tissue Interactions in Endoscopic Videos via Action Triplets

camma-public/tripnet • 10 Jul 2020

Recognition of surgical activity is an essential component to develop context-aware decision support for the operating room.

Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-supervised Object and Action Localization

zhengshou/AutoLoc • ICCV 2017

We propose 'Hide-and-Seek', a weakly-supervised framework that aims to improve object localization in images and action localization in videos.

Weakly Supervised Action Localization by Sparse Temporal Pooling Network

demianzhang/weakly-action-localization • CVPR 2018

We propose a weakly supervised temporal action localization algorithm on untrimmed videos using convolutional neural networks.

Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization

Siyu-C/ACAR-Net • CVPR 2021

We propose to explicitly model the Actor-Context-Actor Relation, which is the relation between two actors based on their interactions with the context.

Temporal Action Localization with Enhanced Instant Discriminability

dingfengshi/tridetplus • 11 Sep 2023

Temporal action detection (TAD) aims to detect all action boundaries and their corresponding categories in an untrimmed video.
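Predicted action boundaries are commonly compared with ground-truth boundaries using temporal intersection-over-union (tIoU). The sketch below is a generic illustration of that measure and is not taken from the paper listed here; the function name `temporal_iou` and the sample segments are assumptions made for the example.

```python
from typing import Tuple

Segment = Tuple[float, float]  # (start, end) in seconds, with start <= end


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Overlap between two time segments divided by their combined extent."""
    # Length of the overlapping interval; zero when the segments are disjoint.
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Combined length, counting the overlap once.
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


# Made-up segments: a prediction of 2 s to 6 s against a ground truth of 4 s to 8 s.
# They overlap for 2 s and together span 6 s, so the tIoU is 2 / 6.
score = temporal_iou((2.0, 6.0), (4.0, 8.0))
```

A prediction is typically counted as correct when its tIoU with a ground-truth segment of the same category exceeds a chosen threshold; the threshold itself is a design choice and varies between benchmarks.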

Action Tubelet Detector for Spatio-Temporal Action Localization

vkalogeiton/caffe • ICCV 2017

We propose the ACtion Tubelet detector (ACT-detector) that takes as input a sequence of frames and outputs tubelets, i.e., sequences of bounding boxes with associated scores.
