Action Localization

135 papers with code • 0 benchmarks • 3 datasets

Action Localization is the task of finding the spatial and temporal coordinates of an action in a video. An action localization model identifies the frames at which an action starts and ends, and returns the (x, y) coordinates of the action within each frame. These coordinates change over time as the object performing the action moves.
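To make the task concrete, here is a minimal sketch of how a single localized action instance might be represented, together with the temporal intersection-over-union (tIoU) metric commonly used to score temporal localization. The schema and names are illustrative assumptions, not a specific library's API.

```python
from dataclasses import dataclass, field

@dataclass
class ActionInstance:
    """One detected action in a video (illustrative schema, not a standard API)."""
    label: str                      # action class, e.g. "long_jump"
    start_frame: int                # first frame containing the action
    end_frame: int                  # last frame containing the action
    boxes: dict = field(default_factory=dict)  # frame index -> (x1, y1, x2, y2)

def temporal_iou(a, b):
    """Intersection-over-union of two (start, end) frame intervals.

    This is the standard overlap measure used to match predicted
    temporal segments against ground truth when computing mAP.
    """
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0
```

A prediction matching ground truth `(0, 10)` with segment `(5, 15)` overlaps on 5 frames out of a 15-frame union, giving a tIoU of 1/3; evaluation protocols typically count a detection as correct above a tIoU threshold such as 0.5.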

Libraries

Use these libraries to find Action Localization models and implementations

Most implemented papers

Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization

harlanhong/MM2021-CO2-Net 27 Jul 2021

In this work, we argue that the features extracted from a pretrained extractor, e.g. I3D, are not WS-TAL task-specific features; thus, feature re-calibration is needed to reduce task-irrelevant information redundancy.

VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding

pytorch/fairseq EMNLP 2021

We present VideoCLIP, a contrastive approach to pre-train a unified model for zero-shot video and text understanding, without using any labels on downstream tasks.

Structured Attention Composition for Temporal Action Localization

vividle/online-action-detection 20 May 2022

To tackle this issue, we make an early effort to study temporal action localization from the perspective of multi-modality feature learning, based on the observation that different actions exhibit specific preferences to appearance or motion modality.

Where a Strong Backbone Meets Strong Features -- ActionFormer for Ego4D Moment Queries Challenge

happyharrycn/actionformer_release 16 Nov 2022

This report describes our submission to the Ego4D Moment Queries Challenge 2022.

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation

zzxslp/mm-navigator 13 Nov 2023

We first benchmark MM-Navigator on our collected iOS screen dataset.

Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks

adapt-python/adapt CVPR 2014

We show that despite differences in image statistics and tasks in the two datasets, the transferred representation leads to significantly improved results for object and action classification, outperforming the current state of the art on Pascal VOC 2007 and 2012 datasets.

Temporal Localization of Fine-Grained Actions in Videos by Domain Transfer from Web Images

zhengshou/AutoLoc 4 Apr 2015

To solve this problem, we propose a simple yet effective method that takes weak video labels and noisy image labels as input, and generates localized action frames as output.

Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs

zhengshou/scnn CVPR 2016

To address this challenging issue, we exploit the effectiveness of deep networks in temporal action localization via three segment-based 3D ConvNets: (1) a proposal network identifies candidate segments in a long video that may contain actions; (2) a classification network learns one-vs-all action classification model to serve as initialization for the localization network; and (3) a localization network fine-tunes on the learned classification network to localize each action instance.
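The first stage of the pipeline described above feeds candidate segments to a proposal network. A common way to generate such candidates from a long untrimmed video is multi-scale sliding windows; the sketch below shows the idea under assumed window lengths and stride, not the paper's exact settings.

```python
def sliding_segments(num_frames, lengths=(16, 32, 64), stride_ratio=0.5):
    """Generate candidate (start, end) frame segments at multiple temporal
    scales, the typical input to a segment-proposal network.

    Window lengths and the 50% overlap stride here are illustrative
    assumptions, not the configuration used by any specific method.
    """
    segments = []
    for length in lengths:
        stride = max(1, int(length * stride_ratio))
        for start in range(0, num_frames - length + 1, stride):
            segments.append((start, start + length))
    return segments
```

Each candidate segment would then be scored by the proposal network, classified one-vs-all, and finally refined by the localization network.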

VideoLSTM Convolves, Attends and Flows for Action Recognition

zhenyangli/VideoLSTM 6 Jul 2016

We present a new architecture for end-to-end sequence learning of actions in video, which we call VideoLSTM.