Action Localization

11 papers with code · Computer Vision

State-of-the-art leaderboards

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Greatest papers with code

AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

CVPR 2018 tensorflow/models

The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently. The key characteristics of our dataset are: (1) the definition of atomic visual actions, rather than composite actions; (2) precise spatio-temporal annotations with possibly multiple annotations for each person; (3) exhaustive annotation of these atomic actions over 15-minute video clips; (4) people temporally linked across consecutive segments; and (5) using movies to gather a varied set of action representations.

ACTION LOCALIZATION ACTION RECOGNITION VIDEO UNDERSTANDING
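As a concrete illustration of the annotation structure described above, the sketch below models a single spatio-temporally localized label: a person box at a keyframe time, one atomic action, and a person id that links the same person across consecutive segments. The field names and record layout are illustrative assumptions, not AVA's actual file schema.

```python
from dataclasses import dataclass

@dataclass
class AtomicActionLabel:
    """Illustrative record for one spatio-temporally localized label of the
    kind AVA provides. Field names are assumptions, not AVA's schema."""
    video_id: str
    timestamp: float  # seconds within the 15-minute clip
    box: tuple        # (x1, y1, x2, y2), normalized person box
    action: str       # one of the 80 atomic actions, e.g. "sit"
    person_id: int    # stable id linking a person across consecutive segments

# A person often carries several labels at once (e.g. a pose action and an
# interaction), so multiple records share (video_id, timestamp, person_id).
labels = [
    AtomicActionLabel("clip_001", 902.0, (0.1, 0.2, 0.4, 0.9), "sit", 7),
    AtomicActionLabel("clip_001", 902.0, (0.1, 0.2, 0.4, 0.9), "talk to", 7),
]
```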

Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs

CVPR 2016 zhengshou/scnn

Temporal action localization in untrimmed long videos is challenging because videos in real applications are usually unconstrained and contain multiple action instances plus video content of background scenes or other activities. To address this issue, we exploit the effectiveness of deep networks for temporal action localization via three segment-based 3D ConvNets: (1) a proposal network identifies candidate segments in a long video that may contain actions; (2) a classification network learns a one-vs-all action classification model to serve as initialization for the localization network; and (3) a localization network fine-tunes the learned classification network to localize each action instance.

ACTION CLASSIFICATION ACTION LOCALIZATION TEMPORAL LOCALIZATION
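The pipeline above starts from candidate segments. The sketch below shows one common way to produce such candidates: multi-scale sliding windows over the frame axis. The window lengths and 75% overlap are illustrative assumptions, not the paper's exact settings.

```python
def generate_candidate_segments(num_frames,
                                window_lengths=(16, 32, 64, 128, 256, 512),
                                overlap=0.75):
    """Yield (start_frame, end_frame) pairs of multi-scale candidate segments."""
    for length in window_lengths:
        if length > num_frames:
            continue  # skip windows longer than the video
        stride = max(1, int(length * (1 - overlap)))
        for start in range(0, num_frames - length + 1, stride):
            yield (start, start + length)

# Each candidate segment would be scored by the 3D ConvNet proposal network;
# surviving segments then pass through classification and localization.
segments = list(generate_candidate_segments(num_frames=3000))
print(len(segments), segments[:3])
```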

TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals

ICCV 2017 jiyanggao/TURN-TAP

Temporal Action Proposal (TAP) generation is an important problem: fast and accurate extraction of semantically important segments (e.g. human actions) from untrimmed videos is a key step for large-scale video analysis. We propose a novel Temporal Unit Regression Network (TURN) model.

ACTION LOCALIZATION

HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization

26 Dec 2017 hangzhaomit/HACS-dataset

Overall, HACS Clips consists of 1.55M annotated clips sampled from 504K untrimmed videos, and HACS Segments contains 139K action segments densely annotated in 50K untrimmed videos spanning 200 action categories. On HACS Segments, we evaluate state-of-the-art methods of action proposal generation and action localization, and highlight the new challenges posed by our dense and fine-grained temporal annotations.

ACTION CLASSIFICATION ACTION LOCALIZATION ACTION RECOGNITION TEMPORAL LOCALIZATION TRANSFER LEARNING

Hide-and-Seek: A Data Augmentation Technique for Weakly-Supervised Localization and Beyond

6 Nov 2018 kkanshul/Hide-and-Seek

Our approach only needs to modify the input image and can work with any network to improve its performance. The main advantage of Hide-and-Seek over existing data augmentation techniques is its ability to improve object localization accuracy in the weakly-supervised setting, and we therefore use this task to motivate the approach.

ACTION LOCALIZATION EMOTION RECOGNITION IMAGE CLASSIFICATION OBJECT LOCALIZATION PERSON RE-IDENTIFICATION SEMANTIC SEGMENTATION

AutoLoc: Weakly-supervised Temporal Action Localization

22 Jul 2018 zhengshou/AutoLoc

In this paper, we develop a novel weakly-supervised temporal action localization (TAL) framework called AutoLoc that directly predicts the temporal boundary of each action instance. Encouragingly, our weakly-supervised method achieves results comparable to some fully-supervised methods.

ACTION LOCALIZATION

Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-supervised Object and Action Localization

ICCV 2017 zhengshou/AutoLoc

We propose 'Hide-and-Seek', a weakly-supervised framework that aims to improve object localization in images and action localization in videos. Our key idea is to hide patches in a training image randomly, forcing the network to seek other relevant parts when the most discriminative part is hidden.

ACTION LOCALIZATION WEAKLY-SUPERVISED OBJECT LOCALIZATION
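A minimal sketch of the patch-hiding idea described above: divide the image into a grid and hide each cell independently with some probability. The grid size, hiding probability, and fill value are illustrative assumptions (the paper fills hidden patches with the training-set mean pixel value so the input distribution is less disturbed).

```python
import numpy as np

def hide_patches(image, grid_size=4, hide_prob=0.5, fill_value=0.0):
    """Hide-and-Seek-style augmentation: split the image into a grid_size x
    grid_size grid and fill each cell with fill_value independently with
    probability hide_prob. Parameter values here are illustrative."""
    h, w = image.shape[:2]
    out = image.copy()
    ph, pw = h // grid_size, w // grid_size
    for i in range(grid_size):
        for j in range(grid_size):
            if np.random.rand() < hide_prob:
                out[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw] = fill_value
    return out

# Applied only at training time; at test time the full image is shown.
augmented = hide_patches(np.random.rand(224, 224, 3))
```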

Temporal Localization of Fine-Grained Actions in Videos by Domain Transfer from Web Images

4 Apr 2015 zhengshou/AutoLoc

We find that web images queried by action names serve as well-localized highlights for many actions, but are noisily labeled. To solve this problem, we propose a simple yet effective method that takes weak video labels and noisy image labels as input, and generates localized action frames as output.

ACTION LOCALIZATION ACTION RECOGNITION TEMPORAL LOCALIZATION

Weakly Supervised Action Localization by Sparse Temporal Pooling Network

CVPR 2018 demianzhang/weakly-action-localization

We propose a weakly supervised temporal action localization algorithm on untrimmed videos using convolutional neural networks. Our algorithm learns from video-level class labels and predicts temporal intervals of human actions with no requirement of temporal localization annotations.

ACTION CLASSIFICATION ACTION LOCALIZATION TEMPORAL LOCALIZATION
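One way such a weakly supervised model can work is attention-weighted temporal pooling: a per-segment attention score aggregates segment features into a video-level representation trained with an ordinary classification loss, and the attention itself serves as a temporal actionness signal for interval prediction. The sketch below assumes precomputed segment features; the dimensions and head designs are illustrative, not STPN's published architecture.

```python
import torch
import torch.nn as nn

class SparseTemporalPooling(nn.Module):
    """Minimal sketch of attention-based temporal pooling for
    weakly-supervised temporal action localization."""
    def __init__(self, feat_dim=1024, num_classes=20):
        super().__init__()
        self.attention = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                       nn.Linear(256, 1), nn.Sigmoid())
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, feats):        # feats: (T, feat_dim) segment features
        att = self.attention(feats)  # (T, 1) per-segment attention in [0, 1]
        video_feat = (att * feats).sum(0) / (att.sum() + 1e-6)
        logits = self.classifier(video_feat)  # video-level class scores
        return logits, att.squeeze(-1)        # attention doubles as actionness

# Training uses only video-level labels: a classification loss on `logits`,
# plus an L1 penalty on `att` to encourage sparse (well-localized) attention.
model = SparseTemporalPooling()
logits, attention = model(torch.randn(40, 1024))  # 40 segments
```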

Guess Where? Actor-Supervision for Spatiotemporal Action Localization

5 Apr 2018 escorciav/roi_pooling

Compared to leading approaches, which all learn to localize based on carefully annotated boxes on training video frames, we adhere to a weakly-supervised solution that only requires a video class label. In addition, we propose an actor-based attention mechanism that enables the localization of actions from action class labels and actor proposals and is end-to-end trainable.

ACTION LOCALIZATION
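The actor-based attention idea can be sketched in the same weakly supervised spirit: attention weights over actor proposals pool proposal features into a video-level prediction trained from the class label alone, and the weights then rank proposals for localization. The dimensions and softmax attention below are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ActorAttention(nn.Module):
    """Sketch of an actor-based attention head: given features for N actor
    proposals, produce attention weights and a video-level class score so
    the module trains end-to-end from a video class label alone."""
    def __init__(self, feat_dim=512, num_classes=10):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)
        self.classify = nn.Linear(feat_dim, num_classes)

    def forward(self, proposal_feats):  # (N, feat_dim) actor proposal features
        att = torch.softmax(self.score(proposal_feats).squeeze(-1), dim=0)
        video_feat = att @ proposal_feats  # attention-weighted pooling
        return self.classify(video_feat), att  # att ranks proposals for localization

model = ActorAttention()
logits, att = model(torch.randn(12, 512))  # 12 actor proposals
```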