Action Detection

217 papers with code • 11 benchmarks • 34 datasets

Action Detection aims to find both where and when an action occurs within a video clip and to classify which action is taking place. Results are typically given as action tubelets, which are action bounding boxes linked across time in the video. The task is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify the action taking place and typically assumes a trimmed video.
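To make the tubelet idea concrete, here is a minimal sketch that links per-frame boxes across time by greedily extending each tubelet with the best-overlapping box (IoU) in the next frame. The helpers `iou` and `link_tubelets` and the 0.5 threshold are illustrative assumptions, not taken from any paper listed here. Many published linkers also weigh detection scores, which this sketch omits.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Tubelet = List[Tuple[int, Box]]          # (frame_index, box) pairs


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_tubelets(frames: List[List[Box]], min_iou: float = 0.5) -> List[Tubelet]:
    """Greedy IoU linking (hypothetical helper): frames[t] holds the boxes detected in frame t."""
    finished: List[Tubelet] = []
    active: List[Tubelet] = []
    for t, boxes in enumerate(frames):
        unmatched = list(boxes)
        still_active: List[Tubelet] = []
        for tube in active:
            _, last_box = tube[-1]
            best = max(unmatched, key=lambda b: iou(last_box, b), default=None)
            if best is not None and iou(last_box, best) >= min_iou:
                tube.append((t, best))
                unmatched.remove(best)
                still_active.append(tube)
            else:
                finished.append(tube)  # no continuation: the tubelet ends here
        # Boxes that extended no existing tubelet start new ones
        still_active.extend([(t, b)] for b in unmatched)
        active = still_active
    return finished + active
```

With a box drifting slightly across three frames plus one unrelated box in the middle frame, this yields one three-frame tubelet and one single-frame tubelet.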



Most implemented papers

Continuous control with deep reinforcement learning

ray-project/ray 9 Sep 2015

We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain.
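Two pieces central to carrying Deep Q-Learning over to continuous actions are the critic target, computed with slowly moving target networks as y = r + γ(1 − done)·Q′(s′, μ′(s′)), and the soft target update θ′ ← τθ + (1 − τ)θ′. The toy sketch below uses scalar linear networks; the classes `LinearActor` and `LinearCritic` and the default γ and τ values are illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class LinearActor:
    w: float

    def act(self, s: float) -> float:
        # Deterministic policy mu(s): a continuous action squashed into [-1, 1]
        return math.tanh(self.w * s)


@dataclass
class LinearCritic:
    ws: float
    wa: float

    def q(self, s: float, a: float) -> float:
        # Toy action-value function Q(s, a)
        return self.ws * s + self.wa * a


def critic_target(r: float, s_next: float, done: float,
                  target_actor: LinearActor, target_critic: LinearCritic,
                  gamma: float = 0.99) -> float:
    """Bellman target for the critic, bootstrapped through the target networks."""
    a_next = target_actor.act(s_next)  # no max over actions: the actor picks one
    return r + gamma * (1.0 - done) * target_critic.q(s_next, a_next)


def soft_update(target: List[float], online: List[float], tau: float = 0.001) -> List[float]:
    """Move target parameters a small step toward the online parameters."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]
```

The deterministic actor replaces the max over actions that Q-learning needs, which is what makes a continuous action space tractable here.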

BSN: Boundary Sensitive Network for Temporal Action Proposal Generation

wzmsltw/BSN-boundary-sensitive-network.pytorch ECCV 2018

Temporal action proposal generation is an important yet challenging problem, since temporal proposals with rich action content are indispensable for analysing real-world videos with long duration and a high proportion of irrelevant content.
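BSN locates temporal boundaries first and then pairs them into proposals. A minimal sketch of the pairing step follows, assuming per-snippet start and end probability sequences are already available. The helpers `candidate_boundaries` and `generate_proposals`, the "local peak or above 0.9 × max" selection rule, and scoring by the product of boundary probabilities are illustrative assumptions; BSN also re-scores proposals with a separate evaluation module that is not reproduced here.

```python
from typing import List, Tuple


def candidate_boundaries(probs: List[float], ratio: float = 0.9) -> List[int]:
    """Indices that are local peaks or exceed ratio * max(probs)."""
    if not probs:
        return []
    thresh = ratio * max(probs)
    picks = []
    for t, p in enumerate(probs):
        left = probs[t - 1] if t > 0 else float("-inf")
        right = probs[t + 1] if t + 1 < len(probs) else float("-inf")
        if p > thresh or (p > left and p > right):
            picks.append(t)
    return picks


def generate_proposals(start_probs: List[float], end_probs: List[float],
                       max_duration: int) -> List[Tuple[int, int, float]]:
    """Pair candidate starts with later candidate ends; return (start, end, score), best first."""
    starts = candidate_boundaries(start_probs)
    ends = candidate_boundaries(end_probs)
    proposals = []
    for s in starts:
        for e in ends:
            if s < e and e - s <= max_duration:
                proposals.append((s, e, start_probs[s] * end_probs[e]))
    proposals.sort(key=lambda p: p[2], reverse=True)
    return proposals
```

Locating boundaries locally before combining them lets proposals take flexible durations instead of being tied to fixed-length sliding windows.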

BMN: Boundary-Matching Network for Temporal Action Proposal Generation

PaddlePaddle/models ICCV 2019

To address the difficulties of proposal generation, we introduce the Boundary-Matching (BM) mechanism to evaluate confidence scores of densely distributed proposals: it denotes a proposal as a matching pair of starting and ending boundaries and combines all densely distributed BM pairs into the BM confidence map.
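The BM confidence map can be read as a 2-D grid indexed by duration and start time, so every cell is one candidate proposal. The sketch below assumes a hypothetical layout in which `conf[d][t]` scores the proposal that starts at snippet `t` and lasts `d + 1` snippets; `decode_bm_map` and `min_score` are illustrative names. The actual BMN scoring may combine additional terms (for example boundary probabilities), which are not shown.

```python
from typing import List, Tuple


def decode_bm_map(conf: List[List[float]], min_score: float = 0.5) -> List[Tuple[int, int, float]]:
    """Turn a D x T confidence map into (start, end_exclusive, score) proposals, best first."""
    T = len(conf[0]) if conf else 0
    proposals = []
    for d, row in enumerate(conf):
        for t, score in enumerate(row):
            end = t + d + 1  # exclusive end: the proposal covers snippets t .. t + d
            # Cells whose proposal would run past the video end are invalid and skipped
            if end <= T and score >= min_score:
                proposals.append((t, end, score))
    return sorted(proposals, key=lambda p: p[2], reverse=True)
```

Because every start/duration pair gets its own cell, all densely distributed proposals are scored at once rather than one at a time.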

Rescaling Egocentric Vision

epic-kitchens/epic-kitchens-100-annotations 23 Jun 2020

This paper introduces the pipeline to extend the largest dataset in egocentric vision, EPIC-KITCHENS.

Temporal Action Detection with Structured Segment Networks

open-mmlab/mmaction ICCV 2017

Detecting actions in untrimmed videos is an important yet challenging task.
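Detectors for untrimmed videos usually emit many overlapping segments for the same action instance, so a common post-processing step is temporal non-maximum suppression. The sketch below is a generic version, not SSN's specific configuration; `temporal_iou`, `temporal_nms`, and the 0.5 threshold are illustrative assumptions.

```python
from typing import List, Tuple

Segment = Tuple[float, float, float]  # (start_sec, end_sec, score)


def temporal_iou(a: Segment, b: Segment) -> float:
    """Overlap of two time intervals divided by their union."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(dets: List[Segment], iou_thresh: float = 0.5) -> List[Segment]:
    """Keep the highest-scoring segments, dropping any that overlap a kept one too much."""
    kept: List[Segment] = []
    for det in sorted(dets, key=lambda d: d[2], reverse=True):
        if all(temporal_iou(det, k) < iou_thresh for k in kept):
            kept.append(det)
    return kept
```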

AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

tensorflow/models CVPR 2018

The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently.
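Because one person often carries several labels at once (for example standing while talking), per-person action scoring is naturally multi-label: each class gets an independent sigmoid rather than a softmax that would force a single winner. A minimal sketch, where `person_labels`, the class names, and the 0.5 threshold are hypothetical.

```python
import math
from typing import Dict, List


def sigmoid(x: float) -> float:
    # Numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def person_labels(logits: List[float], class_names: List[str],
                  threshold: float = 0.5) -> Dict[str, float]:
    """Score each action class independently; return every class above the threshold."""
    probs = {name: sigmoid(z) for name, z in zip(class_names, logits)}
    return {name: p for name, p in probs.items() if p >= threshold}


# Hypothetical example: one person's logits over three action classes
labels = person_labels([2.0, 1.0, -3.0], ["stand", "talk to", "sit"])
```

Here both "stand" (about 0.88) and "talk to" (about 0.73) survive the threshold, which a softmax over the same logits could not express.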

CholecTriplet2021: A benchmark challenge for surgical action triplet recognition

CAMMA-public/cholectriplet2021 10 Apr 2022

In this paper, we present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge.

You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization

wei-tim/YOWO 15 Nov 2019

YOWO is a single-stage architecture with two branches to extract temporal and spatial information concurrently and predict bounding boxes and action probabilities directly from video clips in one evaluation.
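"Predict bounding boxes and action probabilities directly in one evaluation" means the fused features end in a grid of predictions that is decoded without a separate proposal stage. The sketch below decodes such a grid under a hypothetical one-box-per-cell layout `[cx_off, cy_off, w, h, objectness, p_class0, p_class1, ...]` with values already activated; `decode_grid` and `conf_thresh` are illustrative names. The actual YOWO head is more involved (for example, it is anchor-based), so consult the wei-tim/YOWO repository for its real decoding.

```python
from typing import List, Tuple

Detection = Tuple[float, float, float, float, int, float]  # x1, y1, x2, y2, class_id, score


def decode_grid(pred: List[List[List[float]]], conf_thresh: float = 0.25) -> List[Detection]:
    """pred[row][col] holds one box: center offsets in cell units, w/h as frame fractions."""
    S = len(pred)
    dets: List[Detection] = []
    for row in range(S):
        for col in range(S):
            cx_off, cy_off, w, h, obj, *cls_probs = pred[row][col]
            cls_id = max(range(len(cls_probs)), key=cls_probs.__getitem__)
            score = obj * cls_probs[cls_id]  # box confidence times action probability
            if score < conf_thresh:
                continue
            cx = (col + cx_off) / S  # box center in normalized frame coordinates
            cy = (row + cy_off) / S
            dets.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, cls_id, score))
    return dets
```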

From Recognition to Prediction: Analysis of Human Action and Trajectory Prediction in Video

JunweiLiang/Multiverse 20 Nov 2020

With advances in deep learning for computer vision, systems are now able to analyze an unprecedented amount of rich visual information from videos, enabling applications such as autonomous driving, socially aware robot assistants and public safety monitoring.