Action Detection

233 papers with code • 11 benchmarks • 33 datasets

Action Detection aims to find both where and when an action occurs within a video clip and to classify what action is taking place. Results are typically given in the form of action tubelets, which are action bounding boxes linked across time in the video. The task is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify the action taking place and typically assumes a trimmed video.
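
The tubelet representation is easy to picture in code. Below is a minimal sketch, where the `Tubelet` class and the IoU-based linking rule mentioned in the comment are illustrative assumptions rather than any particular paper's API:

```python
from dataclasses import dataclass, field

@dataclass
class Tubelet:
    """An action tubelet: one bounding box per frame, linked across time."""
    label: str                                   # predicted action class
    start_frame: int                             # first frame of the action
    boxes: list = field(default_factory=list)    # per-frame (x1, y1, x2, y2)
    scores: list = field(default_factory=list)   # per-frame confidence

    @property
    def end_frame(self) -> int:
        return self.start_frame + len(self.boxes) - 1

# A detector emits per-frame boxes; consecutive boxes are commonly linked
# into one tubelet when they overlap enough (e.g. an IoU-based rule).
tube = Tubelet(label="jumping", start_frame=120)
tube.boxes.append((34, 50, 180, 320))
tube.scores.append(0.91)
```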

Libraries

Use these libraries to find Action Detection models and implementations
See all 6 libraries.

Most implemented papers

Continuous control with deep reinforcement learning

ray-project/ray 9 Sep 2015

We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain.
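
In DQN, acting means taking an argmax over a finite set of Q-values; for continuous control the paper instead trains a deterministic actor that outputs the action directly. A minimal sketch of such an actor, assuming illustrative layer sizes rather than the paper's exact architecture:

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy for continuous actions (DDPG-style sketch)."""
    def __init__(self, state_dim: int, action_dim: int, max_action: float):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),  # squash to [-1, 1]
        )
        self.max_action = max_action

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Output the action itself, scaled to the environment's bounds,
        # instead of Q-values over a discrete action set.
        return self.max_action * self.net(state)

action = Actor(state_dim=17, action_dim=6, max_action=1.0)(torch.randn(1, 17))
```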

BSN: Boundary Sensitive Network for Temporal Action Proposal Generation

wzmsltw/BSN-boundary-sensitive-network.pytorch ECCV 2018

Temporal action proposal generation is an important yet challenging problem, since temporal proposals with rich action content are indispensable for analysing real-world videos of long duration and with a high proportion of irrelevant content.
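
The boundary-sensitive idea, scoring candidate start and end locations and then pairing them into proposals, can be sketched roughly as follows. This is a simplified sketch: the peak rule, threshold, and score product are assumptions, and BSN itself adds a learned proposal evaluation stage on top.

```python
import numpy as np

def generate_proposals(start_prob, end_prob, thresh=0.5, max_len=100):
    """Pair high-probability start/end snippets into temporal proposals."""
    def candidates(p):
        # Keep locations that beat the threshold or are local maxima.
        return [t for t in range(len(p))
                if p[t] > thresh
                or (0 < t < len(p) - 1 and p[t - 1] < p[t] > p[t + 1])]

    proposals = []
    for s in candidates(start_prob):
        for e in candidates(end_prob):
            if s < e <= s + max_len:
                proposals.append((s, e, start_prob[s] * end_prob[e]))
    # Highest-scoring first; a full pipeline would apply NMS afterwards.
    return sorted(proposals, key=lambda p: -p[2])

props = generate_proposals(np.random.rand(200), np.random.rand(200))
```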

BMN: Boundary-Matching Network for Temporal Action Proposal Generation

PaddlePaddle/models ICCV 2019

To address these difficulties, we introduce the Boundary-Matching (BM) mechanism to evaluate confidence scores of densely distributed proposals, which denotes a proposal as a matching pair of starting and ending boundaries and combines all densely distributed BM pairs into the BM confidence map.
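
The BM confidence map can be pictured as a 2D array indexed by proposal duration and start position, so every densely sampled start/end pair lands in one map that a network can predict in a single pass. In this sketch only the map layout follows the paper; `conf_fn` is a placeholder for the learned confidence:

```python
import numpy as np

def bm_confidence_map(conf_fn, num_snippets, max_duration):
    """Entry [d, s] scores the proposal starting at snippet s
    with duration d + 1 snippets."""
    bm_map = np.zeros((max_duration, num_snippets))
    for d in range(max_duration):
        for s in range(num_snippets):
            e = s + d + 1
            if e <= num_snippets:          # proposal must end inside the video
                bm_map[d, s] = conf_fn(s, e)
    return bm_map

# Toy stand-in confidence: favor proposals centered near snippet 50.
toy_map = bm_confidence_map(lambda s, e: 1.0 / (1 + abs((s + e) / 2 - 50)),
                            num_snippets=100, max_duration=32)
```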

AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

tensorflow/models CVPR 2018

The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently.
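
Because each row of an AVA-style annotation CSV carries exactly one action label for one box, a person with multiple labels simply spans several rows. A small loader sketch; the field order follows the published CSV format, but treat it as an assumption and verify it against the actual release:

```python
import csv

def load_ava_rows(path):
    """Parse AVA-style rows: video_id, timestamp (s), x1, y1, x2, y2
    (box corners normalized to [0, 1]), action_id, person_id."""
    rows = []
    with open(path, newline="") as f:
        for video_id, ts, x1, y1, x2, y2, action_id, person_id in csv.reader(f):
            rows.append({
                "video_id": video_id,
                "timestamp": float(ts),
                "box": (float(x1), float(y1), float(x2), float(y2)),
                "action_id": int(action_id),
                "person_id": int(person_id),
            })
    return rows
```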

Rescaling Egocentric Vision

epic-kitchens/epic-kitchens-100-annotations 23 Jun 2020

This paper introduces the pipeline to extend the largest dataset in egocentric vision, EPIC-KITCHENS.

Temporal Action Detection with Structured Segment Networks

open-mmlab/mmaction ICCV 2017

Detecting actions in untrimmed videos is an important yet challenging task.

CholecTriplet2021: A benchmark challenge for surgical action triplet recognition

CAMMA-public/cholectriplet2021 10 Apr 2022

In this paper, we present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge.

HAKE: Human Activity Knowledge Engine

DirtyHarryLYL/HAKE 13 Apr 2019

To address these issues and promote activity understanding, we build a large-scale Human Activity Knowledge Engine (HAKE) based on human body part states.
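
The part-state idea, describing an activity through the states of individual body parts, can be shown with a toy rule table. This is purely illustrative: the state names and matching rule below are invented for exposition and are not HAKE's actual representation.

```python
# Observed body-part states for one person (hypothetical labels).
part_states = {"hand": "hold_ball", "arm": "raise", "leg": "jump"}

# Activities defined as conjunctions of required part states (made up).
activity_rules = {
    "shoot_basketball": {"hand": "hold_ball", "arm": "raise"},
    "kick_ball": {"leg": "kick", "foot": "touch_ball"},
}

def matching_activities(states, rules):
    """Return activities whose required part states are all observed."""
    return [act for act, required in rules.items()
            if all(states.get(part) == s for part, s in required.items())]

print(matching_activities(part_states, activity_rules))  # ['shoot_basketball']
```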

You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization

wei-tim/YOWO 15 Nov 2019

YOWO is a single-stage architecture with two branches to extract temporal and spatial information concurrently and predict bounding boxes and action probabilities directly from video clips in one evaluation.
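
The two-branch design can be sketched with placeholder layers: a 3D branch sees the whole clip for temporal cues, a 2D branch sees the key frame for spatial cues, and a fused head predicts boxes and action scores in one pass. YOWO's actual backbones are 3D-ResNeXt-101 and Darknet-19 with a channel fusion and attention module, which the tiny convolutions below merely stand in for.

```python
import torch
import torch.nn as nn

class TwoBranchDetector(nn.Module):
    """Minimal stand-in for YOWO's single-stage, two-branch layout."""
    def __init__(self, num_classes: int, num_anchors: int = 5):
        super().__init__()
        self.branch3d = nn.Sequential(            # clip -> temporal features
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d((1, 7, 7)),
        )
        self.branch2d = nn.Sequential(            # key frame -> spatial features
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((7, 7)),
        )
        # Per anchor: 4 box offsets + 1 objectness + class scores.
        self.head = nn.Conv2d(64, num_anchors * (5 + num_classes), kernel_size=1)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, 3, T, H, W); take the last frame as the key frame.
        feats3d = self.branch3d(clip).squeeze(2)      # (B, 32, 7, 7)
        feats2d = self.branch2d(clip[:, :, -1])       # (B, 32, 7, 7)
        fused = torch.cat([feats3d, feats2d], dim=1)  # channel fusion
        return self.head(fused)                       # boxes + scores per cell

preds = TwoBranchDetector(num_classes=24)(torch.randn(2, 3, 16, 224, 224))
```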