Action Detection

235 papers with code • 11 benchmarks • 33 datasets

Action Detection aims to find both where and when an action occurs within a video clip and to classify which action is taking place. Results are typically given in the form of action tubelets, which are action bounding boxes linked across time in the video. This is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
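To make the idea of boxes "linked across time" concrete, here is a minimal sketch, assuming per-frame bounding boxes have already been produced by some detector. It greedily chains each box to the best-overlapping box in the next frame using intersection over union (IoU). The names `iou`, `link_tubelets`, and `temporal_extent`, and the `min_iou` threshold, are hypothetical and chosen for illustration; this simple scheme is not the method of any paper listed on this page.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Tube = List[Tuple[int, Box]]             # list of (frame index, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_tubelets(frames: List[List[Box]], min_iou: float = 0.5) -> List[Tube]:
    """Greedily link per-frame boxes into tubes (illustrative assumption,
    not a published algorithm). A tube ends when no box in the next frame
    overlaps its last box by at least min_iou."""
    finished: List[Tube] = []
    active: List[Tube] = []
    for t, boxes in enumerate(frames):
        unmatched = list(boxes)
        still_active: List[Tube] = []
        for tube in active:
            last_box = tube[-1][1]
            best = max(unmatched, key=lambda b: iou(last_box, b), default=None)
            if best is not None and iou(last_box, best) >= min_iou:
                tube.append((t, best))
                unmatched.remove(best)
                still_active.append(tube)
            else:
                finished.append(tube)
        # Every box that extended no existing tube starts a new one.
        still_active.extend([[(t, b)] for b in unmatched])
        active = still_active
    return finished + active


def temporal_extent(tube: Tube) -> Tuple[int, int]:
    """Start and end frame of a tube: the 'when' of the detection."""
    return tube[0][0], tube[-1][0]


frames = [
    [(0, 0, 10, 10)],
    [(1, 1, 11, 11)],
    [(2, 2, 12, 12), (50, 50, 60, 60)],
    [(51, 51, 61, 61)],
]
tubes = link_tubelets(frames)
```

In this toy input, the boxes of each tube give the "where" and `temporal_extent` gives the "when"; assigning an action label to each tube would be a separate classification step that is not shown here.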

Most implemented papers

Actions as Moving Points

MCG-NJU/MOC-Detector ECCV 2020

The existing action tubelet detectors often depend on heuristic anchor design and placement, which might be computationally expensive and sub-optimal for precise localization.

Harvesting Ambient RF for Presence Detection Through Deep Learning

bigtreeyanger/presence_detection_cnn 13 Feb 2020

With presence detection, how to collect training data with human presence can have a significant impact on the performance.

PaStaNet: Toward Human Activity Knowledge Engine

DirtyHarryLYL/HAKE CVPR 2020

In light of this, we propose a new path: infer human part states first and then reason out the activities based on part-level semantics.

Asynchronous Interaction Aggregation for Action Detection

MVIG-SJTU/AlphAction ECCV 2020

We propose the Asynchronous Interaction Aggregation network (AIA) that leverages different interactions to boost action detection.

VoxLingua107: a Dataset for Spoken Language Recognition

alumae/torch-xvectors-wav 25 Nov 2020

Speech activity detection and speaker diarization are used to extract segments from the videos that contain speech.

Generic Event Boundary Detection: A Benchmark for Event Segmentation

StanLei52/GEBD ICCV 2021

This paper presents a novel task together with a new benchmark for detecting generic, taxonomy-free event boundaries that segment a whole video into chunks.

Relaxed Transformer Decoders for Direct Action Proposal Generation

MCG-NJU/RTD-Action ICCV 2021

Extensive experiments on THUMOS14 and ActivityNet-1.3 benchmarks demonstrate the effectiveness of RTD-Net on both tasks of temporal action proposal generation and temporal action detection.

ROAD: The ROad event Awareness Dataset for Autonomous Driving

gurkirt/road-dataset 23 Feb 2021

We also report the performance on the ROAD tasks of Slowfast and YOLOv5 detectors, as well as that of the winners of the ICCV2021 ROAD challenge, which highlight the challenges faced by situation awareness in autonomous driving.

End-to-end speaker segmentation for overlap-aware resegmentation

pyannote/segmentation 8 Apr 2021

Experiments on multiple speaker diarization datasets conclude that our model can be used with great success on both voice activity detection and overlapped speech detection.

Long Short-Term Transformer for Online Action Detection

amazon-research/long-short-term-transformer NeurIPS 2021

We present Long Short-term TRansformer (LSTR), a temporal modeling algorithm for online action detection, which employs a long- and short-term memory mechanism to model prolonged sequence data.