Video Understanding
300 papers with code • 0 benchmarks • 42 datasets
A crucial task in Video Understanding is to recognise and localise (in space and time) the different actions or events appearing in a video.
Benchmarks
These leaderboards are used to track progress in Video Understanding.
Libraries
Use these libraries to find Video Understanding models and implementations
Datasets
Subtasks
Most implemented papers
A Multigrid Method for Efficiently Training Video Models
We empirically demonstrate a general and robust grid schedule that yields a significant out-of-the-box training speedup without a loss in accuracy for different models (I3D, non-local, SlowFast), datasets (Kinetics, Something-Something, Charades), and training settings (with and without pre-training, 128 GPUs or 1 GPU).
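For intuition, here is a minimal sketch of a multigrid-style schedule that trades batch size against spatial and temporal resolution so that per-iteration compute stays roughly constant. The scale values and cycle length are hypothetical illustrations, not the paper's actual long+short cycles:

```python
# Minimal sketch of a multigrid-style training schedule (hypothetical values).
# Idea: vary (batch, frames, crop size) during training while keeping
# batch * frames * height * width roughly constant.

BASE_BATCH, BASE_FRAMES, BASE_SIZE = 8, 32, 224

# Coarse-to-fine grid cycle: (temporal scale, spatial scale) pairs.
GRID_CYCLE = [(0.25, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]

def grid_for_epoch(epoch: int, epochs_per_step: int = 2):
    """Return (batch, frames, size) for the current point in the cycle."""
    t_scale, s_scale = GRID_CYCLE[(epoch // epochs_per_step) % len(GRID_CYCLE)]
    frames = max(1, int(BASE_FRAMES * t_scale))
    size = int(BASE_SIZE * s_scale)
    # Enlarge the batch to keep per-iteration compute roughly constant.
    batch = int(BASE_BATCH * (BASE_FRAMES / frames) * (BASE_SIZE / size) ** 2)
    return batch, frames, size

for epoch in range(8):
    print(epoch, grid_for_epoch(epoch))
```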
Context R-CNN: Long Term Temporal Context for Per-Camera Object Detection
In this paper we propose a method that leverages temporal context from the unlabeled frames of a novel camera to improve performance at that camera.
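The mechanism can be pictured as attention over a long-term, per-camera memory bank of features. The sketch below assumes plain scaled dot-product attention and omits the learned projections the paper uses:

```python
import torch
import torch.nn.functional as F

def attend_to_memory(box_feats, memory_bank, dim=256):
    """Hypothetical sketch of Context R-CNN-style context attention:
    per-box features from the current frame (N, dim) attend over a
    memory bank of features from unlabeled frames (M, dim) collected
    at the same camera."""
    scores = box_feats @ memory_bank.t() / dim ** 0.5   # (N, M)
    context = F.softmax(scores, dim=-1) @ memory_bank   # (N, dim)
    return box_feats + context  # fuse long-term context into box features

boxes = torch.randn(4, 256)     # detections in the current frame
memory = torch.randn(100, 256)  # features pooled from weeks of footage
print(attend_to_memory(boxes, memory).shape)  # torch.Size([4, 256])
```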
Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization
We propose to explicitly model the Actor-Context-Actor Relation, which is the relation between two actors based on their interactions with the context.
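As a loose illustration of the idea, the sketch below builds first-order actor-context relations and then relates actor pairs through their context responses, using simple element-wise products in place of the paper's learned relation operators:

```python
import torch

def actor_context_actor(actors, context):
    """Hypothetical sketch of a high-order actor-context-actor relation:
    actors (N, D) each interact with context features (K, D); actor
    pairs are then related through their context-conditioned features."""
    # First-order relations: one response per (actor, context) pair.
    first_order = actors.unsqueeze(1) * context.unsqueeze(0)    # (N, K, D)
    # Pool context responses per actor, then relate every actor pair.
    per_actor = first_order.mean(dim=1)                         # (N, D)
    relation = per_actor.unsqueeze(1) * per_actor.unsqueeze(0)  # (N, N, D)
    return relation

print(actor_context_actor(torch.randn(3, 64), torch.randn(49, 64)).shape)
```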
SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos
In this work, we propose SoccerNet-v2, a novel large-scale corpus of manual annotations for the SoccerNet video dataset, along with open challenges to encourage more research in soccer understanding and broadcast production.
Token Shift Transformer for Video Classification
Notably, the TokShift transformer is an early, purely convolution-free video transformer that remains computationally efficient for video understanding.
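The shift itself is parameter-free. Below is a hedged sketch of a generic temporal token shift; note that the paper applies the shift specifically to the [Class] token's channels, whereas this toy version shifts a fraction of every token's channels:

```python
import torch

def token_shift(x, shift_frac=0.25):
    """Hypothetical sketch of a temporal token shift. x has shape
    (batch, time, tokens, channels); a fraction of channels is shifted
    to the previous/next frame, the rest stay in place. Zero parameters."""
    b, t, n, c = x.shape
    k = int(c * shift_frac) // 2
    out = x.clone()
    out[:, 1:, :, :k] = x[:, :-1, :, :k]            # shift forward in time
    out[:, :-1, :, k:2 * k] = x[:, 1:, :, k:2 * k]  # shift backward in time
    return out

x = torch.randn(2, 8, 197, 768)  # e.g. ViT tokens over 8 frames
print(token_shift(x).shape)      # torch.Size([2, 8, 197, 768])
```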
Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection
Generic event boundary detection is an important yet challenging task in video understanding, which aims at detecting the moments where humans naturally perceive event boundaries.
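As a toy illustration, a "difference map" can be built from pairwise distances between per-frame features, where boundaries show up as block structure. This sketch uses plain L2 distances and ignores the paper's multi-level maps and progressive attention:

```python
import torch

def dense_difference_map(feats):
    """Hypothetical sketch of a dense difference map: given per-frame
    features (T, D), compute all pairwise feature distances (T, T).
    Event boundaries tend to appear as block boundaries in this map."""
    return torch.cdist(feats, feats)  # (T, T) L2 distances

feats = torch.randn(16, 128)  # features for 16 frames
print(dense_difference_map(feats).shape)  # torch.Size([16, 16])
```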
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
UniFormer alleviates this issue by unifying convolution and self-attention as a single relation aggregator in the transformer format.
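A hedged sketch of that design: one aggregator slot per block, implemented as a cheap local aggregator (depthwise 3D convolution) in shallow stages and as global self-attention in deep stages. Module names and sizes here are illustrative, not the released code:

```python
import torch
import torch.nn as nn

class RelationAggregator(nn.Module):
    """Hypothetical sketch of a UniFormer-style relation aggregator slot:
    local (depthwise 3D conv) or global (self-attention) token mixing."""
    def __init__(self, dim, local=True, heads=8):
        super().__init__()
        self.local = local
        if local:
            self.agg = nn.Conv3d(dim, dim, kernel_size=3, padding=1, groups=dim)
        else:
            self.agg = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):  # x: (batch, dim, T, H, W)
        if self.local:
            return self.agg(x)
        b, c, t, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (b, T*H*W, dim)
        out, _ = self.agg(tokens, tokens, tokens)
        return out.transpose(1, 2).reshape(b, c, t, h, w)

x = torch.randn(1, 64, 4, 14, 14)
print(RelationAggregator(64, local=True)(x).shape)   # local mixing
print(RelationAggregator(64, local=False)(x).shape)  # global mixing
```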
Panoptic Video Scene Graph Generation
PVSG relates to the existing video scene graph generation (VidSGG) problem, which focuses on temporal interactions between humans and objects grounded with bounding boxes in videos.
VideoMamba: State Space Model for Efficient Video Understanding
Addressing the dual challenges of local redundancy and global dependencies in video understanding, this work adapts the Mamba state space model to the video domain.
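For intuition, the core of a state space model is a linear recurrence over the token sequence. The sketch below is a naive sequential scan with fixed matrices; real Mamba makes the parameters input-dependent and uses a hardware-aware parallel scan:

```python
import torch

def ssm_scan(x, A, B, C):
    """Hypothetical sketch of the discrete state-space recurrence behind
    Mamba-style models: h_t = A h_{t-1} + B x_t, y_t = C h_t, applied
    here over a flattened sequence of spatiotemporal tokens."""
    T, _ = x.shape
    h = torch.zeros(A.shape[0])
    ys = []
    for t in range(T):
        h = A @ h + B @ x[t]  # linear recurrence over the token sequence
        ys.append(C @ h)
    return torch.stack(ys)    # (T, d_out)

T, d_in, d_state, d_out = 32, 16, 8, 16
y = ssm_scan(torch.randn(T, d_in),
             torch.randn(d_state, d_state) * 0.1,  # state transition A
             torch.randn(d_state, d_in),           # input matrix B
             torch.randn(d_out, d_state))          # output matrix C
print(y.shape)  # torch.Size([32, 16])
```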
Constrained-size Tensorflow Models for YouTube-8M Video Understanding Challenge
This paper presents our 7th-place solution to the second YouTube-8M video understanding competition, which challenged participants to build a constrained-size model that classifies millions of YouTube videos into thousands of classes.