Video Recognition

147 papers with code • 0 benchmarks • 10 datasets

Video Recognition is the process of obtaining, processing, and analysing data from a visual source, specifically video.
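
As a rough illustration of the "processing" step, many video models work on a fixed number of frames picked from a clip rather than on every frame. The sketch below picks evenly spaced frame indices. The function name `sample_frame_indices` and the centre-of-segment rule are assumptions made for this example, not taken from any of the papers or libraries listed on this page.

```python
def sample_frame_indices(num_frames, num_samples):
    """Return num_samples frame indices spread evenly across a clip.

    Hypothetical helper for illustration only: the clip is split into
    num_samples equal segments and the centre frame of each is chosen.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    segment = num_frames / num_samples  # segment length in frames (may be fractional)
    return [
        min(num_frames - 1, int(segment * i + segment / 2))
        for i in range(num_samples)
    ]


# Usage: choose 4 frames from a 100-frame clip.
print(sample_frame_indices(100, 4))  # [12, 37, 62, 87]
```

When the clip has fewer frames than requested samples, indices repeat instead of raising an error, which keeps the output length fixed for a downstream model.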

Benchmarks

These leaderboards are used to track progress in Video Recognition.

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Libraries

Use these libraries to find Video Recognition models and implementations.

open-mmlab/mmaction2 • 5 papers • 3,892 stars

open-mmlab/mmtracking • 3 papers • 3,375 stars

facebookresearch/pytorchvideo • 3 papers • 3,182 stars

towhee-io/towhee • 3 papers • 2,991 stars

See all 9 libraries.

Datasets

Most implemented papers

Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution

facebookresearch/OctConv • ICCV 2019

Similarly, the output feature maps of a convolution layer can also be seen as a mixture of information at different frequencies.

SlowFast Networks for Video Recognition

facebookresearch/SlowFast • ICCV 2019

We present SlowFast networks for video recognition.

Video Swin Transformer

SwinTransformer/Video-Swin-Transformer • CVPR 2022

The vision community is witnessing a modeling shift from CNNs to Transformers, where pure Transformer architectures have attained top accuracy on the major video recognition benchmarks.

TSM: Temporal Shift Module for Efficient Video Understanding

MIT-HAN-LAB/temporal-shift-module • ICCV 2019

The explosive growth in video streaming gives rise to challenges on performing video understanding at high accuracy and low computation cost.

Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs?

kenshohara/3D-ResNets-PyTorch • 10 Apr 2020

Therefore, in the present paper, we conduct exploration study in order to improve spatiotemporal 3D CNNs as follows: (i) Recently proposed large-scale video datasets help improve spatiotemporal 3D CNNs in terms of video classification accuracy.

Micro-Batch Training with Batch-Channel Normalization and Weight Standardization

joe-siyuan-qiao/WeightStandardization • 25 Mar 2019

Batch Normalization (BN) has become an out-of-box technique to improve deep network training.

X3D: Expanding Architectures for Efficient Video Recognition

facebookresearch/SlowFast • CVPR 2020

This paper presents X3D, a family of efficient video networks that progressively expand a tiny 2D image classification architecture along multiple network axes, in space, time, width and depth.

Long-term Recurrent Convolutional Networks for Visual Recognition and Description

garythung/torch-lrcn • CVPR 2015

Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or "temporally deep", are effective for tasks involving sequences, visual and otherwise.

Multiscale Vision Transformers

facebookresearch/SlowFast • ICCV 2021

We evaluate this fundamental architectural prior for modeling the dense nature of visual signals for a variety of video recognition tasks where it outperforms concurrent vision transformers that rely on large scale external pre-training and are 5-10x more costly in computation and parameters.

MViTv2: Improved Multiscale Vision Transformers for Classification and Detection

facebookresearch/detectron2 • CVPR 2022

In this paper, we study Multiscale Vision Transformers (MViTv2) as a unified architecture for image and video classification, as well as object detection.
