Video Recognition

93 papers with code • 0 benchmarks • 7 datasets

Most implemented papers

Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution

facebookresearch/OctConv ICCV 2019

Similarly, the output feature maps of a convolution layer can also be seen as a mixture of information at different frequencies.
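The frequency view in this excerpt can be illustrated with a toy decomposition. The sketch below is not the paper's OctConv operator. It only assumes that a feature map can be split into a low-frequency part, stored at half resolution, and a full-resolution residual holding the fine detail. The helper names (`avg_pool_2x2`, `upsample_2x`, `split_frequencies`) and the sample values are hypothetical.

```python
def avg_pool_2x2(fmap):
    """Downsample a 2D map (even height and width) by averaging 2x2 blocks."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            (fmap[i][j] + fmap[i][j + 1] + fmap[i + 1][j] + fmap[i + 1][j + 1]) / 4.0
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]


def upsample_2x(fmap):
    """Nearest-neighbour upsampling by a factor of 2 in both directions."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))  # repeat the widened row vertically
    return out


def split_frequencies(fmap):
    """Return (low, high): a half-resolution low-frequency map and the
    full-resolution residual that holds the remaining fine detail."""
    low = avg_pool_2x2(fmap)
    coarse = upsample_2x(low)
    high = [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(fmap, coarse)]
    return low, high


# Hypothetical 4x4 single-channel feature map, for illustration only.
feature_map = [
    [1.0, 3.0, 2.0, 2.0],
    [5.0, 7.0, 2.0, 2.0],
    [0.0, 0.0, 4.0, 8.0],
    [0.0, 0.0, 8.0, 4.0],
]
low, high = split_frequencies(feature_map)

# The two parts together reproduce the original map exactly.
restored = [
    [c + h for c, h in zip(r1, r2)] for r1, r2 in zip(upsample_2x(low), high)
]
```

Here `low` has a quarter of the spatial positions of `feature_map`, which is the kind of spatial redundancy the paper's title refers to. How the actual method processes and exchanges information between the two parts is described in the paper and the linked repository.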

Video Swin Transformer

SwinTransformer/Video-Swin-Transformer CVPR 2022

The vision community is witnessing a modeling shift from CNNs to Transformers, where pure Transformer architectures have attained top accuracy on the major video recognition benchmarks.

TSM: Temporal Shift Module for Efficient Video Understanding

MIT-HAN-LAB/temporal-shift-module ICCV 2019

The explosive growth in video streaming gives rise to challenges on performing video understanding at high accuracy and low computation cost.

Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs?

kenshohara/3D-ResNets-PyTorch 10 Apr 2020

Therefore, in the present paper, we conduct an exploratory study to improve spatiotemporal 3D CNNs as follows: (i) Recently proposed large-scale video datasets help improve spatiotemporal 3D CNNs in terms of video classification accuracy.

Micro-Batch Training with Batch-Channel Normalization and Weight Standardization

joe-siyuan-qiao/WeightStandardization 25 Mar 2019

Batch Normalization (BN) has become an out-of-the-box technique to improve deep network training.

Long-term Recurrent Convolutional Networks for Visual Recognition and Description

garythung/torch-lrcn CVPR 2015

Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or "temporally deep", are effective for tasks involving sequences, visual and otherwise.

X3D: Expanding Architectures for Efficient Video Recognition

facebookresearch/SlowFast CVPR 2020

This paper presents X3D, a family of efficient video networks that progressively expand a tiny 2D image classification architecture along multiple network axes, in space, time, width and depth.

Multiscale Vision Transformers

facebookresearch/SlowFast ICCV 2021

We evaluate this fundamental architectural prior for modeling the dense nature of visual signals for a variety of video recognition tasks where it outperforms concurrent vision transformers that rely on large scale external pre-training and are 5-10x more costly in computation and parameters.

Deep Feature Flow for Video Recognition

msracver/Deep-Feature-Flow CVPR 2017

Yet, it is non-trivial to transfer the state-of-the-art image recognition networks to videos as per-frame evaluation is too slow and unaffordable.