Video Recognition

MoViNets: Mobile Video Networks for Efficient Video Recognition

tensorflow/models CVPR 2021

We present Mobile Video Networks (MoViNets), a family of computation- and memory-efficient video networks that can operate on streaming video for online inference.

Multiscale Vision Transformers

facebookresearch/SlowFast 22 Apr 2021

We evaluate this fundamental architectural prior for modeling the dense nature of visual signals for a variety of video recognition tasks where it outperforms concurrent vision transformers that rely on large scale external pre-training and are 5-10x more costly in computation and parameters.

X3D: Expanding Architectures for Efficient Video Recognition

facebookresearch/SlowFast CVPR 2020

This paper presents X3D, a family of efficient video networks that progressively expand a tiny 2D image classification architecture along multiple network axes, in space, time, width and depth.

Audiovisual SlowFast Networks for Video Recognition

facebookresearch/SlowFast 23 Jan 2020

We present Audiovisual SlowFast Networks, an architecture for integrated audiovisual perception.

Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs?

kenshohara/3D-ResNets-PyTorch 10 Apr 2020

Therefore, in the present paper, we conduct an exploratory study to improve spatiotemporal 3D CNNs, as follows: (i) recently proposed large-scale video datasets help improve spatiotemporal 3D CNNs in terms of video classification accuracy.

Omni-sourced Webly-supervised Learning for Video Recognition

open-mmlab/mmaction ECCV 2020

Then a joint-training strategy is proposed to deal with the domain gaps between multiple data sources and formats in webly-supervised learning.

Sequence Level Semantics Aggregation for Video Object Detection

open-mmlab/mmtracking ICCV 2019

In this work, we argue that aggregating features at the full-sequence level will lead to more discriminative and robust features for video object detection.

