Video Recognition Models

MoViNet

Introduced by Kondratyuk et al. in MoViNets: Mobile Video Networks for Efficient Video Recognition

Mobile Video Network, or MoViNet, is a type of computation- and memory-efficient video network that can operate on streaming video for online inference. Three techniques improve efficiency while reducing the peak memory usage of 3D CNNs. First, a video network search space is designed and neural architecture search is employed to generate efficient and diverse 3D CNN architectures. Second, a Stream Buffer technique decouples memory from video clip duration, allowing 3D CNNs to embed arbitrary-length streaming video sequences for both training and inference with a small constant memory footprint. Third, a simple ensembling technique improves accuracy further without sacrificing efficiency.
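
The Stream Buffer idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single causal temporal convolution, uses one scalar feature per frame in place of a full feature map, and the names causal_temporal_conv and stream_video are hypothetical. The point it shows is that carrying the last few frames of the previous subclip as a buffer lets the layer process a long video in short subclips while holding only a fixed number of past frames in memory.

```python
from typing import List, Sequence, Tuple


def causal_temporal_conv(
    frames: Sequence[float],
    kernel: Sequence[float],
    buffer: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Apply a causal 1D temporal convolution to one subclip.

    `buffer` holds the last len(kernel) - 1 frame features seen before this
    subclip (zeros for the first subclip). Returns the outputs for this
    subclip and the updated buffer to pass to the next call.
    """
    k = len(kernel)
    assert len(buffer) == k - 1, "buffer must hold exactly k - 1 past frames"
    # Prepend the cached frames so every output only looks at the past.
    padded = list(buffer) + list(frames)
    outputs = [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(frames))
    ]
    # Keep only the last k - 1 frames: memory is independent of video length.
    new_buffer = padded[len(padded) - (k - 1):]
    return outputs, new_buffer


def stream_video(
    video: Sequence[float], kernel: Sequence[float], clip_len: int
) -> List[float]:
    """Process `video` in subclips of `clip_len` frames using a stream buffer."""
    buffer: List[float] = [0.0] * (len(kernel) - 1)
    outputs: List[float] = []
    for start in range(0, len(video), clip_len):
        subclip = video[start:start + clip_len]
        out, buffer = causal_temporal_conv(subclip, kernel, buffer)
        outputs.extend(out)
    return outputs


if __name__ == "__main__":
    video = [float(i) for i in range(1, 11)]  # 10 toy frame features
    kernel = [0.5, 0.25, 0.25]                # hypothetical temporal kernel
    full, _ = causal_temporal_conv(video, kernel, [0.0, 0.0])
    streamed = stream_video(video, kernel, clip_len=3)
    print(streamed == full)
```

In this toy setting the streamed outputs match those of a single pass over the whole clip, because each output depends only on the current frame and the cached past frames. A real network would keep one such buffer per temporal layer and store feature maps rather than scalars.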

Source: MoViNets: Mobile Video Networks for Efficient Video Recognition


Tasks


Task                          Papers  Share
Denoising                     1       14.29%
Video Denoising               1       14.29%
Action Classification         1       14.29%
Action Recognition            1       14.29%
Computational Efficiency      1       14.29%
Temporal Action Localization  1       14.29%
Video Recognition             1       14.29%
