Video Classification

90 papers with code • 8 benchmarks • 10 datasets

Video Classification is the task of producing a label that is relevant to a video given its frames. A good video-level classifier not only provides accurate frame labels but also best describes the entire video, given the features and annotations of its frames. For example, a video might contain a tree in some frame, but the label that is central to the video might be something else (e.g., “hiking”). The granularity of the labels needed to describe the frames and the video depends on the task. Typical tasks include assigning one or more global labels to the video and assigning one or more labels to each frame in the video.

Source: Efficient Large Scale Video Classification
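
The difference between frame-level and video-level labels described above can be illustrated with a minimal sketch. It assumes each frame already has per-class scores from some frame classifier and that the video-level label is obtained by simple score averaging; both the scores and the averaging rule are illustrative assumptions, not the method of the cited paper. The function names `frame_labels` and `video_label` are hypothetical.

```python
from collections import defaultdict


def frame_labels(frame_scores):
    """Frame-level task: return the top-scoring label for each frame."""
    return [max(scores, key=scores.get) for scores in frame_scores]


def video_label(frame_scores, top_k=1):
    """Video-level task: average per-frame scores and return the top_k labels.

    Averaging is one simple pooling choice (an assumption for illustration);
    real systems may use learned temporal aggregation instead.
    """
    if not frame_scores:
        raise ValueError("frame_scores must contain at least one frame")
    totals = defaultdict(float)
    for scores in frame_scores:
        for label, score in scores.items():
            totals[label] += score
    n = len(frame_scores)
    means = {label: total / n for label, total in totals.items()}
    ranked = sorted(means, key=means.get, reverse=True)
    return ranked[:top_k], means


# Hypothetical scores: one frame is dominated by a tree,
# but the video as a whole is about hiking.
frame_scores = [
    {"hiking": 0.6, "tree": 0.3, "dog": 0.1},
    {"hiking": 0.3, "tree": 0.6, "dog": 0.1},
    {"hiking": 0.7, "tree": 0.2, "dog": 0.1},
    {"hiking": 0.8, "tree": 0.1, "dog": 0.1},
]

per_frame = frame_labels(frame_scores)
top, means = video_label(frame_scores)
print(per_frame)
print(top)
```

In this toy input the second frame is labeled "tree" on its own, while the pooled video-level label is "hiking", which mirrors the example in the task description.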

Greatest papers with code

Group Normalization

facebookresearch/detectron ECCV 2018

FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.

Object Detection Platform +1

Non-local Neural Networks

facebookresearch/detectron CVPR 2018

Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time.

Ranked #7 on Action Classification on Toyota Smarthome dataset (using extra training data)

Action Classification Action Recognition +3

Revisiting ResNets: Improved Training and Scaling Strategies

rwightman/pytorch-image-models 13 Mar 2021

Using improved training and scaling strategies, we design a family of ResNet architectures, ResNet-RS, which are 1.7x - 2.7x faster than EfficientNets on TPUs, while achieving similar accuracies on ImageNet.

Ranked #27 on Image Classification on CIFAR-100 (using extra training data)

Action Classification Image Classification +1

X3D: Expanding Architectures for Efficient Video Recognition

facebookresearch/SlowFast CVPR 2020

This paper presents X3D, a family of efficient video networks that progressively expand a tiny 2D image classification architecture along multiple network axes, in space, time, width and depth.

Action Classification Feature Selection +4

A Multigrid Method for Efficiently Training Video Models

facebookresearch/SlowFast CVPR 2020

We empirically demonstrate a general and robust grid schedule that yields a significant out-of-the-box training speedup without a loss in accuracy for different models (I3D, non-local, SlowFast), datasets (Kinetics, Something-Something, Charades), and training settings (with and without pre-training, 128 GPUs or 1 GPU).

Action Detection Action Recognition +1

Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs?

kenshohara/3D-ResNets-PyTorch 10 Apr 2020

Therefore, in the present paper, we conduct an exploratory study in order to improve spatiotemporal 3D CNNs as follows: (i) Recently proposed large-scale video datasets help improve spatiotemporal 3D CNNs in terms of video classification accuracy.

General Classification Video Classification +1

YouTube-8M: A Large-Scale Video Classification Benchmark

google/youtube-8m 27 Sep 2016

Despite the size of the dataset, some of our models train to convergence in less than a day on a single machine using TensorFlow.

Ranked #1 on Action Recognition In Videos on Sports-1M (Video hit@1 metric)

3D Face Reconstruction Action Recognition +2

Temporal Segment Networks for Action Recognition in Videos

open-mmlab/mmaction 8 May 2017

Furthermore, based on the temporal segment networks, we won the video classification track at the ActivityNet challenge 2016 among 24 teams, which demonstrates the effectiveness of TSN and the proposed good practices.

Ranked #12 on Action Classification on Moments in Time (Top 5 Accuracy metric)

Action Classification Action Recognition +2

Is Space-Time Attention All You Need for Video Understanding?

open-mmlab/mmaction2 9 Feb 2021

We present a convolution-free approach to video classification built exclusively on self-attention over space and time.

Action Classification Action Recognition +3