Latest papers with code

Unsupervised Visual Representation Learning by Tracking Patches in Video

6 May 2021, microsoft/CtP

The proxy task is to estimate the position and size of the image patch in a sequence of video frames, given only the target bounding box in the first frame.

ACTION CLASSIFICATION, ACTION RECOGNITION, REPRESENTATION LEARNING

Stars: 12

Multiscale Vision Transformers

22 Apr 2021, facebookresearch/SlowFast

We evaluate this fundamental architectural prior for modeling the dense nature of visual signals for a variety of video recognition tasks where it outperforms concurrent vision transformers that rely on large scale external pre-training and are 5-10x more costly in computation and parameters.

Ranked #1 on Action Classification on Kinetics-600 (Vid acc@1 metric)

ACTION CLASSIFICATION, ACTION RECOGNITION, IMAGE CLASSIFICATION, VIDEO RECOGNITION

Stars: 3,673

Object Priors for Classifying and Localizing Unseen Actions

10 Apr 2021, psmmettes/object-priors-unseen-actions

This work strives for the classification and localization of human actions in videos, without the need for any labeled video training examples.

ACTION CLASSIFICATION, ACTION LOCALIZATION, VIDEO RETRIEVAL, WORD EMBEDDINGS

Stars: 0

ViViT: A Video Vision Transformer

29 Mar 2021, rishikksh20/ViViT-pytorch

We present pure-transformer based models for video classification, drawing upon the recent success of such models in image classification.

Ranked #1 on Action Classification on Kinetics-600 (using extra training data)

ACTION CLASSIFICATION, ACTION RECOGNITION, CLASSIFICATION, IMAGE CLASSIFICATION

Stars: 50

An Image is Worth 16x16 Words, What is a Video Worth?

25 Mar 2021, Alibaba-MIIL/STAM

Methods that reach state-of-the-art (SotA) accuracy usually use 3D convolution layers to abstract temporal information from video frames.

Ranked #1 on Action Classification on Kinetics-400 (Flops x views metric)

ACTION CLASSIFICATION, ACTION RECOGNITION

Stars: 141

MoViNets: Mobile Video Networks for Efficient Video Recognition

21 Mar 2021, tensorflow/models

We present Mobile Video Networks (MoViNets), a family of computation and memory efficient video networks that can operate on streaming video for online inference.

ACTION CLASSIFICATION, ACTION RECOGNITION, NEURAL ARCHITECTURE SEARCH, VIDEO RECOGNITION

Stars: 69,820

Revisiting ResNets: Improved Training and Scaling Strategies

13 Mar 2021, rwightman/pytorch-image-models

Using improved training and scaling strategies, we design a family of ResNet architectures, ResNet-RS, which are 1.7x - 2.7x faster than EfficientNets on TPUs, while achieving similar accuracies on ImageNet.

ACTION CLASSIFICATION, IMAGE CLASSIFICATION, VIDEO CLASSIFICATION

Stars: 9,479

Domain and View-point Agnostic Hand Action Recognition

3 Mar 2021, AlbertoSabater/Domain-and-View-point-Agnostic-Hand-Action-Recognition

We demonstrate the performance of our proposed motion representation model both working for a single specific domain (intra-domain action classification) and working for different unseen domains (cross-domain action classification).

ACTION CLASSIFICATION, ACTION RECOGNITION, HUMAN ROBOT INTERACTION, SKELETON BASED ACTION RECOGNITION

Stars: 1

Is Space-Time Attention All You Need for Video Understanding?

9 Feb 2021, lucidrains/TimeSformer-pytorch

We present a convolution-free approach to video classification built exclusively on self-attention over space and time.

ACTION CLASSIFICATION, ACTION RECOGNITION, CLASSIFICATION, VIDEO QUESTION ANSWERING, VIDEO UNDERSTANDING

Stars: 415