Action Classification

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text

tensorflow/models 22 Apr 2021

We show that the convolution-free VATT outperforms state-of-the-art ConvNet-based architectures in the downstream tasks.

Ranked #1 on Action Classification on Moments in Time (using extra training data)

Action Classification Action Recognition +6

MoViNets: Mobile Video Networks for Efficient Video Recognition

tensorflow/models CVPR 2021

We present Mobile Video Networks (MoViNets), a family of computation and memory efficient video networks that can operate on streaming video for online inference.

Action Classification Action Recognition +2

Non-local Neural Networks

facebookresearch/detectron CVPR 2018

Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time.

Ranked #7 on Action Classification on Toyota Smarthome dataset (using extra training data)

Action Classification Action Recognition +3

AssembleNet++: Assembling Modality Representations via Attention Connections

google-research/google-research 18 Aug 2020

We create a family of powerful video models which are able to: (i) learn interactions between semantic object information and raw appearance and motion features, and (ii) deploy attention in order to better learn the importance of features at each convolutional block of the network.

Action Classification Activity Recognition

Revisiting ResNets: Improved Training and Scaling Strategies

rwightman/pytorch-image-models 13 Mar 2021

Using improved training and scaling strategies, we design a family of ResNet architectures, ResNet-RS, which are 1.7x - 2.7x faster than EfficientNets on TPUs, while achieving similar accuracies on ImageNet.

Ranked #25 on Image Classification on CIFAR-100 (using extra training data)

Action Classification Image Classification +1

Large-scale weakly-supervised pre-training for video action recognition

microsoft/computervision-recipes CVPR 2019

Frame-based models perform quite well on action recognition; is pre-training for good image features sufficient, or is pre-training for spatio-temporal features valuable for optimal transfer learning?

Ranked #1 on Egocentric Activity Recognition on EPIC-KITCHENS-55 (Actions Top-1 (S2) metric)

Action Classification Action Recognition +3

A Closer Look at Spatiotemporal Convolutions for Action Recognition

microsoft/computervision-recipes CVPR 2018

In this paper we discuss several forms of spatiotemporal convolutions for video analysis and study their effects on action recognition.

Action Classification Action Recognition

Deep Concept-wise Temporal Convolutional Networks for Action Localization

PaddlePaddle/models 26 Aug 2019

In this paper, we empirically find that stacking more conventional temporal convolution layers actually deteriorates action classification performance, possibly because all channels of the 1D feature map, which are generally highly abstract and can be regarded as latent concepts, are excessively recombined in temporal convolution.

Action Classification Action Localization

Multiscale Vision Transformers

facebookresearch/SlowFast 22 Apr 2021

We evaluate this fundamental architectural prior for modeling the dense nature of visual signals for a variety of video recognition tasks where it outperforms concurrent vision transformers that rely on large scale external pre-training and are 5-10x more costly in computation and parameters.

Action Classification Action Recognition +2