Video Recognition
147 papers with code • 0 benchmarks • 10 datasets
Video Recognition is the process of obtaining, processing, and analysing data from a visual source, specifically video.
Most implemented papers
MoViNets: Mobile Video Networks for Efficient Video Recognition
We present Mobile Video Networks (MoViNets), a family of computation and memory efficient video networks that can operate on streaming video for online inference.
Flow-Guided Feature Aggregation for Video Object Detection
The accuracy of detection suffers from degenerated object appearances in videos, e.g., motion blur, video defocus, rare poses, etc.
A²-Nets: Double Attention Networks
Learning to capture long-range relations is fundamental to image/video recognition.
Sequence Level Semantics Aggregation for Video Object Detection
In this work, we argue that aggregating features at the full-sequence level will lead to more discriminative and robust features for video object detection.
Improved Residual Networks for Image and Video Recognition
We successfully train a 404-layer deep CNN on the ImageNet dataset and a 3002-layer network on CIFAR-10 and CIFAR-100, while the baseline is not able to converge at such extreme depths.
TAM: Temporal Adaptive Module for Video Recognition
Video data exhibits complex temporal dynamics due to various factors such as camera motion, speed variation, and different activities.
Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework
With the proposed Inter-Intra Contrastive (IIC) framework, we can train spatio-temporal convolutional networks to learn video representations.
Learning Equivariant Representations
In this thesis, we extend equivariance to other kinds of transformations, such as rotation and scaling.
Towards Long-Form Video Understanding
Our world offers a never-ending stream of visual stimuli, yet today's vision systems only accurately recognize patterns within a few seconds.
AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition
We propose AdaptFormer, an effective adaptation approach for Transformers that can efficiently adapt pre-trained ViTs to many different image and video tasks.