Video Recognition
147 papers with code • 0 benchmarks • 10 datasets
Video Recognition is a process of obtaining, processing, and analysing data that it receives from a visual source, specifically video.
Benchmarks
These leaderboards are used to track progress in Video Recognition
Libraries
Use these libraries to find Video Recognition models and implementationsDatasets
Most implemented papers
Revisiting 3D ResNets for Video Recognition
A recent work from Bello shows that training and scaling strategies may be more significant than model architectures for visual recognition.
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition
In this study, we focus on transferring knowledge for video classification tasks.
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models
In this paper, we propose a novel framework called BIKE, which utilizes the cross-modal bridge to explore bidirectional knowledge: i) We introduce the Video Attribute Association mechanism, which leverages the Video-to-Text knowledge to generate textual auxiliary attributes for complementing video recognition.
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
In this paper, we introduce a novel visual representation learning which relies on a handful of adaptively learned tokens, and which is applicable to both image and video understanding tasks.
TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Device
Secondly, TSM has high efficiency; it achieves a high frame rate of 74fps and 29fps for online video recognition on Jetson Nano and Galaxy Note8.
Deep Feature Flow for Video Recognition
Yet, it is non-trivial to transfer the state-of-the-art image recognition networks to videos as per-frame evaluation is too slow and unaffordable.
Audiovisual SlowFast Networks for Video Recognition
We present Audiovisual SlowFast Networks, an architecture for integrated audiovisual perception.
Omni-sourced Webly-supervised Learning for Video Recognition
Then a joint-training strategy is proposed to deal with the domain gaps between multiple data sources and formats in webly-supervised learning.
Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition
This work introduces pyramidal convolution (PyConv), which is capable of processing the input at multiple filter scales.
MVFNet: Multi-View Fusion Network for Efficient Video Recognition
Existing state-of-the-art methods have achieved excellent accuracy regardless of the complexity meanwhile efficient spatiotemporal modeling solutions are slightly inferior in performance.