Video Recognition
147 papers with code • 0 benchmarks • 10 datasets
Video Recognition is a process of obtaining, processing, and analysing data that it receives from a visual source, specifically video.
Benchmarks
These leaderboards are used to track progress in Video Recognition
Libraries
Use these libraries to find Video Recognition models and implementationsDatasets
Latest papers
VG4D: Vision-Language Model Goes 4D Video Recognition
By transferring the knowledge of the VLM to the 4D encoder and combining the VLM, our VG4D achieves improved recognition performance.
Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation
Existing parameter-efficient fine-tuning (PEFT) methods have achieved significant success on vision transformers (ViTs) adaptation by improving parameter efficiency.
Don't Judge by the Look: Towards Motion Coherent Video Representation
Current training pipelines in object recognition neglect Hue Jittering when doing data augmentation as it not only brings appearance changes that are detrimental to classification, but also the implementation is inefficient in practice.
HaltingVT: Adaptive Token Halting Transformer for Efficient Video Recognition
Action recognition in videos poses a challenge due to its high computational cost, especially for Joint Space-Time video transformers (Joint VT).
Video Recognition in Portrait Mode
While existing datasets mainly comprise landscape mode videos, our paper seeks to introduce portrait mode videos to the research community and highlight the unique challenges associated with this video format.
Unleashing the Power of CNN and Transformer for Balanced RGB-Event Video Recognition
It is intuitive to combine them for high-performance RGB-Event based video recognition, however, existing works fail to achieve a good balance between the accuracy and model parameters, as shown in Fig.~\ref{firstimage}.
LogoStyleFool: Vitiating Video Recognition Systems via Logo Style Transfer
We separate the attack into three stages: style reference selection, reinforcement-learning-based logo style transfer, and perturbation optimization.
OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition
Due to the resource-intensive nature of training vision-language models on expansive video data, a majority of studies have centered on adapting pre-trained image-language models to the video domain.
Automated Sperm Assessment Framework and Neural Network Specialized for Sperm Video Recognition
Infertility is a global health problem, and an increasing number of couples are seeking medical assistance to achieve reproduction, at least half of which are caused by men.
Object-centric Video Representation for Long-term Action Anticipation
To recognize and predict human-object interactions, we use a Transformer-based neural architecture which allows the "retrieval" of relevant objects for action anticipation at various time scales.