Video Recognition

147 papers with code • 0 benchmarks • 10 datasets

Video Recognition is the process of obtaining, processing, and analysing data from a visual source, specifically video.

Latest papers with no code

LocalStyleFool: Regional Video Style Transfer Attack Using Segment Anything Model

no code yet • 18 Mar 2024

Benefiting from the popularity and scalable usability of the Segment Anything Model (SAM), we first extract different regions according to semantic information and then track them through the video stream to maintain temporal consistency.

Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition

no code yet • 29 Feb 2024

Finally, we blend external multimodal knowledge in the Adapt stage by inserting multimodal knowledge adaptation modules into networks.

Hierarchical Augmentation and Distillation for Class Incremental Audio-Visual Video Recognition

no code yet • 11 Jan 2024

We introduce Hierarchical Augmentation and Distillation (HAD), which comprises the Hierarchical Augmentation Module (HAM) and Hierarchical Distillation Module (HDM) to efficiently utilize the hierarchical structure of data and models, respectively.

Motion Guided Token Compression for Efficient Masked Video Modeling

no code yet • 10 Jan 2024

By implementing MGTC with the masking ratio of 25%, we further augment accuracy by 0.1 and simultaneously reduce computational costs by over 31% on Kinetics-400.
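
As a rough illustration of what masking tokens by motion at a fixed ratio could look like, the sketch below scores each patch by its mean absolute difference from the same patch in the previous frame and masks the lowest-scoring 25%. This is not the paper's implementation: the function name `motion_guided_keep_mask`, the grayscale input, the 16-pixel patch size, and the frame-difference score are all assumptions made for the example.

```python
import numpy as np

def motion_guided_keep_mask(video, patch=16, mask_ratio=0.25):
    """Hypothetical helper: boolean keep-mask over patch tokens.

    video: float array of shape (T, H, W), grayscale frames (assumed).
    Returns an array of shape (T-1, H//patch, W//patch); True = keep token.
    The first frame has no predecessor, so it is not scored here.
    """
    T, H, W = video.shape
    gh, gw = H // patch, W // patch
    # Absolute difference between consecutive frames: (T-1, H, W).
    diff = np.abs(video[1:] - video[:-1])
    # Crop to a whole number of patches, then average inside each patch.
    diff = diff[:, :gh * patch, :gw * patch]
    motion = diff.reshape(T - 1, gh, patch, gw, patch).mean(axis=(2, 4))
    # Mask the mask_ratio fraction of tokens with the least motion.
    flat = motion.ravel()
    n_mask = int(round(mask_ratio * flat.size))
    order = np.argsort(flat, kind="stable")
    keep = np.ones(flat.size, dtype=bool)
    keep[order[:n_mask]] = False
    return keep.reshape(motion.shape)

# Toy clip: 3 frames of 32x32, with change only in the top-left patch.
clip = np.zeros((3, 32, 32))
clip[1, :16, :16] = 1.0
keep = motion_guided_keep_mask(clip, patch=16, mask_ratio=0.25)
```

With a masking ratio of 25%, three quarters of the scored tokens remain. In this toy clip the two tokens covering the changing top-left patch are always kept.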

Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification

no code yet • 8 Jan 2024

To learn from multimodal videos effectively, in this work, we propose a novel audio-video recognition approach termed audio video Transformer, AVT, leveraging the effective spatio-temporal representation by the video Transformer to improve action recognition accuracy.

Adapting Short-Term Transformers for Action Detection in Untrimmed Videos

no code yet • 4 Dec 2023

To this end, we design effective cross-snippet propagation modules to gradually exchange short-term video information among different snippets from two levels.

Phase-Specific Augmented Reality Guidance for Microscopic Cataract Surgery Using Long-Short Spatiotemporal Aggregation Transformer

no code yet • 11 Sep 2023

Phacoemulsification cataract surgery (PCS) is a routine procedure conducted using a surgical microscope, heavily reliant on the skill of the ophthalmologist.

Video Task Decathlon: Unifying Image and Video Tasks in Autonomous Driving

no code yet • ICCV 2023

VTD is a promising new direction for exploring the unification of perception tasks in autonomous driving.

Temporal-Distributed Backdoor Attack Against Video Based Action Recognition

no code yet • 21 Aug 2023

Although there are extensive studies on backdoor attacks against image data, the susceptibility of video-based systems under backdoor attacks remains largely unexplored.

Audio-Visual Glance Network for Efficient Video Recognition

no code yet • ICCV 2023

To address this issue, we propose Audio-Visual Glance Network (AVGN), which leverages the commonly available audio and visual modalities to efficiently process the spatio-temporally important parts of a video.