Video Classification

172 papers with code • 11 benchmarks • 17 datasets

Video Classification is the task of producing a label that is relevant to the video given its frames. A good video-level classifier not only provides accurate frame labels but also best describes the entire video, given the features and annotations of its frames. For example, a video might contain a tree in some frame, but the label that is central to the video might be something else (e.g., “hiking”). The granularity of the labels needed to describe the frames and the video depends on the task. Typical tasks include assigning one or more global labels to the video and assigning one or more labels to each frame in the video.

Source: Efficient Large Scale Video Classification
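The difference between frame-level and video-level labels described above can be illustrated with a minimal sketch. It assumes per-frame class scores are already available from some frame classifier. The scores, the label names, and the helper functions `frame_labels` and `video_label` are hypothetical, and mean pooling is only one simple way to aggregate frames, not the method of the cited paper.

```python
from typing import Dict, List

# Hypothetical per-frame class scores for a 4-frame clip (illustrative values only).
FRAME_SCORES: List[Dict[str, float]] = [
    {"tree": 0.7, "hiking": 0.2, "river": 0.1},
    {"tree": 0.2, "hiking": 0.6, "river": 0.2},
    {"tree": 0.1, "hiking": 0.7, "river": 0.2},
    {"tree": 0.2, "hiking": 0.5, "river": 0.3},
]


def frame_labels(frame_scores: List[Dict[str, float]]) -> List[str]:
    """Frame-level task: pick the top-scoring label for each frame."""
    return [max(scores, key=scores.get) for scores in frame_scores]


def video_label(frame_scores: List[Dict[str, float]]) -> str:
    """Video-level task: average the scores over all frames, then take the top label."""
    if not frame_scores:
        raise ValueError("at least one frame is required")
    totals: Dict[str, float] = {}
    for scores in frame_scores:
        for label, score in scores.items():
            totals[label] = totals.get(label, 0.0) + score
    means = {label: total / len(frame_scores) for label, total in totals.items()}
    return max(means, key=means.get)


if __name__ == "__main__":
    print(frame_labels(FRAME_SCORES))  # one label per frame
    print(video_label(FRAME_SCORES))   # one label for the whole clip
```

In this toy clip the first frame is labeled "tree", yet the pooled video-level label is "hiking", which mirrors the tree versus hiking example in the task description.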

MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

boheumd/MA-LMM 8 Apr 2024

However, existing LLM-based large multimodal models (e.g., Video-LLaMA, VideoChat) can only take in a limited number of frames for short video understanding.

105 stars

X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization

annusha/xmic 28 Mar 2024

Lately, there has been growing interest in adapting vision-language models (VLMs) to image and third-person video classification due to their success in zero-shot recognition.

3 stars

Multi-modality transrectal ultrasound video classification for identification of clinically significant prostate cancer

2313595986/prostatetrus 14 Feb 2024

With the aim of effectively identifying prostate cancer, we propose a framework for the classification of clinically significant prostate cancer (csPCa) from multi-modality TRUS videos.

0 stars

Video Annotator: A framework for efficiently building video classifiers using vision-language models and active learning

netflix/videoannotator 9 Feb 2024

High-quality and consistent annotations are fundamental to the successful development of robust machine learning models.

16 stars

FakeClaim: A Multiple Platform-driven Dataset for Identification of Fake News on 2023 Israel-Hamas War

gautamshahi/fakeclaim 29 Jan 2024

We contribute the first publicly available dataset of factual claims from different platforms and fake YouTube videos on the 2023 Israel-Hamas war for automatic fake YouTube video classification.

1 star

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

opengvlab/internvl 21 Dec 2023

However, the progress in vision and vision-language foundation models, which are also critical elements of multi-modal AGI, has not kept pace with LLMs.

714 stars

Revisiting Foreground and Background Separation in Weakly-supervised Temporal Action Localization: A Clustering-based Approach

qinying-liu/case ICCV 2023

It comprises two core components: a snippet clustering component that groups the snippets into multiple latent clusters and a cluster classification component that further classifies the cluster as foreground or background.

98 stars
21 Dec 2023

MaXTron: Mask Transformer with Trajectory Attention for Video Panoptic Segmentation

tacju/maxtron 30 Nov 2023

To alleviate the issue, we propose to adapt the trajectory attention for both the dense pixel features and object queries, aiming to improve the short-term and long-term tracking results, respectively.

25 stars

Quantized Distillation: Optimizing Driver Activity Recognition Models for Resource-Constrained Environments

calvintanama/qd-driver-activity-reco 10 Nov 2023

The framework enhances 3D MobileNet, a neural architecture optimized for speed in video classification, by incorporating knowledge distillation and model quantization to balance model accuracy and computational efficiency.

8 stars

Differentiable Resolution Compression and Alignment for Efficient Video Classification and Retrieval

dun-research/drca 15 Sep 2023

To address these issues, we propose an efficient video representation network with Differentiable Resolution Compression and Alignment mechanism, which compresses non-essential information in the early stage of the network to reduce computational costs while maintaining consistent temporal correlations.

4 stars