GZSL Video Classification

3 papers with code • 6 benchmarks • 1 dataset

Audio-visual zero-shot learning aims to recognize unseen categories based on paired audio-visual sequences.
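
A minimal sketch of the prediction step in this setting, assuming the common formulation in which a fused audio-visual embedding is matched against text embeddings of all class names (seen and unseen); the function and tensor names are hypothetical and not taken from any of the listed repositories.

```python
import torch
import torch.nn.functional as F

def zero_shot_classify(av_embedding: torch.Tensor,
                       class_text_embeddings: torch.Tensor) -> int:
    """Pick the class whose label embedding is closest to the audio-visual embedding.

    av_embedding:          (d,)   fused audio-visual feature for one clip
    class_text_embeddings: (C, d) label embeddings (e.g. word2vec or a language
                           model) for all candidate classes, seen and unseen alike
    """
    av = F.normalize(av_embedding, dim=-1)
    txt = F.normalize(class_text_embeddings, dim=-1)
    similarities = txt @ av            # cosine similarity to every class
    return int(similarities.argmax())  # index of the predicted class
```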

Most implemented papers

Temporal and cross-modal attention for audio-visual zero-shot learning

explainableml/tcaf-gzsl 20 Jul 2022

We show that our proposed framework that ingests temporal features yields state-of-the-art performance on the UCF, VGGSound, and ActivityNet benchmarks for (generalised) zero-shot learning.

Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language

explainableml/avca-gzsl CVPR 2022

Focusing on the relatively underexplored task of audio-visual zero-shot learning, we propose to learn multi-modal representations from audio-visual data using cross-modal attention and exploit textual label embeddings for transferring knowledge from seen classes to unseen classes.
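
Below is an illustrative sketch of cross-modal attention in which each modality queries the other, a rough approximation of the idea rather than the authors' exact architecture (see the repository above); the dimensions, layer names, and omission of any fusion or projection head are assumptions made for brevity.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        # attention where audio features act as queries over visual features
        self.audio_queries_video = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # attention where visual features act as queries over audio features
        self.video_queries_audio = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, audio: torch.Tensor, video: torch.Tensor):
        # audio, video: (batch, seq_len, dim) modality-specific sequence features
        audio_out, _ = self.audio_queries_video(audio, video, video)
        video_out, _ = self.video_queries_audio(video, audio, audio)
        return audio_out, video_out

# usage on random features, e.g. 10 audio segments and 16 video frames per clip
audio = torch.randn(4, 10, 512)
video = torch.randn(4, 16, 512)
attended_audio, attended_video = CrossModalAttention()(audio, video)
```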

Boosting Audio-visual Zero-shot Learning with Large Language Models

chenhaoxing/KDA 21 Nov 2023

Recent methods mainly focus on learning multi-modal features aligned with class names to enhance the generalization ability to unseen categories.