no code implementations • 2 Dec 2024 • Hugo Malard, Michel Olvera, Stéphane Lathuiliere, Slim Essid
Large-scale pre-trained audio and image models demonstrate an unprecedented degree of generalization, making them suitable for a wide range of applications.
1 code implementation • 8 Oct 2024 • Hugo Malard, Michel Olvera, Stéphane Lathuiliere, Slim Essid
In this work, we introduce a novel methodology for bridging the audiovisual modality gap by matching the distributions of tokens produced by an audio backbone and those of an image captioner.
no code implementations • 19 Sep 2024 • Michel Olvera, Paraskevas Stamatiadis, Slim Essid
First, we find that the formatting of the prompts significantly affects performance, so that simply prompting the models with properly formatted class labels performs competitively with optimized prompt templates and even prompt ensembling.
Ranked #1 on Zero-shot Audio Classification on ESC-50
no code implementations • 21 Dec 2023 • Aurian Quelennec, Michel Olvera, Geoffroy Peeters, Slim Essid
Choosing the best one for a set of tasks is the subject of many recent publications.
no code implementations • 11 May 2020 • Michel Olvera, Emmanuel Vincent, Romain Serizel, Gilles Gasso
Ambient sound scenes typically comprise multiple short events occurring on top of a somewhat stationary background.