no code implementations • LREC 2022 • Valentin Barriere, Slim Essid, Chloé Clavel
In this paper, we present the process we used to collect new opinion annotations over the multimodal SEMAINE corpus, which is composed of dyadic interactions.
no code implementations • 2 Dec 2024 • Hugo Malard, Michel Olvera, Stéphane Lathuilière, Slim Essid
Large-scale pre-trained audio and image models demonstrate an unprecedented degree of generalization, making them suitable for a wide range of applications.
no code implementations • 27 Nov 2024 • David Perera, François Derrida, Théo Mariotte, Gaël Richard, Slim Essid
Training speech separation models in the supervised setting raises a permutation problem: finding the best assignment between the model's predictions and the ground-truth separated signals.
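The assignment problem this entry describes can be sketched in a few lines of plain Python: a permutation-invariant loss scores every matching of predictions to ground-truth sources and keeps the cheapest one. The squared-error loss and toy signals below are illustrative assumptions, not the paper's actual setup.

```python
import itertools

def pairwise_loss(pred, target):
    # Mean squared error between two 1-D signals.
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def pit_loss(preds, targets):
    """Permutation-invariant loss: evaluate every assignment of
    predictions to ground-truth sources and keep the cheapest one."""
    best = None
    for perm in itertools.permutations(range(len(targets))):
        total = sum(pairwise_loss(preds[i], targets[j])
                    for i, j in enumerate(perm))
        if best is None or total < best:
            best = total
    return best / len(targets)

# Two "sources" whose predictions arrive in swapped order:
targets = [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
preds   = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
print(pit_loss(preds, targets))  # 0.0 — the swap is matched away
```

Exhaustive enumeration is exponential in the number of sources; practical systems use the Hungarian algorithm or heuristics when the source count grows.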
no code implementations • 6 Nov 2024 • Antonin Gagnere, Geoffroy Peeters, Slim Essid
In this paper, we propose a novel Self-Supervised-Learning scheme to train rhythm analysis systems and instantiate it for few-shot beat tracking.
1 code implementation • 8 Oct 2024 • Hugo Malard, Michel Olvera, Stéphane Lathuilière, Slim Essid
In this work, we introduce a novel methodology for bridging the audiovisual modality gap by matching the distributions of tokens produced by an audio backbone and those of an image captioner.
no code implementations • 19 Sep 2024 • Michel Olvera, Paraskevas Stamatiadis, Slim Essid
First, we find that the formatting of the prompts significantly affects performance: simply prompting the models with properly formatted class labels performs competitively with optimized prompt templates and even prompt ensembling.
Ranked #1 on Zero-shot Audio Classification on ESC-50
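The role of prompt formatting in zero-shot classification can be mimicked with a toy classifier: each class label is inserted into a prompt template, embedded, and ranked by cosine similarity against the query embedding. The bag-of-words `embed` function below is a purely hypothetical stand-in for a pretrained audio-text encoder; only the prompting-and-ranking mechanics carry over.

```python
import math

def embed(text):
    # Hypothetical stand-in embedding: bag-of-words counts.
    words = text.lower().split()
    return {w: words.count(w) for w in set(words)}

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def zero_shot(query, class_labels, template="{}"):
    # Format each label with the prompt template, embed, rank by similarity.
    prompts = [template.format(lbl) for lbl in class_labels]
    sims = [cosine(embed(query), embed(p)) for p in prompts]
    return class_labels[max(range(len(sims)), key=sims.__getitem__)]

labels = ["dog bark", "rain", "car horn"]
print(zero_shot("a dog bark in the yard", labels,
                template="the sound of a {}"))  # dog bark
```

Swapping the `template` argument is all it takes to compare a bare label against a formatted prompt, which is the kind of ablation the entry's finding is about.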
1 code implementation • 22 Jul 2024 • David Perera, Victor Letzelter, Théo Mariotte, Adrien Cortés, Mickaël Chen, Slim Essid, Gaël Richard
We introduce Annealed Multiple Choice Learning (aMCL), which combines simulated annealing with MCL.
no code implementations • 30 Jun 2024 • Salah Zaiem, Titouan Parcollet, Slim Essid
Despite being trained on massive and diverse datasets, speech self-supervised encoders are generally used for downstream purposes as mere frozen feature extractors or model initializers before fine-tuning.
1 code implementation • 7 Jun 2024 • Victor Letzelter, David Perera, Cédric Rommel, Mathieu Fontaine, Slim Essid, Gaël Richard, Patrick Pérez
Winner-takes-all training is a simple learning paradigm that handles ambiguous tasks by predicting a set of plausible hypotheses.
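The winner-takes-all idea can be illustrated with a minimal, hypothetical sketch: every hypothesis is scored against the target, and only the best-scoring one contributes to the training loss. The squared-error score below is an assumption made for illustration, not the paper's loss.

```python
def wta_loss(hypotheses, target):
    """Winner-takes-all: score every hypothesis against the target
    and keep only the best one's loss."""
    losses = [sum((h - t) ** 2 for h, t in zip(hyp, target))
              for hyp in hypotheses]
    winner = min(range(len(losses)), key=losses.__getitem__)
    return winner, losses[winner]

# Three hypotheses for an ambiguous 2-D target.
hyps = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
winner, loss = wta_loss(hyps, [0.9, 1.1])
print(winner)  # 1
```

In actual training, gradients flow only through the winning hypothesis, which is what lets the different hypotheses specialize on different plausible outcomes.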
1 code implementation • 30 Jan 2024 • Elio Gruttadauria, Mathieu Fontaine, Slim Essid
The results show that our system improves the state of the art on the AMI headset mix, using no oracle information and under full evaluation (no collar and including overlapped speech).
no code implementations • 21 Dec 2023 • Aurian Quelennec, Michel Olvera, Geoffroy Peeters, Slim Essid
Choosing the best pre-trained model for a given set of tasks is the subject of many recent publications.
1 code implementation • CVPR 2024 • Yasser Benigmim, Subhankar Roy, Slim Essid, Vicky Kalogeiton, Stéphane Lathuilière
Domain Generalized Semantic Segmentation (DGSS) deals with training a model on a labeled source domain with the aim of generalizing to unseen domains during inference.
1 code implementation • NeurIPS 2023 • Victor Letzelter, Mathieu Fontaine, Mickaël Chen, Patrick Pérez, Slim Essid, Gaël Richard
Multiple Choice Learning is a simple framework to tackle multimodal density estimation, using the Winner-Takes-All (WTA) loss for a set of hypotheses.
no code implementations • 28 Aug 2023 • Salah Zaiem, Youcef Kemiche, Titouan Parcollet, Slim Essid, Mirco Ravanelli
Self-supervised learning (SSL) leverages large datasets of unlabeled speech to reach impressive performance with reduced amounts of annotated data.
no code implementations • 31 Jul 2023 • Nicolas Furnon, Romain Serizel, Slim Essid, Irina Illina
Speech enhancement in ad-hoc microphone arrays is often hindered by the lack of synchronization between the devices composing the array.
1 code implementation • 1 Jun 2023 • Salah Zaiem, Youcef Kemiche, Titouan Parcollet, Slim Essid, Mirco Ravanelli
Self-supervised learning (SSL) has recently allowed leveraging large datasets of unlabeled speech signals to reach impressive performance on speech tasks using only small amounts of annotated data.
no code implementations • 1 Jun 2023 • Salah Zaiem, Titouan Parcollet, Slim Essid
Self-Supervised Learning (SSL) has allowed leveraging large amounts of unlabeled speech data to improve the performance of speech recognition models even with small annotated datasets.
1 code implementation • 31 Mar 2023 • Yasser Benigmim, Subhankar Roy, Slim Essid, Vicky Kalogeiton, Stéphane Lathuilière
Departing from the common notion of transferring only the target "texture" information, we leverage text-to-image diffusion models (e.g., Stable Diffusion) to generate a synthetic target dataset with photo-realistic images that not only faithfully depict the style of the target domain, but are also characterized by novel scenes in diverse contexts.
Data Augmentation • One-shot Unsupervised Domain Adaptation +2
1 code implementation • 12 Mar 2023 • Salah Zaiem, Robin Algayres, Titouan Parcollet, Slim Essid, Mirco Ravanelli
Self-supervised learning (SSL) has allowed substantial progress in Automatic Speech Recognition (ASR) performance in low-resource settings.
Automatic Speech Recognition • Automatic Speech Recognition (ASR) +2
1 code implementation • 8 Apr 2022 • Salah Zaiem, Titouan Parcollet, Slim Essid
Thus, this work introduces a conditional-independence-based method that automatically selects a suitable distribution over augmentation choices and their parametrizations, from a set of predefined ones, for contrastive self-supervised pre-training.
1 code implementation • 1 Jul 2021 • Salah Zaiem, Titouan Parcollet, Slim Essid, Abdel Heba
Through solving pretext tasks, self-supervised learning leverages unlabeled data to extract useful latent representations that replace traditional input features in the downstream task.
Automatic Speech Recognition • Automatic Speech Recognition (ASR) +6
1 code implementation • 15 Jun 2021 • Nicolas Furnon, Romain Serizel, Slim Essid, Irina Illina
Speech enhancement promises higher efficiency in ad-hoc microphone arrays than in constrained microphone arrays thanks to the wide spatial coverage of the devices in the acoustic scene.
1 code implementation • 15 Apr 2021 • Salah Zaiem, Titouan Parcollet, Slim Essid
Through solving pretext tasks, self-supervised learning (SSL) leverages unlabeled data to extract useful latent representations that replace traditional input features in the downstream task.
Automatic Speech Recognition • Automatic Speech Recognition (ASR) +6
1 code implementation • 3 Nov 2020 • Nicolas Furnon, Romain Serizel, Irina Illina, Slim Essid
Deep neural network (DNN)-based speech enhancement algorithms in microphone arrays have now proven to be effective solutions for speech understanding and speech recognition in noisy environments.
1 code implementation • 2 Nov 2020 • Nicolas Furnon, Romain Serizel, Irina Illina, Slim Essid
We propose a distributed algorithm that can process spatial information in a spatially unconstrained microphone array.
no code implementations • 20 Apr 2020 • Atef Ben Youssef, Giovanna Varni, Slim Essid, Chloé Clavel
In this paper, we consider the detection of a decrease of engagement by users spontaneously interacting with a socially assistive robot in a public space.
Human-Computer Interaction • Robotics
no code implementations • 13 Feb 2020 • Nicolas Furnon, Romain Serizel, Irina Illina, Slim Essid
Multichannel processing is widely used for speech enhancement but several limitations appear when trying to deploy these solutions to the real-world.
no code implementations • IJCNLP 2019 • Alexandre Garcia, Pierre Colombo, Slim Essid, Florence d'Alché-Buc, Chloé Clavel
The task of predicting fine-grained user opinion based on spontaneous spoken language is a key problem arising in the development of computational agents as well as in the development of social-network-based opinion miners.
1 code implementation • 26 Feb 2019 • Alexandre Garcia, Slim Essid, Florence d'Alché-Buc, Chloé Clavel
We introduce specific categories to make the annotation of opinions in movie reviews easier.
no code implementations • 9 Nov 2018 • Sanjeel Parekh, Alexey Ozerov, Slim Essid, Ngoc Duong, Patrick Pérez, Gaël Richard
We tackle the problem of audiovisual scene analysis for weakly-labeled data.
no code implementations • 20 Jun 2018 • Valentin Barriere, Chloé Clavel, Slim Essid
This model allows us to capture the dynamics of the reviewer's opinion in the transcripts of long unsegmented audio reviews that are analyzed by our system.
no code implementations • 19 Apr 2018 • Sanjeel Parekh, Slim Essid, Alexey Ozerov, Ngoc Q. K. Duong, Patrick Pérez, Gaël Richard
Audio-visual representation learning is an important task from the perspective of designing machines with the ability to understand complex events.
no code implementations • ICML 2018 • Alexandre Garcia, Slim Essid, Chloé Clavel, Florence d'Alché-Buc
Motivated by Supervised Opinion Analysis, we propose a novel framework devoted to Structured Output Learning with Abstention (SOLA).