Search Results for author: Slim Essid

Found 33 papers, 16 papers with code

Opinions in Interactions : New Annotations of the SEMAINE Database

no code implementations LREC 2022 Valentin Barriere, Slim Essid, Chloé Clavel

In this paper, we present the process we used in order to collect new annotations of opinions over the multimodal corpus SEMAINE composed of dyadic interactions.

TACO: Training-free Sound Prompted Segmentation via Deep Audio-visual CO-factorization

no code implementations 2 Dec 2024 Hugo Malard, Michel Olvera, Stephane Lathuiliere, Slim Essid

Large-scale pre-trained audio and image models demonstrate an unprecedented degree of generalization, making them suitable for a wide range of applications.

Segmentation

Multiple Choice Learning for Efficient Speech Separation with Many Speakers

no code implementations 27 Nov 2024 David Perera, François Derrida, Théo Mariotte, Gaël Richard, Slim Essid

Training speech separation models in the supervised setting raises a permutation problem: finding the best assignment between the model predictions and the ground-truth separated signals.

Multiple-choice Speech Separation

A Contrastive Self-Supervised Learning scheme for beat tracking amenable to few-shot learning

no code implementations 6 Nov 2024 Antonin Gagnere, Geoffroy Peeters, Slim Essid

In this paper, we propose a novel Self-Supervised-Learning scheme to train rhythm analysis systems and instantiate it for few-shot beat tracking.

Beat Tracking Few-Shot Learning +1

An Eye for an Ear: Zero-shot Audio Description Leveraging an Image Captioner using Audiovisual Distribution Alignment

1 code implementation 8 Oct 2024 Hugo Malard, Michel Olvera, Stéphane Lathuiliere, Slim Essid

In this work, we introduce a novel methodology for bridging the audiovisual modality gap by matching the distributions of tokens produced by an audio backbone and those of an image captioner.

Contrastive Learning Image Captioning +2

A sound description: Exploring prompt templates and class descriptions to enhance zero-shot audio classification

no code implementations 19 Sep 2024 Michel Olvera, Paraskevas Stamatiadis, Slim Essid

First, we find that the formatting of the prompts significantly affects performance so that simply prompting the models with properly formatted class labels performs competitively with optimized prompt templates and even prompt ensembling.

Audio Classification Contrastive Learning +2

Less Forgetting for Better Generalization: Exploring Continual-learning Fine-tuning Methods for Speech Self-supervised Representations

no code implementations 30 Jun 2024 Salah Zaiem, Titouan Parcollet, Slim Essid

Despite being trained on massive and diverse datasets, speech self-supervised encoders are generally used for downstream purposes as mere frozen feature extractors or model initializers before fine-tuning.

Continual Learning Domain Generalization +2

Winner-takes-all learners are geometry-aware conditional density estimators

1 code implementation 7 Jun 2024 Victor Letzelter, David Perera, Cédric Rommel, Mathieu Fontaine, Slim Essid, Gael Richard, Patrick Pérez

Winner-takes-all training is a simple learning paradigm, which handles ambiguous tasks by predicting a set of plausible hypotheses.

Density Estimation Quantization +1

Online speaker diarization of meetings guided by speech separation

1 code implementation 30 Jan 2024 Elio Gruttadauria, Mathieu Fontaine, Slim Essid

The results show that our system improves the state-of-the-art on the AMI headset mix, using no oracle information and under full evaluation (no collar and including overlapped speech).

Action Detection Activity Detection +3

Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads

no code implementations 28 Aug 2023 Salah Zaiem, Youcef Kemiche, Titouan Parcollet, Slim Essid, Mirco Ravanelli

Self-supervised learning (SSL) leverages large datasets of unlabeled speech to reach impressive performance with reduced amounts of annotated data.

Benchmarking Self-Supervised Learning

SAMbA: Speech enhancement with Asynchronous ad-hoc Microphone Arrays

no code implementations 31 Jul 2023 Nicolas Furnon, Romain Serizel, Slim Essid, Irina Illina

Speech enhancement in ad-hoc microphone arrays is often hindered by the asynchronization of the devices composing the microphone array.

Speech Enhancement

Speech Self-Supervised Representation Benchmarking: Are We Doing it Right?

1 code implementation 1 Jun 2023 Salah Zaiem, Youcef Kemiche, Titouan Parcollet, Slim Essid, Mirco Ravanelli

Self-supervised learning (SSL) has recently allowed leveraging large datasets of unlabeled speech signals to reach impressive performance on speech tasks using only small amounts of annotated data.

Benchmarking Decoder +1

Automatic Data Augmentation for Domain Adapted Fine-Tuning of Self-Supervised Speech Representations

no code implementations 1 Jun 2023 Salah Zaiem, Titouan Parcollet, Slim Essid

Self-Supervised Learning (SSL) has allowed leveraging large amounts of unlabeled speech data to improve the performance of speech recognition models even with small annotated datasets.

Data Augmentation Domain Adaptation +3

One-shot Unsupervised Domain Adaptation with Personalized Diffusion Models

1 code implementation 31 Mar 2023 Yasser Benigmim, Subhankar Roy, Slim Essid, Vicky Kalogeiton, Stéphane Lathuilière

Departing from the common notion of transferring only the target "texture" information, we leverage text-to-image diffusion models (e.g., Stable Diffusion) to generate a synthetic target dataset with photo-realistic images that not only faithfully depict the style of the target domain, but are also characterized by novel scenes in diverse contexts.

Data Augmentation One-shot Unsupervised Domain Adaptation +2

Automatic Data Augmentation Selection and Parametrization in Contrastive Self-Supervised Speech Representation Learning

1 code implementation 8 Apr 2022 Salah Zaiem, Titouan Parcollet, Slim Essid

Thus, this work introduces a conditional independence-based method which allows for automatically selecting a suitable distribution on the choice of augmentations and their parametrization from a set of predefined ones, for contrastive self-supervised pre-training.

Contrastive Learning Data Augmentation +2

Pretext Tasks selection for multitask self-supervised speech representation learning

1 code implementation 1 Jul 2021 Salah Zaiem, Titouan Parcollet, Slim Essid, Abdel Heba

Through solving pretext tasks, self-supervised learning leverages unlabeled data to extract useful latent representations replacing traditional input features in the downstream task.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +6

Attention-based distributed speech enhancement for unconstrained microphone arrays with varying number of nodes

1 code implementation 15 Jun 2021 Nicolas Furnon, Romain Serizel, Slim Essid, Irina Illina

Speech enhancement promises higher efficiency in ad-hoc microphone arrays than in constrained microphone arrays thanks to the wide spatial coverage of the devices in the acoustic scene.

Speech Enhancement

Conditional independence for pretext task selection in Self-supervised speech representation learning

1 code implementation 15 Apr 2021 Salah Zaiem, Titouan Parcollet, Slim Essid

Through solving pretext tasks, self-supervised learning (SSL) leverages unlabeled data to extract useful latent representations replacing traditional input features in the downstream task.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +6

DNN-based mask estimation for distributed speech enhancement in spatially unconstrained microphone arrays

1 code implementation 3 Nov 2020 Nicolas Furnon, Romain Serizel, Irina Illina, Slim Essid

Deep neural network (DNN)-based speech enhancement algorithms in microphone arrays have now proven to be efficient solutions to speech understanding and speech recognition in noisy environments.

Diversity Noise Estimation +3

Distributed speech separation in spatially unconstrained microphone arrays

1 code implementation 2 Nov 2020 Nicolas Furnon, Romain Serizel, Irina Illina, Slim Essid

We propose a distributed algorithm that can process spatial information in a spatially unconstrained microphone array.

Diversity Speech Separation

On-the-fly Detection of User Engagement Decrease in Spontaneous Human-Robot Interaction, International Journal of Social Robotics, 2019

no code implementations 20 Apr 2020 Atef Ben Youssef, Giovanna Varni, Slim Essid, Chloé Clavel

In this paper, we consider the detection of a decrease of engagement by users spontaneously interacting with a socially assistive robot in a public space.

Human-Computer Interaction Robotics

DNN-Based Distributed Multichannel Mask Estimation for Speech Enhancement in Microphone Arrays

no code implementations 13 Feb 2020 Nicolas Furnon, Romain Serizel, Irina Illina, Slim Essid

Multichannel processing is widely used for speech enhancement, but several limitations appear when trying to deploy these solutions in the real world.

Speech Enhancement

From the Token to the Review: A Hierarchical Multimodal approach to Opinion Mining

no code implementations IJCNLP 2019 Alexandre Garcia, Pierre Colombo, Slim Essid, Florence d'Alché-Buc, Chloé Clavel

The task of predicting fine grained user opinion based on spontaneous spoken language is a key problem arising in the development of Computational Agents as well as in the development of social network based opinion miners.

Opinion Mining

A multimodal movie review corpus for fine-grained opinion mining

1 code implementation 26 Feb 2019 Alexandre Garcia, Slim Essid, Florence d'Alché-Buc, Chloé Clavel

We introduce specific categories in order to make the annotation of opinions easier for movie reviews.

Opinion Mining

Opinion Dynamics Modeling for Movie Review Transcripts Classification with Hidden Conditional Random Fields

no code implementations 20 Jun 2018 Valentin Barriere, Chloé Clavel, Slim Essid

This model allows us to capture the dynamics of the reviewer's opinion in the transcripts of long unsegmented audio reviews that are analyzed by our system.

General Classification

Weakly Supervised Representation Learning for Unsynchronized Audio-Visual Events

no code implementations 19 Apr 2018 Sanjeel Parekh, Slim Essid, Alexey Ozerov, Ngoc Q. K. Duong, Patrick Pérez, Gaël Richard

Audio-visual representation learning is an important task from the perspective of designing machines with the ability to understand complex events.

Multiple Instance Learning Representation Learning
