Search Results for author: Neil Zeghidour

Found 35 papers, 15 papers with code

MAD Speech: Measures of Acoustic Diversity of Speech

no code implementations • 16 Apr 2024 • Matthieu Futeral, Andrea Agostinelli, Marco Tagliasacchi, Neil Zeghidour, Eugene Kharitonov

Using these datasets, we demonstrate that our proposed metrics achieve a stronger agreement with the ground-truth diversity than baselines.


MusicRL: Aligning Music Generation to Human Preferences

no code implementations • 6 Feb 2024 • Geoffrey Cideron, Sertan Girgin, Mauro Verzetti, Damien Vincent, Matej Kastelic, Zalán Borsos, Brian McWilliams, Victor Ungureanu, Olivier Bachem, Olivier Pietquin, Matthieu Geist, Léonard Hussenot, Neil Zeghidour, Andrea Agostinelli

MusicRL is a pretrained autoregressive MusicLM (Agostinelli et al., 2023) model of discrete audio tokens finetuned with reinforcement learning to maximise sequence-level rewards.

Music Generation

Speech Intelligibility Classifiers from 550k Disordered Speech Samples

no code implementations • 13 Mar 2023 • Subhashini Venugopalan, Jimmy Tobin, Samuel J. Yang, Katie Seaver, Richard J. N. Cave, Pan-Pan Jiang, Neil Zeghidour, Rus Heywood, Jordan Green, Michael P. Brenner

We developed dysarthric speech intelligibility classifiers on 551,176 disordered speech samples contributed by a diverse set of 468 speakers, with a range of self-reported speaking disorders and rated for their overall intelligibility on a five-point scale.

DNArch: Learning Convolutional Neural Architectures by Backpropagation

no code implementations • 10 Feb 2023 • David W. Romero, Neil Zeghidour

We present Differentiable Neural Architectures (DNArch), a method that jointly learns the weights and the architecture of Convolutional Neural Networks (CNNs) by backpropagation.

SingSong: Generating musical accompaniments from singing

no code implementations • 30 Jan 2023 • Chris Donahue, Antoine Caillon, Adam Roberts, Ethan Manilow, Philippe Esling, Andrea Agostinelli, Mauro Verzetti, Ian Simon, Olivier Pietquin, Neil Zeghidour, Jesse Engel

We present SingSong, a system that generates instrumental music to accompany input vocals, potentially offering musicians and non-musicians alike an intuitive new way to create music featuring their own voice.

Audio Generation, Retrieval

Multi-instrument Music Synthesis with Spectrogram Diffusion

1 code implementation • 11 Jun 2022 • Curtis Hawthorne, Ian Simon, Adam Roberts, Neil Zeghidour, Josh Gardner, Ethan Manilow, Jesse Engel

An ideal music synthesizer should be both interactive and expressive, generating high-fidelity audio in realtime for arbitrary combinations of instruments and notes.

Decoder, Generative Adversarial Network, +1

Disentangling speech from surroundings with neural embeddings

no code implementations • 29 Mar 2022 • Ahmed Omran, Neil Zeghidour, Zalán Borsos, Félix de Chaumont Quitry, Malcolm Slaney, Marco Tagliasacchi

We present a method to separate speech signals from noisy environments in the embedding space of a neural audio codec.


Learning neural audio features without supervision

no code implementations • 29 Mar 2022 • Sarthak Yadav, Neil Zeghidour

Deep audio classification, traditionally cast as training a deep neural network on top of mel-filterbanks in a supervised fashion, has recently benefited from two independent lines of work.

Audio Classification, Self-Supervised Learning

Learning strides in convolutional neural networks

1 code implementation • ICLR 2022 • Rachid Riad, Olivier Teboul, David Grangier, Neil Zeghidour

In particular, we show that introducing our layer into a ResNet-18 architecture allows keeping consistent high performance on CIFAR10, CIFAR100 and ImageNet even when training starts from poor random stride configurations.

Image Classification

SoundStream: An End-to-End Neural Audio Codec

5 code implementations • 7 Jul 2021 • Neil Zeghidour, Alejandro Luebs, Ahmed Omran, Jan Skoglund, Marco Tagliasacchi

We present SoundStream, a novel neural audio codec that can efficiently compress speech, music and general audio at bitrates normally targeted by speech-tailored codecs.

Decoder, Speech Enhancement

DIVE: End-to-end Speech Diarization via Iterative Speaker Embedding

no code implementations • 28 May 2021 • Neil Zeghidour, Olivier Teboul, David Grangier

Our neural algorithm presents the diarization task as an iterative process: it repeatedly builds a representation for each speaker before predicting the voice activity of each speaker conditioned on the extracted representations.

Speaker Diarization

LEAF: A Learnable Frontend for Audio Classification

4 code implementations • 21 Jan 2021 • Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry, Marco Tagliasacchi

In this work we show that we can train a single learnable frontend that outperforms mel-filterbanks on a wide range of audio signals, including speech, music, audio events and animal sounds, providing a general-purpose learned frontend for audio classification.

Audio Classification, General Classification

Shuffle to Learn: Self-supervised learning from permutations via differentiable ranking

no code implementations • 1 Jan 2021 • Andrew N Carr, Quentin Berthet, Mathieu Blondel, Olivier Teboul, Neil Zeghidour

In particular, we also improve music understanding by reordering spectrogram patches in the frequency space, as well as video classification by reordering frames along the time axis.

General Classification, Self-Supervised Learning, +1

A Universal Learnable Audio Frontend

no code implementations • ICLR 2021 • Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry, Marco Tagliasacchi

Mel-filterbanks are fixed, engineered audio features which emulate human perception and have lived through the history of audio understanding up to today.

Audio Classification

Learning from Heterogeneous EEG Signals with Differentiable Channel Reordering

no code implementations • 21 Oct 2020 • Aaqib Saeed, David Grangier, Olivier Pietquin, Neil Zeghidour

We propose CHARM, a method for training a single neural network across inconsistent input channels.


Contrastive Learning of General-Purpose Audio Representations

2 code implementations • 21 Oct 2020 • Aaqib Saeed, David Grangier, Neil Zeghidour

We introduce COLA, a self-supervised pre-training approach for learning a general-purpose representation of audio.

CoLA, Contrastive Learning, +2

Wavesplit: End-to-End Speech Separation by Speaker Clustering

no code implementations • 20 Feb 2020 • Neil Zeghidour, David Grangier

Wavesplit infers a set of source representations via clustering, which addresses the fundamental permutation problem of separation.

Clustering, Data Augmentation, +1

Fully Convolutional Speech Recognition

no code implementations • 17 Dec 2018 • Neil Zeghidour, Qiantong Xu, Vitaliy Liptchinsky, Nicolas Usunier, Gabriel Synnaeve, Ronan Collobert

In this paper we present an alternative approach based solely on convolutional neural networks, leveraging recent advances in acoustic models from the raw waveform and language modeling.

Language Modelling, Speech Recognition, +1

To Reverse the Gradient or Not: An Empirical Comparison of Adversarial and Multi-task Learning in Speech Recognition

no code implementations • 9 Dec 2018 • Yossi Adi, Neil Zeghidour, Ronan Collobert, Nicolas Usunier, Vitaliy Liptchinsky, Gabriel Synnaeve

In multi-task learning, the goal is speaker prediction; we expect a performance improvement with this joint training if the two tasks of speech recognition and speaker recognition share a common set of underlying features.

Multi-Task Learning, Speaker Recognition, +2

Learning to detect dysarthria from raw speech

3 code implementations • 27 Nov 2018 • Juliette Millet, Neil Zeghidour

We extend this approach to paralinguistic classification and propose a neural network that can learn a filterbank, a normalization factor and a compression power from the raw speech, jointly with the rest of the architecture.

General Classification, Sentence, +2

SING: Symbol-to-Instrument Neural Generator

1 code implementation • NeurIPS 2018 • Alexandre Défossez, Neil Zeghidour, Nicolas Usunier, Léon Bottou, Francis Bach

On the generalization task of synthesizing notes for pairs of pitch and instrument not seen during training, SING produces audio with significantly improved perceptual quality compared to a state-of-the-art autoencoder based on WaveNet as measured by a Mean Opinion Score (MOS), and is about 32 times faster for training and 2,500 times faster for inference.

Audio Synthesis, Decoder, +1

End-to-End Speech Recognition From the Raw Waveform

1 code implementation • 19 Jun 2018 • Neil Zeghidour, Nicolas Usunier, Gabriel Synnaeve, Ronan Collobert, Emmanuel Dupoux

In this paper, we study end-to-end systems trained directly from the raw waveform, building on two alternatives for trainable replacements of mel-filterbanks that use a convolutional architecture.

Speech Recognition

Sampling strategies in Siamese Networks for unsupervised speech representation learning

2 code implementations • 30 Apr 2018 • Rachid Riad, Corentin Dancette, Julien Karadayi, Neil Zeghidour, Thomas Schatz, Emmanuel Dupoux

We apply these results to pairs of words discovered using an unsupervised algorithm and show an improvement on state-of-the-art in unsupervised representation learning using siamese networks.

Representation Learning, Speech Representation Learning

Fader Networks: Manipulating Images by Sliding Attributes

no code implementations • NeurIPS 2017 • Guillaume Lample, Neil Zeghidour, Nicolas Usunier, Antoine Bordes, Ludovic Denoyer, Marc'Aurelio Ranzato

This paper introduces a new encoder-decoder architecture that is trained to reconstruct images by disentangling the salient information of the image and the values of attributes directly in the latent space.

Attribute, Decoder

Learning Filterbanks from Raw Speech for Phone Recognition

2 code implementations • 3 Nov 2017 • Neil Zeghidour, Nicolas Usunier, Iasonas Kokkinos, Thomas Schatz, Gabriel Synnaeve, Emmanuel Dupoux

We train a bank of complex filters that operates on the raw waveform and is fed into a convolutional neural network for end-to-end phone recognition.

Fader Networks: Manipulating Images by Sliding Attributes

3 code implementations • 1 Jun 2017 • Guillaume Lample, Neil Zeghidour, Nicolas Usunier, Antoine Bordes, Ludovic Denoyer, Marc'Aurelio Ranzato

This paper introduces a new encoder-decoder architecture that is trained to reconstruct images by disentangling the salient information of the image and the values of attributes directly in the latent space.

Attribute, Decoder

Learning weakly supervised multimodal phoneme embeddings

no code implementations • 23 Apr 2017 • Rahma Chaabouni, Ewan Dunbar, Neil Zeghidour, Emmanuel Dupoux

Recent works have explored deep architectures for learning multimodal speech representation (e.g. audio and images, articulation and audio) in a supervised way.

Multi-Task Learning
