Search Results for author: Alexandre Défossez

Found 20 papers, 15 papers with code

Moshi: a speech-text foundation model for real-time dialogue

1 code implementation17 Sep 2024 Alexandre Défossez, Laurent Mazaré, Manu Orsini, Amélie Royer, Patrick Pérez, Hervé Jégou, Edouard Grave, Neil Zeghidour

Our resulting model is the first real-time full-duplex spoken large language model, with a theoretical latency of 160ms, 200ms in practice, and is available at https://github. com/kyutai-labs/moshi.

Action Detection Activity Detection +5

An Independence-promoting Loss for Music Generation with Language Models

no code implementations4 Jun 2024 Jean-Marie Lemercier, Simon Rouard, Jade Copet, Yossi Adi, Alexandre Défossez

This leads to an increase in the generated music quality when modelling the product of the marginal distributions, while generating audio much faster than the joint distribution model.

Language Modelling Music Generation

Proactive Detection of Voice Cloning with Localized Watermarking

1 code implementation30 Jan 2024 Robin San Roman, Pierre Fernandez, Alexandre Défossez, Teddy Furon, Tuan Tran, Hady Elsahar

In the rapidly evolving field of speech generative models, there is a pressing need to ensure audio authenticity against the risks of voice cloning.

Voice Cloning

Masked Audio Generation using a Single Non-Autoregressive Transformer

no code implementations9 Jan 2024 Alon Ziv, Itai Gat, Gael Le Lan, Tal Remez, Felix Kreuk, Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi

We introduce MAGNeT, a masked generative sequence modeling method that operates directly over several streams of audio tokens.

Audio Generation

Code Llama: Open Foundation Models for Code

2 code implementations24 Aug 2023 Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Romain Sauvestre, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve

We release Code Llama, a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks.

16k Code Generation +2

Hybrid Transformers for Music Source Separation

2 code implementations15 Nov 2022 Simon Rouard, Francisco Massa, Alexandre Défossez

While it performs poorly when trained only on MUSDB, we show that it outperforms Hybrid Demucs (trained on the same data) by 0. 45 dB of SDR when using 800 extra training songs.

 Ranked #1 on Music Source Separation on MUSDB18 (using extra training data)

Music Source Separation Speech Enhancement

High Fidelity Neural Audio Compression

4 code implementations24 Oct 2022 Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi

We introduce a state-of-the-art real-time, high-fidelity, audio codec leveraging neural networks.

Audio Compression Decoder +1

AudioGen: Textually Guided Audio Generation

1 code implementation30 Sep 2022 Felix Kreuk, Gabriel Synnaeve, Adam Polyak, Uriel Singer, Alexandre Défossez, Jade Copet, Devi Parikh, Yaniv Taigman, Yossi Adi

Finally, we explore the ability of the proposed method to generate audio continuation conditionally and unconditionally.

Audio Generation Descriptive

Decoding speech perception from non-invasive brain recordings

1 code implementation25 Aug 2022 Alexandre Défossez, Charlotte Caucheteux, Jérémy Rapin, Ori Kabeli, Jean-Rémi King

Overall, this effective decoding of perceived speech from non-invasive recordings delineates a promising path to decode language from brain activity, without putting patients at risk for brain surgery.

Contrastive Learning EEG

Hybrid Spectrogram and Waveform Source Separation

1 code implementation5 Nov 2021 Alexandre Défossez

Source separation models either work on the spectrogram or waveform domain.

Music Source Separation

Music Demixing Challenge 2021

1 code implementation31 Aug 2021 Yuki Mitsufuji, Giorgio Fabbro, Stefan Uhlich, Fabian-Robert Stöter, Alexandre Défossez, Minseok Kim, Woosung Choi, Chin-Yun Yu, Kin-Wai Cheuk

The main differences compared with the past challenges are 1) the competition is designed to more easily allow machine learning practitioners from other disciplines to participate, 2) evaluation is done on a hidden test set created by music professionals dedicated exclusively to the challenge to assure the transparency of the challenge, i. e., the test set is not accessible from anyone except the challenge organizers, and 3) the dataset provides a wider range of music genres and involved a greater number of mixing engineers.

Music Source Separation

A Simple Convergence Proof of Adam and Adagrad

no code implementations5 Mar 2020 Alexandre Défossez, Léon Bottou, Francis Bach, Nicolas Usunier

We provide a simple proof of convergence covering both the Adam and Adagrad adaptive optimization algorithms when applied to smooth (possibly non-convex) objective functions with bounded gradients.

Music Source Separation in the Waveform Domain

1 code implementation27 Nov 2019 Alexandre Défossez, Nicolas Usunier, Léon Bottou, Francis Bach

Source separation for music is the task of isolating contributions, or stems, from different instruments recorded individually and arranged together to form a song.

Audio Generation Audio Synthesis +4

Demucs: Deep Extractor for Music Sources with extra unlabeled data remixed

1 code implementation3 Sep 2019 Alexandre Défossez, Nicolas Usunier, Léon Bottou, Francis Bach

We study the problem of source separation for music using deep learning with four known sources: drums, bass, vocals and other accompaniments.

Music Source Separation

SING: Symbol-to-Instrument Neural Generator

1 code implementation NeurIPS 2018 Alexandre Défossez, Neil Zeghidour, Nicolas Usunier, Léon Bottou, Francis Bach

On the generalization task of synthesizing notes for pairs of pitch and instrument not seen during training, SING produces audio with significantly improved perceptual quality compared to a state-of-the-art autoencoder based on WaveNet as measured by a Mean Opinion Score (MOS), and is about 32 times faster for training and 2, 500 times faster for inference.

Audio Synthesis Decoder +1

AdaBatch: Efficient Gradient Aggregation Rules for Sequential and Parallel Stochastic Gradient Methods

no code implementations6 Nov 2017 Alexandre Défossez, Francis Bach

We study a new aggregation operator for gradients coming from a mini-batch for stochastic gradient (SG) methods that allows a significant speed-up in the case of sparse optimization problems.

Constant Step Size Least-Mean-Square: Bias-Variance Trade-offs and Optimal Sampling Distributions

no code implementations29 Nov 2014 Alexandre Défossez, Francis Bach

Our analysis leads to new insights into stochastic approximation algorithms: (a) it gives a tighter bound on the allowed step-size; (b) the generalization error may be divided into a variance term which is decaying as O(1/n), independently of the step-size $\gamma$, and a bias term that decays as O(1/$\gamma$ 2 n 2); (c) when allowing non-uniform sampling, the choice of a good sampling density depends on whether the variance or bias terms dominate.

Cannot find the paper you are looking for? You can Submit a new open access paper.